Deep Neural Networks - Notes for lecture 4a

For the course by Geoffrey Hinton on Coursera

Learning to predict the next word
Tags: deep learning, neural networks, notes, coursera
Author: Oren Bochman

Published: Friday, August 11, 2017

Lecture 4a: Learning to predict the next word

A simple example of relational information

[Figure: relational information]

Another way to express the same information

  • Make a set of propositions using the 12 relationships (encoded as data triples in the sketch after this list):
    • son, daughter, nephew, niece, father, mother, uncle, aunt
    • brother, sister, husband, wife
  • (Colin has-father James)
  • (Colin has-mother Victoria)
  • (James has-wife Victoria), which follows from the two above
  • (Charlotte has-brother Colin)
  • (Victoria has-brother Arthur)
  • (Charlotte has-uncle Arthur), which follows from the above
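
As a concrete sketch, the same propositions can be stored directly as (person, relationship, person) triples. The names are the ones from the list above; this is illustrative data, not the full family trees used in the lecture.

```python
# The kinship facts above as (person, relationship, person) triples.
triples = [
    ("Colin", "has-father", "James"),
    ("Colin", "has-mother", "Victoria"),
    ("James", "has-wife", "Victoria"),     # follows from the two above
    ("Charlotte", "has-brother", "Colin"),
    ("Victoria", "has-brother", "Arthur"),
    ("Charlotte", "has-uncle", "Arthur"),  # follows from the above
]
```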

A relational learning task

  • Given a large set of triples that come from some family trees, figure out the regularities.
    • The obvious way to express the regularities is as symbolic rules, e.g. (x has-mother y) & (y has-husband z) => (x has-father z); a sketch of applying this rule follows this list.
  • Finding the symbolic rules involves a difficult search through a very large discrete space of possibilities.
  • Can a neural network capture the same knowledge by searching through a continuous space of weights?
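
A minimal sketch of the symbolic-rule view, using the triples defined above: the rule (x has-mother y) & (y has-husband z) => (x has-father z) becomes an explicit search over pairs of facts. The has-husband fact is added by hand here, since the list above stated it the other way around (has-wife).

```python
def derive_fathers(facts):
    """Apply (x has-mother y) & (y has-husband z) => (x has-father z)."""
    derived = set()
    for (x, r1, y) in facts:
        if r1 != "has-mother":
            continue
        for (y2, r2, z) in facts:
            if y2 == y and r2 == "has-husband":
                derived.add((x, "has-father", z))
    return derived

facts = set(triples) | {("Victoria", "has-husband", "James")}
print(derive_fathers(facts))  # {('Colin', 'has-father', 'James')}
```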

The structure of the neural net

[Figure: structure of the neural net]

[Figure: visualization of 6 neuron weights]

[Figure: the relational data]
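
A minimal PyTorch sketch of this architecture as I understand it from the lecture: one-hot inputs for person 1 (24 people) and the relationship (12 relationships), a 6-unit bottleneck for the person as described below, and a 24-way output over people. The sizes of the central layer and the relationship bottleneck are my assumptions, as is the choice of logistic (sigmoid) units.

```python
import torch
import torch.nn as nn

N_PEOPLE, N_RELS = 24, 12

class FamilyTreesNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.person_code = nn.Linear(N_PEOPLE, 6)  # bottleneck for person 1 (6 units, per the lecture)
        self.rel_code = nn.Linear(N_RELS, 6)       # bottleneck for the relationship (size assumed)
        self.central = nn.Linear(12, 12)           # central layer: learns how features predict features (size assumed)
        self.out_code = nn.Linear(12, 6)           # bottleneck for the output person
        self.output = nn.Linear(6, N_PEOPLE)       # 24-way choice; softmax applied via the loss

    def forward(self, person, rel):
        h = torch.cat([torch.sigmoid(self.person_code(person)),
                       torch.sigmoid(self.rel_code(rel))], dim=-1)
        h = torch.sigmoid(self.central(h))
        h = torch.sigmoid(self.out_code(h))
        return self.output(h)                      # logits over the 24 people
```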

What does the network learn?

  • The six hidden units in the bottleneck connected to the input representation of person 1 learn to represent features of people that are useful for predicting the answer (a sketch for inspecting these weights follows this list).
    • Nationality, generation, branch of the family tree.
  • These features are only useful if the other bottlenecks use similar representations and the central layer learns how features predict other features. For example:
    • the input person is of generation 3, and
    • the relationship requires the answer to be one generation up,
    • which implies the output person is of generation 2.
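
A hedged sketch of how one might reproduce the weight pictures above: each hidden unit in the person-1 bottleneck has one weight per person, so plotting the 6 units' weights person by person reveals features like nationality or generation. This assumes a trained FamilyTreesNet named net from the sketch above.

```python
import matplotlib.pyplot as plt

W = net.person_code.weight.detach().numpy()   # shape (6, 24): hidden units x people
fig, axes = plt.subplots(6, 1, figsize=(8, 9), sharex=True)
for i, ax in enumerate(axes):
    ax.bar(range(N_PEOPLE), W[i])             # unit i's weight to each input person
    ax.set_ylabel(f"unit {i}")
axes[-1].set_xlabel("person (one-hot index)")
plt.show()
```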

Another way to see that it works

  • Train the network on all but 4 of the triples that can be made using the 12 relationships (a train-and-test sketch follows this list).
    • It needs to sweep through the training set many times, adjusting the weights slightly each time.
  • Then test it on the 4 held-out cases.
    • It gets about 3/4 correct.
    • This is good for a 24-way choice.
    • On much bigger datasets we can train on a much smaller fraction of the data.
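
A sketch of this experiment under the assumptions above: hold out 4 triples (encoded as integer indices), sweep through the rest many times with small weight updates, then count how many held-out answers the net gets right. It reuses FamilyTreesNet from the earlier sketch.

```python
import torch.nn.functional as F

def train_and_test(data, epochs=1500, lr=0.1):
    """data: list of (person_idx, rel_idx, answer_idx) integer triples."""
    train, held_out = data[:-4], data[-4:]     # hold out 4 cases
    net = FamilyTreesNet()
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    P, R = torch.eye(N_PEOPLE), torch.eye(N_RELS)
    for _ in range(epochs):                    # many sweeps, small steps each time
        for p, r, a in train:
            opt.zero_grad()
            loss = F.cross_entropy(net(P[p], R[r]).unsqueeze(0),
                                   torch.tensor([a]))
            loss.backward()
            opt.step()
    # 24-way choice on the held-out cases; the lecture reports about 3/4 correct
    return sum(net(P[p], R[r]).argmax().item() == a for p, r, a in held_out)
```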

A large-scale example

  • Suppose we have a database of millions of relational facts of the form (A R B).
    • We could train a net to discover feature vector representations of the terms that allow the third term to be predicted from the first two.
    • Then we could use the trained net to find very unlikely triples. These are good candidates for errors in the database.
  • Instead of predicting the third term, we could use all three terms as input and predict the probability that the fact is correct.
    • To train such a net we need a good source of false facts; one simple corruption scheme is sketched below.
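
A sketch of this second variant: embed all three terms, output a logit for "this fact is correct", and manufacture false facts by corrupting the third term at random. The corruption scheme and all sizes here are my assumptions, not something the lecture specifies.

```python
import random
import torch
import torch.nn as nn

class TripleScorer(nn.Module):
    """Takes all three terms (A, R, B) and scores whether the fact is correct."""
    def __init__(self, n_terms, n_rels, dim=32):
        super().__init__()
        self.term = nn.Embedding(n_terms, dim)  # feature vectors for the terms
        self.rel = nn.Embedding(n_rels, dim)
        self.score = nn.Sequential(nn.Linear(3 * dim, dim), nn.Sigmoid(),
                                   nn.Linear(dim, 1))

    def forward(self, a, r, b):
        x = torch.cat([self.term(a), self.rel(r), self.term(b)], dim=-1)
        return self.score(x).squeeze(-1)        # logit; sigmoid gives P(correct)

def corrupt(triple, n_terms):
    """Make a (probably) false fact by replacing the third term at random."""
    a, r, b = triple
    return (a, r, random.randrange(n_terms))
```

Once trained (true facts pushed toward 1, corrupted facts toward 0, e.g. with binary cross-entropy on the logits), the lowest-scoring facts in the database are the good candidates for errors.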

This video introduces distributed representations. It’s not actually about predicting words, but it’s building up to that. It does a great job of looking inside the brain of a neural network. That’s important, but not always easy to do.

Reuse

CC BY-NC-ND

Citation

BibTeX citation:
@online{bochman2017,
  author = {Bochman, Oren},
  title = {Deep {Neural} {Networks} - {Notes} for Lecture 4a},
  date = {2017-08-11},
  url = {https://orenbochman.github.io/notes/dnn/dnn-04/l_04a.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2017. “Deep Neural Networks - Notes for Lecture 4a.” August 11, 2017. https://orenbochman.github.io/notes/dnn/dnn-04/l_04a.html.