Deep Neural Networks - Notes for Lesson 7b

Recurrent neural networks

Training RNNs with back propagation
Categories: deep learning, neural networks, notes, coursera, seq2seq, RNNs, LSTM

Author: Oren Bochman

Published: Sunday, September 3, 2017

{{< pdf lec8.pdf width="100%" class="ppSlide" >}}

Lecture 7b: Training RNNs with back propagation

The most important prerequisites, which you may want to review, are videos 3d and 5c (about backprop with weight sharing).

After watching the video, think about how such a system can be used to implement the brain of a robot as it’s producing a sentence of text, one letter at a time.

What would be the input, what would be the output, what would be the training signal, and which units at which time slices would represent the input and output?

The equivalence between feedforward nets and recurrent nets

Assume that there is a time delay of 1 in using each connection.

The recurrent net is just a layered net that keeps reusing the same weights.
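
To make the equivalence concrete, here is a minimal sketch in Python/NumPy (the function name and shapes are my own, not from the lecture): the recurrent net is run as a layered net in which every layer reuses the same weights, and the time delay of 1 shows up as each layer reading the hidden state produced by the previous layer.

```python
import numpy as np

def unrolled_forward(x_seq, h0, W_in, W_rec, b):
    """Run a simple tanh RNN as if it were a layered feed-forward net:
    every 'layer' (time step) reuses the same weights W_in, W_rec, b,
    and the 1-step time delay means layer t reads the hidden state
    produced by layer t-1."""
    h = h0
    hs = [h0]                       # activities of every layer / time step
    for x_t in x_seq:
        h = np.tanh(W_in @ x_t + W_rec @ h + b)
        hs.append(h)
    return hs
```

Training this unrolled net is then ordinary backprop, except that all the copies of the weights are constrained to stay equal, which is the weight-sharing reminder below.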

Reminder: Backpropagation with weight constraints

  • It is easy to modify the backprop algorithm to incorporate linear constraints between the weights.
  • We compute the gradients as usual, and then modify the gradients so that they satisfy the constraints.
    • So if the weights started off satisfying the constraints, they will continue to satisfy them, as the toy example below illustrates.
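
As a toy illustration of that recipe (my own example, not from the lecture), suppose we want to keep the constraint w1 = w2: compute both gradients as usual, then apply the same combined gradient to both weights.

```python
# Toy example: one training case, squared error, with the constraint w1 == w2.
x1, x2, t = 1.0, 2.0, 3.0
w1 = w2 = 0.1          # the weights start off satisfying the constraint
lr = 0.01

for _ in range(100):
    y = w1 * x1 + w2 * x2
    # compute the gradients as usual
    g1 = 2 * (y - t) * x1
    g2 = 2 * (y - t) * x2
    # then modify them so the constraint is preserved:
    # use the same combined gradient for both weights
    g = g1 + g2
    w1 -= lr * g
    w2 -= lr * g

assert w1 == w2        # the constraint is still satisfied exactly
```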

Backpropagation through time

  • We can think of the recurrent net as a layered, feed-forward net with shared weights and then train the feed-forward net with weight constraints.
  • We can also think of this training algorithm in the time domain:
    • The forward pass builds up a stack of the activities of all the units at each time step.
    • The backward pass peels activities off the stack to compute the error derivatives at each time step.
    • After the backward pass, we add together the derivatives at all the different time steps for each weight, as in the sketch below.
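
Here is a hedged sketch of backpropagation through time for the simple tanh RNN unrolled above, assuming a linear output layer and squared error at every time step (all names and shapes are my own): the forward pass builds the stack of activities, the backward pass peels them off, and the derivatives from every time step are summed into one gradient per shared weight.

```python
import numpy as np

def bptt(x_seq, targets, h0, W_in, W_rec, W_out, b, b_out):
    """Backprop through time for a tanh RNN with a linear output layer
    and squared error at every time step (shapes: x_t (I,), h (H,), y (O,))."""
    # ---- forward pass: build up a stack of activities, one per time step ----
    hs, ys = [h0], []
    h = h0
    for x_t in x_seq:
        h = np.tanh(W_in @ x_t + W_rec @ h + b)
        hs.append(h)
        ys.append(W_out @ h + b_out)

    # ---- backward pass: peel activities off the stack ----
    dW_in, dW_rec, dW_out = (np.zeros_like(W) for W in (W_in, W_rec, W_out))
    db, db_out = np.zeros_like(b), np.zeros_like(b_out)
    dh_next = np.zeros_like(h0)                  # error flowing back from step t+1
    for t in reversed(range(len(x_seq))):
        dy = ys[t] - targets[t]                  # dE/dy_t for squared error
        dW_out += np.outer(dy, hs[t + 1])
        db_out += dy
        dh = W_out.T @ dy + dh_next              # total error arriving at h_t
        dz = dh * (1.0 - hs[t + 1] ** 2)         # back through the tanh
        # add together the derivatives at all the different times
        # for each (shared) weight
        dW_in += np.outer(dz, x_seq[t])
        dW_rec += np.outer(dz, hs[t])
        db += dz
        dh_next = W_rec.T @ dz
    return dW_in, dW_rec, dW_out, db, db_out, dh_next   # dh_next is now dE/dh0
```

The last value returned is the derivative with respect to the initial state, which is exactly what the next slide needs.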

An irritating extra issue

  • We need to specify the initial activity state of all the hidden and output units.
  • We could just fix these initial states to have some default value like 0.5.
  • But it is better to treat the initial states as learned parameters.
  • We learn them in the same way as we learn the weights.
    • Start off with an initial random guess for the initial states.
    • At the end of each training sequence, backpropagate through time all the way to the initial states to get the gradient of the error function with respect to each initial state.
    • Adjust the initial states by following the negative gradient, as in the sketch below.
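
A sketch of that procedure, reusing the hypothetical bptt() above (sizes, learning rate, and toy data are arbitrary): start from a random guess for the initial state, backpropagate through time all the way back to it, and follow the negative gradient. In a real training loop the weight gradients returned by bptt() would be used as well; only the initial-state update is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
I, H, O, lr = 3, 5, 2, 0.01
W_in  = rng.normal(scale=0.1, size=(H, I))
W_rec = rng.normal(scale=0.1, size=(H, H))
W_out = rng.normal(scale=0.1, size=(O, H))
b, b_out = np.zeros(H), np.zeros(O)

h0 = rng.normal(scale=0.1, size=H)       # initial random guess for the initial state

# one toy training sequence: inputs and per-step targets
x_seq   = [rng.normal(size=I) for _ in range(4)]
targets = [rng.normal(size=O) for _ in range(4)]

for _ in range(50):
    *_, dE_dh0 = bptt(x_seq, targets, h0, W_in, W_rec, W_out, b, b_out)
    h0 -= lr * dE_dh0                    # adjust the initial state by the negative gradient
```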

Providing input to recurrent networks

  • We can specify inputs in several ways:
    • Specify the initial states of all the units.
    • Specify the initial states of a subset of the units.
    • Specify the states of the same subset of the units at every time step.
      • This is the natural way to model most sequential data (see the sketch below).
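
To contrast these options, here is a small sketch reusing the hypothetical unrolled_forward() above (all sizes are arbitrary): in one case the whole input is packed into the initial state and the net then runs with no external input; in the other, the same input units are clamped to a fresh value at every time step, which is the natural choice for sequential data.

```python
import numpy as np

rng = np.random.default_rng(1)
I, H, T = 3, 5, 4
W_in  = rng.normal(scale=0.1, size=(H, I))
W_rec = rng.normal(scale=0.1, size=(H, H))
b = np.zeros(H)

# Option A: the whole input is encoded in the initial state; afterwards
# the net runs free, with zero external input at every step.
h0 = rng.normal(size=H)
hs_free = unrolled_forward([np.zeros(I)] * T, h0, W_in, W_rec, b)

# Option B: the same (input) units are given a fresh value at every
# time step, the natural way to model most sequential data.
x_seq = [rng.normal(size=I) for _ in range(T)]
hs_driven = unrolled_forward(x_seq, np.zeros(H), W_in, W_rec, b)
```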

Teaching signals for recurrent networks

  • We can specify targets in several ways:
    • Specify the desired final activities of all the units.
    • Specify the desired activities of all the units for the last few steps.
      • This is good for learning attractors.
      • It is easy to add in extra error derivatives as we backpropagate.
    • Specify the desired activity of a subset of the units (see the sketch below).
      • The other units are input or hidden units.
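
A hedged sketch of how these choices appear in the error derivatives (function and argument names are mine): with squared error you can penalize every time step, only the last few steps, or only a chosen subset of output units, and the resulting per-step derivatives are simply added in as you backpropagate.

```python
import numpy as np

def teaching_signal_derivs(ys, targets, last_k=None, target_units=None):
    """Per-step dE/dy for squared error, restricted to the chosen
    teaching signal: all steps, only the last_k steps, or only a
    subset of the (output) units; all other units get no target."""
    T = len(ys)
    derivs = []
    for t, (y, tgt) in enumerate(zip(ys, targets)):
        dy = y - tgt
        if last_k is not None and t < T - last_k:
            dy = np.zeros_like(dy)            # no teaching signal at this step
        if target_units is not None:
            mask = np.zeros_like(dy)
            mask[target_units] = 1.0          # only these units have desired activities
            dy = dy * mask
        derivs.append(dy)
    return derivs
```

In the bptt() sketch above, these per-step derivatives would replace the unconditional ys[t] - targets[t] term.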

Reuse

CC SA BY-NC-ND
