Lecture 11a: Hopfield Nets
Now, we leave behind the feedforward deterministic networks that are trained with backpropagation gradients. We’re going to see quite a variety of different neural networks now. These networks do not have output units. These networks have units that can only be in states 0 and 1. These networks do not have units of which the state is simply a function of the state of other units. These networks are, instead, governed by an “energy function”. Best way to really understand Hopfield networks: Go through the example of the Hopfield network finding a low energy state, by yourself. Better yet, think of different weights, and do the exercise with those. Typically, we’ll use Hopfield networks where the units have state 0 or 1; not -1 or 1.
Lecture 11b: Dealing with spurious minima
The last in-video question is not easy. Try to understand how the perceptron learning procedure is used in a Hopfield net; it’s not very thoroughly explained.
Lecture 11d: Using stochastic units to improve search
We’re still working with a mountain landscape analogy. This time, however, it’s not an analogy for parameter space, but for state space. A particle is, therefore, not a weight vector, but a configuration. What’s the same is that we’re, in a way, looking for low points in the landscape. We’re also using the physics analogy of systems that can be in different states, each with their own energy, and subject to a temperature. This analogy is introduced in slide 2. This is the analogy that originally inspired Hopfield networks. The idea is that at a high temperature, the system is more inclined to transition into configurations with high energy, even though it still prefers low energy.
3:25: “the amount of noise” means the extent to which the decisions are random. 4:20: If T really were 0, we’d have division by zero, which is not good. What we really mean here is “as T gets really, really small (but still positive)”. For mathematicians: it’s the limit as T goes to zero from above. Thermal equilibrium, and this whole random process of exploring states, is much like the exploration of weight vectors that we can use in Bayesian methods. It’s called a Markov Chain, in both cases.
Lecture 11e: How a Boltzmann machine models data
Now, we’re making a generative model of binary vectors. In contrast, mixtures of Gaussians are a generative model of real-valued vectors. 4:38: Try to understand how a mixture of Gaussians is also a causal generative model. 4:58: A Boltzmann Machine is an energy-based generative model. 5:50: Notice how this is the same as the earlier definition of energy. What’s new is that it’s mentioning visible and hidden units separately, instead of treating all units the same way.
References
Reuse
Citation
@online{bochman2017,
author = {Bochman, Oren},
title = {Deep {Neural} {Networks} - {Notes} for {Lesson} 11},
date = {2017-10-21},
url = {https://orenbochman.github.io/notes/dnn/dnn-11/l_11.html},
langid = {en}
}