Deep Neural Networks - Notes for lecture 2d

For the course by Geoffrey Hinton on Coursera

Notes for Deep Learning, focusing on why the perceptron learning procedure works
deep learning
neural networks
notes
coursera
Author

Oren Bochman

Published

Thursday, July 20, 2017

{{< pdf lec2.pdf class="column-margin" >}}

In this lecture we try to explain intuitively why the perceptron learning algorithm works.

We also consider why a first, naive version of the convergence argument fails and how to fix it.

Lecture 2d: Why does the learning work?

Why the learning procedure works (first attempt)

idea of the proof

we want to move closer to all the feasible weight vectors

Consider the squared distance $d_a^2 + d_b^2$ between any feasible weight vector and the current weight vector.

  • Hopeful claim: every time the perceptron makes a mistake, the learning algorithm moves the current weight vector closer to all feasible weight vectors.

We look at the geometrical interpretation behind the proof that the perceptron learning algorithm converges. We are trying to find a decision surface, which can be framed as a convex feasibility problem in weight space.

Each training case defines a hyper-plane through the origin in weight space (drawn as a line in the 2-D picture): on one side of it the weight vector gives the correct output, on the other side it is wrong. The feasible weight vectors, those that get every training case right, form a cone:

  • This means the feasible weights are closed under addition and under multiplication by a positive scalar, so in particular the average of two feasible weight vectors is itself feasible (checked numerically in the sketch below).
  • The apex of the cone is at the origin.
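Here is a minimal numerical check of that closure property, assuming a made-up, linearly separable toy data set and two hand-picked feasible weight vectors; none of the concrete numbers come from the lecture.

```python
import numpy as np

# Toy training set: inputs with a bias component appended, and binary targets.
# These numbers are made up purely for illustration.
X = np.array([[ 1.0,  2.0, 1.0],
              [ 2.0,  1.0, 1.0],
              [-1.0, -2.0, 1.0],
              [-2.0, -1.5, 1.0]])
t = np.array([1, 1, 0, 0])            # desired binary outputs

def is_feasible(w, X, t):
    """A weight vector is feasible if it classifies every case correctly
    (binary threshold at zero)."""
    y = (X @ w >= 0).astype(int)
    return np.array_equal(y, t)

w1 = np.array([1.0, 1.0,  0.0])       # two hand-picked feasible vectors
w2 = np.array([0.5, 2.0, -0.5])
assert is_feasible(w1, X, t) and is_feasible(w2, X, t)

# Closure under addition and positive scaling: any combination a*w1 + b*w2
# with a, b > 0 stays feasible, so the feasible set is a cone.
for a, b in [(1, 1), (0.5, 0.5), (3, 0.1), (0.2, 5)]:
    assert is_feasible(a * w1 + b * w2, X, t)
print("all tested conic combinations are feasible")
```

Because the region is a cone (and hence convex), the average of two good solutions is itself a good solution, which is what makes "moving closer to all feasible vectors" a sensible goal in the first place.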

fixing it up

The hopeful claim fails as stated: a feasible weight vector that lies only just inside the feasible cone, very close to one of the constraint planes, can actually end up further away after we add the input vector. To fix the argument we instead use a "generously feasible" sub-cone of the feasible cone, shown with a dotted line in the figure.

  • So consider “generously feasible” weight vectors that lie within the feasible region by a margin at least as great as the length of the input vector that defines each constraint plane.
    • Every time the perceptron makes a mistake, the squared distance to all of these generously feasible weight vectors decreases by at least the squared length of the update vector (a short derivation follows below).
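The derivation below is a sketch under the standard conventions, which are assumptions carried over from the usual perceptron setup rather than stated explicitly above: targets are encoded as $t \in \{+1, -1\}$, the threshold is at zero, and a mistake triggers the update $\mathbf{w} \leftarrow \mathbf{w} + t\,\mathbf{x}$. Writing $\mathbf{w}^{*}$ for a generously feasible vector, so that $t\,(\mathbf{w}^{*}\cdot\mathbf{x}) \ge \|\mathbf{x}\|^{2}$, and noting that a mistake means $t\,(\mathbf{w}\cdot\mathbf{x}) \le 0$:

$$
\begin{aligned}
\|\mathbf{w}^{*} - (\mathbf{w} + t\,\mathbf{x})\|^{2}
  &= \|\mathbf{w}^{*} - \mathbf{w}\|^{2}
     - 2t\,\mathbf{x}\cdot(\mathbf{w}^{*} - \mathbf{w})
     + \|\mathbf{x}\|^{2} \\
  &\le \|\mathbf{w}^{*} - \mathbf{w}\|^{2}
     - 2\|\mathbf{x}\|^{2} + \|\mathbf{x}\|^{2}
   = \|\mathbf{w}^{*} - \mathbf{w}\|^{2} - \|\mathbf{x}\|^{2}.
\end{aligned}
$$

So each mistake shrinks the squared distance to every generously feasible vector by at least $\|\mathbf{x}\|^{2}$, which is exactly the quantity used in the informal sketch below.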

Informal sketch of proof of convergence

  • Each time the perceptron makes a mistake, the current weight vector moves to decrease its squared distance from every weight vector in the “generously feasible” region.
  • The squared distance decreases by at least the squared length of the input vector.
  • So after a finite number of mistakes, the weight vector must lie in the feasible region, provided such a region exists. (The numerical sketch below tracks this squared distance while training.)
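Here is a minimal sketch of the argument in code, assuming the standard perceptron update (add the input vector on a false negative, subtract it on a false positive), the same kind of made-up toy data as above, and a hand-picked generously feasible reference vector; all concrete values are illustrative assumptions, not from the lecture.

```python
import numpy as np

# Made-up linearly separable toy data (inputs include a bias component).
X = np.array([[ 1.0,  2.0, 1.0],
              [ 2.0,  1.0, 1.0],
              [-1.0, -2.0, 1.0],
              [-2.0, -1.5, 1.0]])
t = np.array([1, 1, 0, 0])            # desired binary outputs

w = np.zeros(3)                       # current weight vector
w_star = np.array([3.0, 3.0, 0.0])    # hand-picked generously feasible vector:
                                      # its margin on every case is >= ||x||^2

for epoch in range(20):
    mistakes = 0
    for x, target in zip(X, t):
        y = int(x @ w >= 0)           # perceptron decision, threshold at zero
        if y != target:
            before = np.sum((w_star - w) ** 2)
            w = w + x if target == 1 else w - x   # standard perceptron update
            after = np.sum((w_star - w) ** 2)
            # Squared distance to the generously feasible vector must drop
            # by at least the squared length of the input vector.
            assert before - after >= np.sum(x ** 2) - 1e-9
            mistakes += 1
    if mistakes == 0:
        print(f"converged after {epoch} epochs, w = {w}")
        break
```

On each mistake the assert checks the guaranteed decrease, and the loop stops once an epoch passes with no mistakes, i.e. once the weight vector has reached the feasible region.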

Reuse

CC SA BY-NC-ND

Citation

BibTeX citation:
@online{bochman2017,
  author = {Bochman, Oren},
  title = {Deep {Neural} {Networks} - {Notes} for Lecture 2d},
  date = {2017-07-20},
  url = {https://orenbochman.github.io/notes/dnn/dnn-02/l02d.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2017. “Deep Neural Networks - Notes for Lecture 2d.” July 20, 2017. https://orenbochman.github.io/notes/dnn/dnn-02/l02d.html.