Deep Neural Networks - Notes From Hinton’s Course

Author

Oren Bochman

Published

Sunday, August 6, 2017

Dropout

My thoughts are that we should be able to do better than this version of dropout.

Shortcomings:

- Dropout on units can render the net very poor.
- Dropout slows training down, since we don't update half the units and probably a large fraction of the weights.
- For different architectures (CNN, RNN, etc.) dropout might work better on units that correspond to larger structures.
- We should track dropout-related statistics to better understand the confidence of the model.

A second idea: the gated mixture of experts used a neural network to assign each training case to its expert. If we want the network to make better use of its capacity, perhaps we should introduce some correlation between the dropped-out nodes and the data. Could we develop a gated dropout?

1. Start with some combinations $\binom{n}{k}$ of the weights, where $k = |\text{training set}| \times \text{minibatch size}$. We use the same dropout mask for each mini-batch, then switch.
2. Each epoch we should try to switch our mini-batches. We may want to start with maximally heterogeneous batches and, in subsequent epochs, pick progressively more heterogeneous batches. We should do this by shuffling the batches: take out a portion of each mini-batch proportional to its error rate, shuffle, and return it, so that the worst mini-batches get changed more often.
3. When we switch, we can shuffle differently. We score the errors per mini-batch/dropout combination and try to reduce the error by shuffling between all mini-batches with similar error rates. The lower the error, the smaller the shuffles. In each epoch we want to assign a net to each combination. (A sketch of this per-mini-batch dropout with error-driven shuffling appears after this list.)
4. Ideally we would like to learn how to gate training cases to specific dropout masks, or to masks within certain symmetry groups of known masks (i.e. related to a large number of dropout combos). In the spirit of "full Bayesian learning" we may want to learn a posterior distribution, i.e. build a correlation matrix between training cases and dropout combos. If there were a structure like an orthogonal array for each, we might be able to collect this kind of data in a minimal number of steps.
5. We could use abstract algebra, e.g. group theory, to design a network/dropout/mini-batching symmetry mechanism.
6. We should construct a mini-batch shuffle group and a dropout group or ring. We could also select an architecture that makes sense for this symmetry structure.
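A minimal sketch of ideas 1-3, under assumptions that are not in the notes above: a toy error score stands in for real training error, the mask shapes and shuffle fractions are arbitrary, and no network is actually trained. It only illustrates the bookkeeping: one fixed dropout mask per mini-batch, per-(mini-batch, mask) error scoring, and error-proportional reshuffling of examples between mini-batches.

```python
import numpy as np

rng = np.random.default_rng(0)

n_batches, batch_size, n_units = 8, 32, 64
drop_rate = 0.5

# Idea 1: one fixed binary dropout mask ("combo") per mini-batch.
masks = (rng.random((n_batches, n_units)) > drop_rate).astype(np.float32)

# Toy training set; each mini-batch is a list of example indices.
data = rng.standard_normal((n_batches * batch_size, n_units))
weights = rng.standard_normal((n_units, n_units))
batches = [list(range(b * batch_size, (b + 1) * batch_size)) for b in range(n_batches)]

def batch_error(indices, mask):
    """Stand-in for the training error of one mini-batch under one mask."""
    hidden = np.maximum(data[indices] @ weights, 0.0) * mask  # masked hidden units
    return float(np.abs(hidden).mean())

for epoch in range(3):
    # Idea 3: score every (mini-batch, dropout-combo) pair.
    errors = np.array([batch_error(b, m) for b, m in zip(batches, masks)])
    rates = errors / errors.sum()

    # Idea 2: pull a portion of each batch proportional to its error rate,
    # pool and shuffle the removed examples, then deal them back out, so the
    # worst mini-batches are changed the most.
    pool = []
    for b, rate in zip(batches, rates):
        n_out = int(round(rate * batch_size))
        rng.shuffle(b)
        pool.extend(b[:n_out])
        del b[:n_out]
    rng.shuffle(pool)
    for b in batches:
        need = batch_size - len(b)
        b.extend(pool[:need])
        del pool[:need]

    print(f"epoch {epoch}: mean error {errors.mean():.3f}")
```

In a real setting, `batch_error` would be the loss of the network trained under that mask, and the masks themselves could be generated or gated by a small auxiliary network, which is the "gated dropout" question above.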

Further Reading

Y. Gal and Z. Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning"

Data Augmentation

My idea: signal boosting, a Bayesian/RL approach to augmentation. We start with an initial dataset which we want to improve.

1. We could apply a number of augmentations to each image.
2. We could use our web spider to fetch more images and build an extended dataset; since this requires manually accepting the new images, it has a cost.
3. We could use a per-class augmentation weight to reduce class imbalance, so that smaller classes are augmented more than big ones.
4. We could use a normalized misclassification rate for training cases as the probability of generating augmentations for them.
5. How do we manage this kind of extended dataset? We should use a generator that keeps track of the augmentation hierarchy, the labels and their distribution, and the misclassification rates, all keyed by the master training case (see the sketch after this list).
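A minimal sketch of ideas 3-5. The `Case` record, the weighting formula, and the class names are illustrative assumptions; the notes above only say that smaller classes and frequently misclassified cases should be augmented more, and that a generator should track the augmentation hierarchy and related statistics.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Case:
    case_id: int
    label: str
    parent_id: Optional[int] = None   # None for a master (original) case
    misclass_rate: float = 0.0        # refreshed from validation statistics

class AugmentingGenerator:
    """Yields cases to augment, favouring rare classes and hard cases."""

    def __init__(self, cases):
        self.cases = {c.case_id: c for c in cases}

    def _weights(self):
        counts = Counter(c.label for c in self.cases.values())
        cases = list(self.cases.values())
        # Idea 3: boost small classes; idea 4: boost frequently misclassified cases.
        weights = [(1.0 / counts[c.label]) * (0.1 + c.misclass_rate) for c in cases]
        return cases, weights

    def sample(self, k=1):
        cases, weights = self._weights()
        return random.choices(cases, weights=weights, k=k)

    def add_augmented(self, parent, label=None):
        # Idea 5: new cases record their parent, preserving the augmentation hierarchy.
        new_id = max(self.cases) + 1
        child = Case(new_id, label or parent.label,
                     parent_id=parent.case_id, misclass_rate=parent.misclass_rate)
        self.cases[new_id] = child
        return child

# Usage: pick a hard or rare case, then register an augmented copy of it.
gen = AugmentingGenerator([Case(0, "cat", misclass_rate=0.4),
                           Case(1, "dog", misclass_rate=0.1),
                           Case(2, "dog", misclass_rate=0.05)])
picked = gen.sample(k=1)[0]
child = gen.add_augmented(picked)
print(picked.case_id, "->", child.case_id, child.label)
```

The `misclass_rate` fields would be refreshed from validation statistics between epochs, and the `parent_id` links give the augmentation hierarchy mentioned in item 5.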

Reuse

CC SA BY-NC-ND

Citation

BibTeX citation:
@online{bochman2017,
  author = {Bochman, Oren},
  title = {Deep {Neural} {Networks} - {Notes} {From} {Hinton’s}
    {Course}},
  date = {2017-08-06},
  url = {https://orenbochman.github.io/notes/dnn/dnn-dropout/2017-08-06-dropout.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2017. “Deep Neural Networks - Notes From Hinton’s Course.” August 6, 2017. https://orenbochman.github.io/notes/dnn/dnn-dropout/2017-08-06-dropout.html.