Glossary of terms for Deep Neural Networks

Course by Geoffrey Hinton on Coursera

Glossary of terms in Deep Learning and ML from Neural Networks for Machine Learning by Geoffrey Hinton on Coursera
deep learning
glossary
notes
neural networks
Author

Oren Bochman

Published

Sunday, August 6, 2017

Glossary of terms in Deep Learning and ML

Accuracy
The fraction of predictions that a classification model got right.
activation
emphasizes that a unit, like a real neuron, may be on or off. In practice a negative bias creates a threshold below which the unit produces no output; otherwise the unit always produces output. Also called **value** or **output**.
activation function
The activation function is an attempt to mimic the biological neuron’s output in response to its input. This is generally a non-linear function. Some examples are ReLU, sigmoid, tanh, leaky ReLU, and maxout; there are many others. All other things being equal, ReLU has emerged as the preferred activation function to start with.
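A minimal sketch in NumPy of a few of the activation functions named above (function names are just illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha controls the slope for negative inputs
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)
```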
AdaGrad
A gradient descent learning algorithm that rescales the gradients of each parameter, effectively giving each parameter an independent learning rate; cf. (Duchi, Hazan, and Singer 2011).
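A minimal sketch of the AdaGrad update, assuming the caller supplies the gradient; `w`, `g2`, `eta`, and `eps` are illustrative names:

```python
import numpy as np

w = np.zeros(3)          # parameters
g2 = np.zeros_like(w)    # running sum of squared gradients
eta, eps = 0.1, 1e-8     # global learning rate and numerical stabilizer

def adagrad_step(w, g2, grad):
    # Each parameter is divided by the root of its own accumulated
    # squared gradients, giving it an independent effective learning rate.
    g2 += grad ** 2
    w -= eta * grad / (np.sqrt(g2) + eps)
    return w, g2
```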
Anomaly detection
The process of identifying outliers that are considered candidates for removal from a dataset, typically because they are nonrepresentative, high-leverage points.
Attention
A mechanism that aggregates information from a set of inputs in a data-dependent manner. An attention mechanism might consist of a weighted sum over a set of inputs, where the weight for each input is computed by another part of the neural network.
Attribute
Synonym for feature.
Automation bias
When a human decision-maker favors recommendations made by an automated decision-making system over information produced without automation, even when the automated decision-making system makes errors.
Backpropagation
The main algorithm for performing gradient descent on neural networks. First, the output values of each node are calculated (and cached) in a forward pass. Then, the partial derivative of the error with respect to each parameter is calculated in a backward pass through the graph.
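A minimal sketch of one forward and one backward pass for a tiny one-hidden-layer network with sigmoid units and squared error; the shapes and data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: input x -> hidden h -> scalar output y_hat.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))          # one training case
t = np.array([[1.0]])                # target output
W1 = rng.normal(size=(3, 4))         # input -> hidden weights
W2 = rng.normal(size=(1, 3))         # hidden -> output weights

# Forward pass: compute (and cache) the activations of each node.
h = sigmoid(W1 @ x)
y_hat = sigmoid(W2 @ h)
error = 0.5 * np.sum((y_hat - t) ** 2)

# Backward pass: partial derivative of the error w.r.t. each parameter.
d_y = (y_hat - t) * y_hat * (1 - y_hat)   # error signal at the output unit
dW2 = d_y @ h.T
d_h = (W2.T @ d_y) * h * (1 - h)          # error signal at the hidden units
dW1 = d_h @ x.T
```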
Bagging
A method to train an ensemble where each constituent model trains on a random subset of training examples sampled with replacement. E.g. a random forest is a collection of decision trees trained with bagging. The term bagging is short for bootstrap aggregating.
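A minimal sketch of the bootstrap-sampling step, assuming a hypothetical `train_model(X, y)` function supplied by the caller:

```python
import numpy as np

def bag(train_model, X, y, n_models=10, seed=0):
    # Train each constituent model on a sample of the training set
    # drawn with replacement (a bootstrap sample).
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # sample with replacement
        models.append(train_model(X[idx], y[idx]))
    return models
```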
Batch normalization
Normalizing the input or output of the activation functions in a hidden layer. Batch normalization increases a network’s stability by protecting against outlier weights, enabling higher learning rates, and reducing **overfitting**.
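A minimal sketch of the per-mini-batch normalization step, assuming `gamma` and `beta` are learned scale and shift parameters:

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    # z has shape (batch_size, num_units); normalize each unit's
    # pre-activations across the mini-batch, then rescale and shift.
    mu = z.mean(axis=0)
    var = z.var(axis=0)
    z_hat = (z - mu) / np.sqrt(var + eps)
    return gamma * z_hat + beta
```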
Batch size
The number of examples in a batch. For example, the batch size of SGD is 1, while the batch size of a mini-batch is usually between 10 and 1,000. Batch size is usually held fixed during training and inference and is bounded in practice by GPU memory; some frameworks, such as TensorFlow, allow dynamic batch sizes.
Bias term
a term that allows for the identification of the neuron threshold as the weight on a special, constant input.
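A minimal sketch showing the bias folded in as the weight on a constant input of 1; the numbers are illustrative:

```python
import numpy as np

x = np.array([0.5, -1.2])   # inputs
w = np.array([0.3, 0.8])    # connection weights
b = -0.4                    # bias (negative bias acts as a threshold)

z_with_bias = w @ x + b
# Equivalent: treat the bias as the weight on an extra constant input of 1.
z_as_weight = np.append(w, b) @ np.append(x, 1.0)
```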
Bayesian neural network
A probabilistic neural network that accounts for uncertainty in weights and outputs. A Bayesian neural network relies on Bayes’ Theorem to calculate uncertainties in weights and predictions. A Bayesian neural network can be useful when it is important to quantify uncertainty, such as in models related to pharmaceuticals. Bayesian neural networks can also help prevent overfitting.
Bayesian optimization
A probabilistic regression model technique for optimizing computationally expensive objective functions by instead optimizing a surrogate that quantifies the uncertainty via a Bayesian learning technique. Since Bayesian optimization is itself very expensive, it is usually used to optimize expensive-to-evaluate tasks that have a small number of parameters, such as selecting hyperparameters.
Binning
synonym for bucketing
Boltzmann machine
a stochastic neural network that learns the probability distribution over a set of inputs by means of weight changes driven by noisy (stochastic) unit responses.
Boosting
A machine learning technique that iteratively combines a set of simple and not very accurate classifiers (referred to as “weak” classifiers) into a classifier with high accuracy (a “strong” classifier) by upweighting the examples that the model is currently misclassifying.
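A minimal sketch of the reweighting step in the AdaBoost style, assuming ±1 predictions and labels; this is one illustration of upweighting misclassified examples, not the only boosting scheme:

```python
import numpy as np

def reweight(weights, predictions, labels):
    # Examples the current weak classifier gets wrong receive larger
    # weights before the next weak classifier is trained.
    misclassified = predictions != labels
    err = np.sum(weights[misclassified]) / np.sum(weights)
    alpha = 0.5 * np.log((1 - err) / err)   # the weak classifier's vote
    weights = weights * np.exp(alpha * np.where(misclassified, 1.0, -1.0))
    return weights / weights.sum(), alpha
```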
bucketing
Converting a (usually continuous) feature into multiple binary features called buckets or bins, typically based on value range. For example, instead of representing temperature as a single continuous floating-point feature, you could chop ranges of temperatures into discrete bins.
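A minimal sketch of bucketing a temperature feature into three bins; the bin edges are illustrative:

```python
import numpy as np

temps = np.array([3.0, 12.5, 21.0, 30.2])   # continuous feature
edges = [10.0, 20.0]                         # (-inf,10), [10,20), [20,inf)
bin_ids = np.digitize(temps, edges)          # -> [0, 1, 2, 2]
one_hot = np.eye(3)[bin_ids]                 # one binary feature per bucket
```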
categorical
Features or columns in the data with a discrete set of possible values.
Connection weight
The parameter that sets the importance of an input arriving at a given neuron from another neuron.
Delta rule
the simplest learning rule, in which weights are changed proportionally to the discrepancy between actual output and desired output.
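A minimal sketch of the delta rule for a single linear unit, assuming the caller supplies the input, target, and learning rate:

```python
import numpy as np

def delta_rule_step(w, x, target, lr=0.1):
    # Weight change is proportional to (target - output) times the input.
    output = w @ x
    return w + lr * (target - output) * x
```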
Error surface
the surface in weight space showing how the error in the output of a neural network depends on the weights.
Feature
a column in a training case.
Fan-in
the number of inputs to a unit.
Fan-out
the number of outgoing connections from a unit, i.e. how widely its output spreads.
Hebb learning law
modification of a connection weight proportional to the activities of the input and output neurons.
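A minimal sketch of a Hebbian weight update for a single linear unit; the learning rate is illustrative:

```python
import numpy as np

def hebb_step(w, x, lr=0.01):
    # Weight change is proportional to the product of input (pre-synaptic)
    # and output (post-synaptic) activity.
    y = w @ x
    return w + lr * y * x
```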
Hopfield network
a network with symmetric connection weights and thresholding of neural response.
Input
is ambiguous, because more often than not, input is short for **input neuron**.
Input unit
a special neuron that receives only external input activity, which is fed on to the rest of the network.
Layer
a collection of neurons all of which receive input from a preceding set of neurons (or inputs), and send their outputs to other neurons or outside the net.
Learning law
rule for changing the connection weights in a neural network.
Learning rate
the factor that scales how much the connection weights change at each learning step.
Momentum
a term added to the weight change in back-propagation to achieve better learning by jumping out of local minima.
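A minimal sketch of the common velocity form of momentum, assuming the caller supplies the gradient:

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # The velocity accumulates a decaying average of past gradients,
    # which smooths the descent and can carry the weights past small
    # local minima or plateaus.
    v = beta * v - lr * grad
    return w + v, v
```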
Neuron
a synonym for unit emphasizing the analogy with real brains.
Output
like value but emphasizing that it’s different from the input.
Parameter
the weights and biases learned by the network. Additional parameters that are not learned by the network itself, or are not directly part of it, are called hyperparameters.
Recurrent neural network
one in which output activity is fed back into the input or hidden layers. Also called an RNN.
Reinforcement training
modification of connection weights guided by a reward signal rather than by explicit target outputs.
Test set
the set of input and output patterns used to test if a neural network has been trained effectively.
Training set
the set of input-output patterns provided to train the network.
Training case
a row in the dataset; this is the most commonly used term and is quite generic. Also called input or training example.
Training example
emphasizes the analogy with human learning: we learn from examples.
Training point
emphasizes that it’s a location in a high-dimensional space.
Unit
a node in a neural network. A node consists of an activation function, a weight, an input, and an output called the activation. The term unit emphasizes that it’s one component of a large network. Also referred to as a **neuron**.
Value
a synonym for activation, referring to the output value of the activation function (ReLU, sigmoid, tanh, etc.) when acting on its input.
Weight space
A high-dimensional space with one dimension for each connection weight in the network. Weight space is the space of all possible weight settings: each point in the space is a collection of weights, and each training case can be represented as a **hyperplane** passing through the origin. See also error surface.
loss function
emphasizes that we’re minimizing it, without saying much about what the meaning of the number is.
error function
emphasizes that it’s the extent to which the network gets things wrong.
objective function
is very generic. This is the only one where it’s not clear whether we’re minimizing or maximizing it.

References

Duchi, John, Elad Hazan, and Yoram Singer. 2011. “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.” Journal of Machine Learning Research 12 (7).

Reuse

CC SA BY-NC-ND

Citation

BibTeX citation:
@online{bochman2017,
  author = {Bochman, Oren},
  title = {Glossary of Terms for {Deep} {Neural} {Networks}},
  date = {2017-08-06},
  url = {https://orenbochman.github.io/notes/dnn/dnn-glossery/glossary.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2017. “Glossary of Terms for Deep Neural Networks.” August 6, 2017. https://orenbochman.github.io/notes/dnn/dnn-glossery/glossary.html.