Glossary of terms for Deep Neural Networks

Course by Geoffrey Hinton on Coursera

Glossary of terms in Deep Learning and ML from Neural Networks for Machine Learning by Geoffrey Hinton on Coursera
deep learning
glossary
notes
neural networks
Author

Oren Bochman

Published

Sunday, August 6, 2017

Glossary of terms in Deep Learning and ML

Accuracy
The fraction of predictions that a classification model got right.
activation
emphasizes that a unit, like a real neuron, may be on or off. In practice a negative bias creates a threshold below which the unit produces no output; otherwise the unit always produces output. Also called **value** or **output**.
activation function
The activation function is an attempt to mimic the biological neuron’s output in response to its input. This is generally a non-linear function. Some examples are ReLU, sigmoid, tanh, leaky ReLU, and maxout; there are many others. All other things being equal, ReLU has emerged as the preferred activation function to start with.
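A minimal sketch in NumPy of a few of the activation functions named above (function names are just illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha controls the slope for negative inputs
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)
```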
AdaGrad
A gradient descent learning algorithm that rescales the gradients of each parameter, effectively giving each parameter an independent learning rate; cf. (Duchi, Hazan, and Singer 2011).
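A minimal sketch of the AdaGrad update, assuming the caller supplies the gradient; `w`, `g2`, `eta`, and `eps` are illustrative names:

```python
import numpy as np

w = np.zeros(3)          # parameters
g2 = np.zeros_like(w)    # running sum of squared gradients
eta, eps = 0.1, 1e-8     # global learning rate and numerical stabilizer

def adagrad_step(w, g2, grad):
    # Each parameter is divided by the root of its own accumulated
    # squared gradients, giving it an independent effective learning rate.
    g2 += grad ** 2
    w -= eta * grad / (np.sqrt(g2) + eps)
    return w, g2
```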
Anomaly detection
The process of identifying outliers that are considered candidates for removal from a dataset, typically because they are nonrepresentative, high-leverage points.
Attention
A mechanism that aggregates information from a set of inputs in a data-dependent manner. An attention mechanism might consist of a weighted sum over a set of inputs, where the weight for each input is computed by another part of the neural network.
Attribute
Synonym for feature.
Automation bias
When a human decision-maker favors recommendations made by an automated decision-making system over information produced without automation, even when the automated decision-making system makes errors.
Backpropagation
The main algorithm for performing gradient descent on neural networks. First, the output values of each node are calculated (and cached) in a forward pass. Then, the partial derivative of the error with respect to each parameter is calculated in a backward pass through the graph.
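A minimal sketch of one forward and one backward pass for a tiny one-hidden-layer network with sigmoid units and squared error; the shapes and data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: input x -> hidden h -> scalar output y_hat.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))          # one training case
t = np.array([[1.0]])                # target output
W1 = rng.normal(size=(3, 4))         # input -> hidden weights
W2 = rng.normal(size=(1, 3))         # hidden -> output weights

# Forward pass: compute (and cache) the activations of each node.
h = sigmoid(W1 @ x)
y_hat = sigmoid(W2 @ h)
error = 0.5 * np.sum((y_hat - t) ** 2)

# Backward pass: partial derivative of the error w.r.t. each parameter.
d_y = (y_hat - t) * y_hat * (1 - y_hat)   # error signal at the output unit
dW2 = d_y @ h.T
d_h = (W2.T @ d_y) * h * (1 - h)          # error signal at the hidden units
dW1 = d_h @ x.T
```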
Bagging
A method to train an ensemble where each constituent model trains on a random subset of training examples sampled with replacement. E.g. a random forest is a collection of decision trees trained with bagging. The term bagging is short for bootstrap aggregating.
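A minimal sketch of the bootstrap-sampling step, assuming a hypothetical `train_model(X, y)` function supplied by the caller:

```python
import numpy as np

def bag(train_model, X, y, n_models=10, seed=0):
    # Train each constituent model on a sample of the training set
    # drawn with replacement (a bootstrap sample).
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # sample with replacement
        models.append(train_model(X[idx], y[idx]))
    return models
```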
Batch normalization
Normalizing the input or output of the activation functions in a hidden layer. Batch normalization increases a network’s stability by protecting against outlier weights, enabling higher learning rates, and reducing **overfitting**.
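A minimal sketch of the per-mini-batch normalization step, assuming `gamma` and `beta` are learned scale and shift parameters:

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    # z has shape (batch_size, num_units); normalize each unit's
    # pre-activations across the mini-batch, then rescale and shift.
    mu = z.mean(axis=0)
    var = z.var(axis=0)
    z_hat = (z - mu) / np.sqrt(var + eps)
    return gamma * z_hat + beta
```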
Batch size
The number of examples in a batch. For example, the batch size of SGD is 1, while the batch size of a mini-batch is usually between 10 and 1,000. Batch size is usually held fixed during training and inference and is bounded in practice by GPU memory; some frameworks, such as TensorFlow, allow dynamic batch sizes.
Bias term
a term that allows for the identification of the neuron threshold as the weight on a special, constant input.
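A minimal sketch showing the bias folded in as the weight on a constant input of 1; the numbers are illustrative:

```python
import numpy as np

x = np.array([0.5, -1.2])   # inputs
w = np.array([0.3, 0.8])    # connection weights
b = -0.4                    # bias (negative bias acts as a threshold)

z_with_bias = w @ x + b
# Equivalent: treat the bias as the weight on an extra constant input of 1.
z_as_weight = np.append(w, b) @ np.append(x, 1.0)
```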
Bayesian neural network
A probabilistic neural network that accounts for uncertainty in weights and outputs. A Bayesian neural network relies on Bayes’ Theorem to calculate uncertainties in weights and predictions. A Bayesian neural network can be useful when it is important to quantify uncertainty, such as in models related to pharmaceuticals. Bayesian neural networks can also help prevent overfitting.
Bayesian optimization
A probabilistic regression model technique for optimizing computationally expensive objective functions by instead optimizing a surrogate that quantifies the uncertainty via a Bayesian learning technique. Since Bayesian optimization is itself very expensive, it is usually used to optimize expensive-to-evaluate tasks that have a small number of parameters, such as selecting hyperparameters.
Binning
synonym for bucketing
Boltzmann machine
a stochastic neural network that learns the probability distribution over a set of inputs by means of weight changes driven by noisy (stochastic) unit responses.
Boosting
A machine learning technique that iteratively combines a set of simple and not very accurate classifiers (referred to as “weak” classifiers) into a classifier with high accuracy (a “strong” classifier) by upweighting the examples that the model is currently misclassifying.
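A minimal sketch of the reweighting step in the AdaBoost style, assuming ±1 predictions and labels; this is one illustration of upweighting misclassified examples, not the only boosting scheme:

```python
import numpy as np

def reweight(weights, predictions, labels):
    # Examples the current weak classifier gets wrong receive larger
    # weights before the next weak classifier is trained.
    misclassified = predictions != labels
    err = np.sum(weights[misclassified]) / np.sum(weights)
    alpha = 0.5 * np.log((1 - err) / err)   # the weak classifier's vote
    weights = weights * np.exp(alpha * np.where(misclassified, 1.0, -1.0))
    return weights / weights.sum(), alpha
```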
bucketing
Converting a (usually continuous) feature into multiple binary features called buckets or bins, typically based on value range. For example, instead of representing temperature as a single continuous floating-point feature, you could chop ranges of temperatures into discrete bins.
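A minimal sketch of bucketing a temperature feature into three bins; the bin edges are illustrative:

```python
import numpy as np

temps = np.array([3.0, 12.5, 21.0, 30.2])   # continuous feature
edges = [10.0, 20.0]                         # (-inf,10), [10,20), [20,inf)
bin_ids = np.digitize(temps, edges)          # -> [0, 1, 2, 2]
one_hot = np.eye(3)[bin_ids]                 # one binary feature per bucket
```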
categorical
Features or columns in the data with a discrete set of possible values.
Connection weight
The parameter that sets the importance of an input arriving at a given neuron from another neuron.
Delta rule
the simplest learning rule, in which weights are changed proportionally to the discrepancy between actual output and desired output.
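A minimal sketch of the delta rule for a single linear unit, assuming the caller supplies the input, target, and learning rate:

```python
import numpy as np

def delta_rule_step(w, x, target, lr=0.1):
    # Weight change is proportional to (target - output) times the input.
    output = w @ x
    return w + lr * (target - output) * x
```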
Error surface
the surface in weight space showing how the error in the output of a neural network depends on the weights.
Feature
a column in a training case.
Fan-in
the number of inputs to a unit.
Fan-out
the number of outgoing connections from a unit, i.e. how widely its output spreads.
Hebb learning law
modification of a connection weight proportional to the activities of the input and output neurons.
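A minimal sketch of a Hebbian weight update for a single linear unit; the learning rate is illustrative:

```python
import numpy as np

def hebb_step(w, x, lr=0.01):
    # Weight change is proportional to the product of input (pre-synaptic)
    # and output (post-synaptic) activity.
    y = w @ x
    return w + lr * y * x
```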
Hopfield network
a network with symmetric connection weights and thresholding of neural response.
Input
is ambiguous, because more often than not, input is short for **input neuron**.
Input unit
a special neuron that receives only external input activity, which is fed on to the rest of the network.
Layer
a collection of neurons all of which receive input from a preceding set of neurons (or inputs), and send their outputs to other neurons or outside the net.
Learning law
rule for changing the connection weights in a neural network.
Learning rate
the factor that scales how much the connection weights change at each learning step.
Momentum
a term added to the weight change in back-propagation to achieve better learning by jumping out of local minima.
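A minimal sketch of the common velocity form of momentum, assuming the caller supplies the gradient:

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # The velocity accumulates a decaying average of past gradients,
    # which smooths the descent and can carry the weights past small
    # local minima or plateaus.
    v = beta * v - lr * grad
    return w + v, v
```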
Neuron
a synonym for unit emphasizing the analogy with real brains.
Output
like value but emphasizing that it’s different from the input.
Parameter
the weights and biases learned by the network. Additional parameters that are not learned by the network itself, or are not directly part of it, are called hyperparameters.
Recurrent neural network
one in which output activity is fed back into the input or hidden layers. Also called an RNN.
Reinforcement training
modification of connection weights guided by a reward signal rather than by explicit target outputs.
Test set
the set of input and output patterns used to test if a neural network has been trained effectively.
Training set
the set of input-output patterns provided to train the network.
Training case
a row in the dataset; this is the most commonly used term and is quite generic. Also called input or training example.
Training example
emphasizes the analogy with human learning: we learn from examples.
Training point
emphasizes that it’s a location in a high-dimensional space.
Unit
a node in a neural network. A node consists of an activation function, a weight, an input, and an output called the activation. The term unit emphasizes that it’s one component of a large network. Also referred to as a **neuron**.
Value
a synonym for activation, referring to the output value of the activation function (ReLU, sigmoid, tanh, etc.) when acting on its input.
Weight space
A high-dimensional space with one dimension for each connection weight in the network. Weight space is the space of all possible weight settings: each point in the space is a collection of weights, and each training case can be represented as a **hyperplane** passing through the origin. See also error surface.
loss function
emphasizes that we’re minimizing it, without saying much about what the meaning of the number is.
error function
emphasizes that it’s the extent to which the network gets things wrong.
objective function
is very generic. This is the only one where it’s not clear whether we’re minimizing or maximizing it.

References

Duchi, John, Elad Hazan, and Yoram Singer. 2011. “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.” Journal of Machine Learning Research 12 (7).

Reuse

CC SA BY-NC-ND

Citation

BibTeX citation:
@online{bochman2017,
  author = {Bochman, Oren},
  title = {Glossary of Terms for {Deep} {Neural} {Networks}},
  date = {2017-08-06},
  url = {https://orenbochman.github.io/notes/dnn/dnn-glossery/glossary.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2017. “Glossary of Terms for Deep Neural Networks.” August 6, 2017. https://orenbochman.github.io/notes/dnn/dnn-glossery/glossary.html.