H.1 Link functions
In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
The link function provides the relationship between the linear predictor and the mean of the distribution function. There are many commonly used link functions, and their choice is informed by several considerations. There is always a well-defined canonical link function which is derived from the exponential of the response’s density function. However, in some cases, it makes sense to try to match the domain of the link function to the range of the distribution function’s mean, or use a non-canonical link function for algorithmic purposes, for example Bayesian probit regression.
When using a distribution function with a canonical parameter \theta , the canonical link function is the function that expresses \theta in terms of \mu i.e. \theta =b(\mu) . For the most common distributions, the mean μ is one of the parameters in the standard form of the distribution’s Density function, and then b(\mu ) is the function as defined above that maps the density function into its canonical form. When using the canonical link function, b(\mu )=\theta =\mathbf {X} {\boldsymbol {\beta }} , which allows \mathbf{X}^{T}\mathbf{Y} to be a sufficient statistic for \beta .
Following is a table of several exponential-family distributions in common use and the data they are typically used for, along with the canonical link functions and their inverses (sometimes referred to as the mean function, as done here).
Distribution | Support | Uses | Link name | Link fn | Mean fn |
---|---|---|---|---|---|
Normal | \mathbb{R} | Identity | \mathbf {X} {\boldsymbol {\beta }}=\mu | ||
Expoential | \mathbb{R}^+_0 | Negative inverse | \mathbf {X} {\boldsymbol {\beta }}=-\mu^{-1} | ||
Gamma | \mathbb{R}^+_0 | Negative inverse | \mathbf {X} {\boldsymbol {\beta }}=-\mu^{-1} | ||
Inverse Gamma | \mathbb{R}^+_0 | Inverse squared | \mathbf {X} {\boldsymbol {\beta }}=\mu^{-2} | ||
Poisson | \mathbb{N}_0 | Log | \mathbf {X} {\boldsymbol {\beta }}=ln(\mu) | ||
Bernuolli | \{0,1\} | Logit | \mathbf {X} {\boldsymbol {\beta }}=ln(\frac{\mu}{1-\mu}) | ||
Binomial | Logit | \mathbf {X} {\boldsymbol {\beta }}=ln(\frac{\mu}{n-\mu}) | |||
Categorical | Logit | \mathbf {X} {\boldsymbol {\beta }}=ln(\frac{\mu}{1-\mu}) | |||
Multinomial | Logit | \mathbf {X} {\boldsymbol {\beta }}=ln(\frac{\mu}{1-\mu}) |
In the cases of the exponential and gamma distributions, the domain of the canonical link function is not the same as the permitted range of the mean. In particular, the linear predictor may be positive, which would give an impossible negative mean. When maximizing the likelihood, precautions must be taken to avoid this. An alternative is to use a non-canonical link function.
In the case of the Bernoulli, binomial, categorical and multinomial distributions, the support of the distributions is not the same type of data as the parameter being predicted. In all of these cases, the predicted parameter is one or more probabilities, i.e. real numbers in the range [0,1]. The resulting model is known as Logistic regression (or Multinomial logistic regression in the case that K-way rather than binary values are being predicted).
For the Bernoulli and binomial distributions, the parameter is a single probability, indicating the likelihood of occurrence of a single event. The Bernoulli still satisfies the basic condition of the generalized linear model in that, even though a single outcome will always be either 0 or 1, the expected value will nonetheless be a real-valued probability, i.e. the probability of occurrence of a “yes” (or 1) outcome. Similarly, in a binomial distribution, the expected value is Np, i.e. the expected proportion of “yes” outcomes will be the probability to be predicted.
For categorical and multinomial distributions, the parameter to be predicted is a K-vector of probabilities, with the further restriction that all probabilities must add up to 1. Each probability indicates the likelihood of occurrence of one of the K possible values. For the multinomial distribution, and for the vector form of the categorical distribution, the expected values of the elements of the vector can be related to the predicted probabilities similarly to the binomial and Bernoulli distributions.
Credits:
This page is based on the Generalized linear model article on Wikipedia, which is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. By Wikimedia contributors, available under CC BY-SA 3.0.
The text has been modified for clarity and conciseness.