Parameters describe the population and are usually designated with Greek letters and the preferred letter is \theta
\theta \tag{A.1}
other common parameters are: \mu, \sigma^2, \alpha, \beta \tag{A.2}
where \mu is the mean, \sigma^2 is the variance, \alpha is the intercept, and \beta is the slope in a linear regression context.
Statistics are population estimates of parameters and are usually designated with Latin letters, such as \hat{\theta}.
\hat{p} \tag{A.3}
the Certain event
\Omega
probability of RV X taking value x
\mathbb{P}r(X=x)
odds
\mathcal{O}(X)
random variables X
X is distributed as
X \sim N(\mu, \sigma^2)
X is proportional to
X\propto N(\mu, \sigma^2)
Probability of A and B
\mathbb{P}r(X \cap Y)
Conditional probability
\mathbb{P}r(X \mid Y)
Joint probability
\mathbb{P}r(X,Y)
Y_i \stackrel{iid}\sim N(\mu, \sigma^2)
Approximately distributed as (say using the CLT)
Y_i \stackrel{.}\sim N(\mu, \sigma^2)
\mathbb{E}[X_i]
The expected value of an RV X set to 0 (A.K.A. a fair bet)
\mathbb{E}[X_i] \stackrel{set} = 0
The variance of an RV
\mathbb{V}ar[X_i]
logical implication
\implies
if and only if
\iff
therefore
\therefore
independence
A \perp \!\!\! \perp B \iff \mathbb{P}r(A \mid B) = \mathbb{P}r(A)
Indicator function \mathbb{I}_{\{A\}} = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{otherwise} \end{cases}
Dirichlet function
This is a continuous version of the indicator function, defined as a limit.
\delta(x) = \lim_{\varepsilon \to 0} \frac{1}{2\varepsilon} \mathbb{I}_{\{|x| < \varepsilon\}}
The Dirichlet function is used to represent a point mass at a point, often used as a component for zero inflated mixtures.
The following are from course 4
- \{y_t\} - the time series process, where each y_t is a univariate random variable and t are the time points that are equally spaced.
- y_{1:T} or y_1, y_2, \ldots, y_T - the observed data.
- You will see the use of ’ to denote the transpose of a matrix,
- and the use of \sim to denote a distribution.
- under tildes \utilde{y} are used to denote estimates of the true values y.
- E matrix of eigenvalues
- \Lambda = diagonal(\alpha_1, \alpha_2, \ldots , \alpha_p) is a diagonal matrix with the eigenvalues of Σ on the diagonal.
- J_p(1) = a p by p Jordan form matrix with 1 on the superdiagonal
A.1 The argmax function
When we want to maximize a function f(x), there are two things we may be interested in:
- The value f(x) achieves when it is maximized, which we denote \max_x f(x).
- The x-value that results in maximizing f(x), which we denote \hat x = \arg \max_x f(x). Thus \max_x f(x) = f(\hat x).
A.2 Indicator Functions
The concept of an indicator function is a really useful one. This is a function that takes the value one if its argument is true, and the value zero if its argument is false. Sometimes these functions are called Heaviside functions or unit step functions. I write an indicator function as \mathbb{I}_{A}(x), although sometimes they are written \mathbb{1}_{A}(x). If the context is obvious, we can also simply write I{A}.
Example:
\mathbb{I}_{x>3}(x)= \begin{cases} 0, & \text{if}\ x \le 3 \\ 1, & \text{otherwise} \end{cases}
Note: Indicator functions are easy to implement in code using a lambda function
. They can be combined using a dictionary.
Because 0 · 1 = 0, the product of indicator functions can be combined into a single indicator function with a modified condition.
A.2.1 Products of Indicator Functions:
\mathbb{I}{x<5} \cdot \mathbb{I}{x≥0} = \mathbb{I}{0≤x<5}
\prod_{i=1}^N \mathbb{I}_{(x_i<2)} = \mathbb{I}_{(x_i<2) \forall i} = \mathbb{I}_{\max_i x_i<2}