Appendix A: Notation

Parameters describe the population and are usually denoted by Greek letters. The preferred letter is \theta:

\theta \tag{A.1}

Other common parameters are:

\mu, \sigma^2, \alpha, \beta \tag{A.2}

where \mu is the mean, \sigma^2 is the variance, \alpha is the intercept, and \beta is the slope in a linear regression context.

Statistics are computed from a sample and serve as estimates of population parameters. They are usually denoted by Latin letters or by a hat over the corresponding parameter, such as \hat{\theta}.

\hat{p} \tag{A.3}

The certain event

\Omega

Probability of the RV X taking the value x

\mathbb{P}r(X=x)

odds

\mathcal{O}(X)
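The appendix only lists the symbol for odds. As a minimal sketch, assuming the usual definition of the odds of an event with probability p as p / (1 - p), the conversion in both directions might look like this. The function names `odds` and `prob_from_odds` are illustrative, not part of the course material.

```python
from fractions import Fraction


def odds(p):
    """Odds in favour of an event with probability p (assumed definition: p / (1 - p))."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1 for the odds to be finite")
    return p / (1 - p)


def prob_from_odds(o):
    """Inverse map: recover the probability from the odds."""
    if o < 0:
        raise ValueError("odds must be non-negative")
    return o / (1 + o)


# Exact arithmetic with Fraction avoids floating-point noise in the round trip.
p = Fraction(3, 4)
o = odds(p)                # odds of 3 to 1
p_back = prob_from_odds(o)
print(o, p_back)
```

Using `Fraction` is only a convenience for exact results; plain floats work the same way.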

Random variables

X

X is distributed as

X \sim N(\mu, \sigma^2)

X is proportional to

X\propto N(\mu, \sigma^2)

Probability of X and Y

\mathbb{P}r(X \cap Y)

Conditional probability

\mathbb{P}r(X \mid Y)

Joint probability

\mathbb{P}r(X,Y)

Independent and identically distributed (iid) as

Y_i \stackrel{iid}\sim N(\mu, \sigma^2)

Approximately distributed as (say using the CLT)

Y_i \stackrel{.}\sim N(\mu, \sigma^2)

The expected value of an RV

\mathbb{E}[X_i]

The expected value of an RV X set to 0 (A.K.A. a fair bet)

\mathbb{E}[X_i] \stackrel{set} = 0

The variance of an RV

\mathbb{V}ar[X_i]

logical implication

\implies

if and only if

\iff

therefore

\therefore

independence

A \perp \!\!\! \perp B \iff \mathbb{P}r(A \mid B) = \mathbb{P}r(A)
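The independence criterion above can be checked directly on a small discrete example. The sketch below assumes a hypothetical joint distribution for two fair coin flips; the table and the helper names `prob` and `cond_prob` are illustrative only.

```python
from fractions import Fraction

# Hypothetical joint distribution of two fair coin flips (illustrative data).
joint = {
    ("H", "H"): Fraction(1, 4),
    ("H", "T"): Fraction(1, 4),
    ("T", "H"): Fraction(1, 4),
    ("T", "T"): Fraction(1, 4),
}


def prob(event):
    """Pr(event), where event is a predicate on outcomes."""
    return sum(p for outcome, p in joint.items() if event(outcome))


def cond_prob(a, b):
    """Pr(A | B) = Pr(A and B) / Pr(B); requires Pr(B) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)


def first_is_heads(o):
    return o[0] == "H"


def second_is_heads(o):
    return o[1] == "H"


def both_heads(o):
    return o == ("H", "H")


# Independent events: conditioning on B leaves Pr(A) unchanged.
independent = cond_prob(first_is_heads, second_is_heads) == prob(first_is_heads)

# Dependent events: knowing both flips are heads changes Pr(first is heads).
dependent = cond_prob(first_is_heads, both_heads) != prob(first_is_heads)

print(independent, dependent)
```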

Indicator function

\mathbb{I}_{\{A\}} = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{otherwise} \end{cases}

Dirac delta function

This is a continuous analogue of the indicator function, defined as the limit of a rectangular pulse of width 2\varepsilon and unit area.

\delta(x) = \lim_{\varepsilon \to 0} \frac{1}{2\varepsilon} \mathbb{I}_{\{|x| < \varepsilon\}}

The Dirac delta function is used to represent a point mass at a single point, often as a component of zero-inflated mixtures.
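The limit definition of \delta(x) above can be explored numerically. The sketch below is an illustration under stated assumptions, not part of the course material: it approximates the pre-limit pulse for a few values of \varepsilon, using a simple midpoint rule (the helper `integrate` and the test function `cos` are choices made here). The pulse keeps unit area, and its integral against a smooth function approaches that function's value at 0.

```python
import math


def delta_eps(x, eps):
    """Rectangular pulse of width 2*eps and height 1/(2*eps), so its area is 1."""
    return 1.0 / (2.0 * eps) if abs(x) < eps else 0.0


def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


for eps in (0.5, 0.1, 0.01):
    # Area under the pulse: stays close to 1 for every eps.
    area = integrate(lambda x: delta_eps(x, eps), -1.0, 1.0)
    # Integral of cos(x) * pulse: equals sin(eps) / eps, which tends to cos(0) = 1.
    sift = integrate(lambda x: math.cos(x) * delta_eps(x, eps), -1.0, 1.0)
    print(f"eps={eps}: area={area:.4f}, integral of cos*pulse={sift:.4f}")
```

This is why the limiting object behaves like a point mass at 0: all of the unit mass is squeezed into an ever smaller interval around the origin.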

The following notation is from course 4:

\{y_t\} - the time series process, where each y_t is a univariate random variable and the time points t are equally spaced.
y_{1:T} or y_1, y_2, \ldots, y_T - the observed data.
A prime (’) denotes the transpose of a matrix.
\sim denotes "is distributed as".
An under-tilde, as in \utilde{y}, denotes an estimate of the true value y.
E - the matrix of eigenvectors.
\Lambda = \operatorname{diag}(\alpha_1, \alpha_2, \ldots, \alpha_p) - a diagonal matrix with the eigenvalues of \Sigma on the diagonal.
J_p(1) - a p \times p Jordan form matrix with 1 on the superdiagonal.

A.1 The argmax function

When we want to maximize a function f(x), there are two things we may be interested in:

The value f(x) achieves when it is maximized, which we denote \max_x f(x).
The x-value that results in maximizing f(x), which we denote \hat x = \arg \max_x f(x). Thus \max_x f(x) = f(\hat x).
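The distinction between the two quantities can be sketched in a few lines. The example below assumes a hypothetical concave function and a finite grid of candidate x-values; a grid search is used only for illustration and is not a general-purpose optimizer.

```python
def f(x):
    """Hypothetical objective: a downward parabola with its peak at x = 2."""
    return -(x - 2.0) ** 2 + 3.0


# Candidate x-values 0.00, 0.01, ..., 5.00 (an assumed search grid).
grid = [i / 100 for i in range(501)]

f_max = max(f(x) for x in grid)   # max_x f(x): the largest value f attains
x_hat = max(grid, key=f)          # argmax_x f(x): the x at which it is attained

print(x_hat, f_max)
```

Here `max` with `key=f` returns the maximizing grid point, while `max` over the function values returns the maximum itself, so `f(x_hat) == f_max`.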

A.2 Indicator Functions

An indicator function takes the value one if its argument is true and the value zero if its argument is false. The Heaviside (unit step) function is a well-known special case: it is the indicator of x \ge 0, although conventions differ on its value at x = 0. I write an indicator function as \mathbb{I}_{A}(x), although sometimes they are written \mathbb{1}_{A}(x). If the context is obvious, we can also simply write \mathbb{I}\{A\}.

Example:

\mathbb{I}_{x>3}(x)= \begin{cases} 0, & \text{if}\ x \le 3 \\ 1, & \text{otherwise} \end{cases}

Note: Indicator functions are easy to implement in code using a lambda function. They can be combined using a dictionary.
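As a minimal sketch of the note above, each indicator can be written as a lambda and the collection stored in a dictionary keyed by its condition. The dictionary name `indicators` and the chosen conditions are illustrative.

```python
# Each entry maps a readable condition to a function returning 1 or 0.
indicators = {
    "x > 3": lambda x: 1 if x > 3 else 0,
    "x >= 0": lambda x: 1 if x >= 0 else 0,
    "x < 5": lambda x: 1 if x < 5 else 0,
}

# Evaluate every indicator at a few sample points.
for x in (-1, 3, 4, 7):
    values = {name: ind(x) for name, ind in indicators.items()}
    print(x, values)
```

Keeping the indicators in a dictionary makes it easy to look one up by name or to loop over all of them at once.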

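Because each indicator takes only the values 0 and 1, a product of indicators equals 1 only when every factor equals 1. The product can therefore be written as a single indicator whose condition is the conjunction of the individual conditions.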

A.2.1 Products of Indicator Functions:

\mathbb{I}_{\{x<5\}} \cdot \mathbb{I}_{\{x \ge 0\}} = \mathbb{I}_{\{0 \le x < 5\}}

\prod_{i=1}^N \mathbb{I}_{(x_i<2)} = \mathbb{I}_{(x_i<2) \forall i} = \mathbb{I}_{\max_i x_i<2}
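Both identities can be verified numerically. The sketch below uses a hypothetical helper `ind` that turns a boolean condition into 0 or 1, and illustrative sample values; it checks the identities on those values rather than proving them.

```python
import math


def ind(condition):
    """Indicator: 1 if the condition is true, 0 otherwise."""
    return 1 if condition else 0


# First identity: I{x < 5} * I{x >= 0} == I{0 <= x < 5}.
first_identity_holds = all(
    ind(x < 5) * ind(x >= 0) == ind(0 <= x < 5)
    for x in (-2, 0, 3, 5, 8)
)


def product_of_indicators(xs):
    """Product over i of I{x_i < 2}."""
    return math.prod(ind(x < 2) for x in xs)


# Second identity: the product equals I{all x_i < 2} and I{max_i x_i < 2}.
samples = ([1.5, 0.3, 1.9], [1.5, 2.0, 0.1])
second_identity_holds = all(
    product_of_indicators(xs) == ind(all(x < 2 for x in xs)) == ind(max(xs) < 2)
    for xs in samples
)

print(first_identity_holds, second_identity_holds)
```

The reduction to a condition on the maximum is what makes this identity useful in likelihoods with bounded support, since a product of N indicators collapses to a single check.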
