Appendix A — Appendix: Notation

Author

Oren Bochman

Parameters describe the population and are usually designated with Greek letters and the preferred letter is \theta

\theta \tag{A.1}

other common parameters are: \mu, \sigma^2, \alpha, \beta \tag{A.2}

where \mu is the mean, \sigma^2 is the variance, \alpha is the intercept, and \beta is the slope in a linear regression context.

Statistics are population estimates of parameters and are usually designated with Latin letters, such as \hat{\theta}.

\hat{p} \tag{A.3}

the Certain event

\Omega

probability of RV X taking value x

\mathbb{P}r(X=x)

odds

\mathcal{O}(X)

random variables X

X is distributed as

X \sim N(\mu, \sigma^2)

X is proportional to

X\propto N(\mu, \sigma^2)

Probability of A and B

\mathbb{P}r(X \cap Y)

Conditional probability

\mathbb{P}r(X \mid Y)

Joint probability

\mathbb{P}r(X,Y)

Y_i \stackrel{iid}\sim N(\mu, \sigma^2)

Approximately distributed as (say using the CLT)

Y_i \stackrel{.}\sim N(\mu, \sigma^2)

\mathbb{E}[X_i]

The expected value of an RV X set to 0 (A.K.A. a fair bet)

\mathbb{E}[X_i] \stackrel{set} = 0

The variance of an RV

\mathbb{V}ar[X_i]

logical implication

\implies

if and only if

\iff

therefore

\therefore

independence

A \perp \!\!\! \perp B \iff \mathbb{P}r(A \mid B) = \mathbb{P}r(A)

Indicator function \mathbb{I}_{\{A\}} = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{otherwise} \end{cases}

Dirichlet function

This is a continuous version of the indicator function, defined as a limit.

\delta(x) = \lim_{\varepsilon \to 0} \frac{1}{2\varepsilon} \mathbb{I}_{\{|x| < \varepsilon\}}

The Dirichlet function is used to represent a point mass at a point, often used as a component for zero inflated mixtures.

The following are from course 4

NoteTodo

A.1 The argmax function

When we want to maximize a function f(x), there are two things we may be interested in:

  1. The value f(x) achieves when it is maximized, which we denote \max_x f(x).
  2. The x-value that results in maximizing f(x), which we denote \hat x = \arg \max_x f(x). Thus \max_x f(x) = f(\hat x).

A.2 Indicator Functions

The concept of an indicator function is a really useful one. This is a function that takes the value one if its argument is true, and the value zero if its argument is false. Sometimes these functions are called Heaviside functions or unit step functions. I write an indicator function as \mathbb{I}_{A}(x), although sometimes they are written \mathbb{1}_{A}(x). If the context is obvious, we can also simply write I{A}.

Example:

\mathbb{I}_{x>3}(x)= \begin{cases} 0, & \text{if}\ x \le 3 \\ 1, & \text{otherwise} \end{cases}

Note: Indicator functions are easy to implement in code using a lambda function. They can be combined using a dictionary.

Because 0 · 1 = 0, the product of indicator functions can be combined into a single indicator function with a modified condition.

A.2.1 Products of Indicator Functions:

\mathbb{I}{x<5} \cdot \mathbb{I}{x≥0} = \mathbb{I}{0≤x<5}

\prod_{i=1}^N \mathbb{I}_{(x_i<2)} = \mathbb{I}_{(x_i<2) \forall i} = \mathbb{I}_{\max_i x_i<2}