Statistical Methods for Machine Learning
Larry Wasserman’s course
course review
Review
- Statistics versus ML
- Concentration
- Probability
- Basic Statistics
- Minimaxity
- Regression
- Classification
Theorem 1 (Hoeffding’s inequality) If Z_1, Z_2, \ldots, Z_n are iid with mean \mu and P(a \le Z_i \le b) = 1, then for any \epsilon > 0, P(|Z_n − \mu| > \epsilon) \le 2e^{-2n\epsilon^2/(b-a)^2}, where Z_n = \frac{1}{n}\sum_{i=1}^n Z_i.
Definition 1 (VC Dimension) Let A be a class of sets. If F is a finite set, let s(A, F) be the number of subset of F ‘picked out’ by A.
Define the growth function: s_n(A) = \sup_{|F|=n} s(A, F).
Note that s_n(A) \le 2^n.
The VC dimension of a class of set A is VC(A) = \sup\{n : s_n(A) = 2^n\}. \qquad \tag{1}
If the VC dimension is finite, then there is a phase transition in the growth function from exponential to polynomial:
Theorem 2 (Sauer’s Theorem) Suppose that A has finite VC dimension d. Then, \forall n \ge d, s(A, n) \le( \frac{en}{d})^d. \qquad
Given data Z_1, \ldots, Z_n \sim P. The empirical measure Pn is Pn(A) = \frac{1}{n}\sum_{i=1}^n I(Z_i \in A).
Theorem 3 (Vapnik and Chervonenkis) Let A be a class of sets. For any t > \sqrt{2/n}, P(\sup_{A \in A} |P_n(A) − P(A)| > t) \le 4 s(A, 2n) e^{-nt^2/8} (4) and hence, with probability at least 1 − \delta, \sup_{A \in A} |P_n(A) − P(A)| \le \sqrt{8/n \log( 4 s(A, 2n)/\delta)} (5). Hence, if A has finite VC dimension d then \sup_{A \in A} |P_n(A) − P(A)| \le \sqrt{8/n (\log( 4/\delta)+ d \log(n e/d))} \qquad
Theorem 4 Bernstein’s inequality is a more refined inequality than Hoeffding’s inequality. It is especially useful when the variance of Y is small.
Suppose that Y_1, \ldots, Y_n are iid with mean \mu, \text{Var}(Y_i) \le \sigma^2 and |Y_i| \le M. Then P(|Y - \mu| > \epsilon) \le 2 \exp\{- n \epsilon^2 / (2\sigma^2 + 2M \epsilon / 3)\} \qquad
It follows that P(|Y - \mu| > t \sqrt{n} \epsilon + \epsilon \sigma^2 (1 - c)) \le e^{-t} for small enough \epsilon and c.
Density Estimation

Nonparametric Regression
https://www.stat.cmu.edu/~larry/=sml/nonpar2019.pdf
Lecture 03: Concentration of Measure
Lecture 04: Concentration of Measure
Concentration of Measure
Lecture 05: Concentration of Measure
Linear Regression
Nonparametric Regression
Nonparametric Regression - Ryan Tibshirani and Larry Wasserman
Lecture 06: Linear Regression and Nonparametric Regression
A Closer Look at Sparse Regression Ryan Tibshirani
Lecture 07: Nonparametric Regression
A Closer Look at Sparse Regression Ryan Tibshirani
Lecture 08: Trend filtering
A Closer Look at Sparse Regression Ryan Tibshirani
Lecture 09: Linear Classification
Lecture 10: Nonparametric Classification
Lecture 11: Minimax Theory
Lecture 12: Minimax Theory

Lecture 13: Nonparametric Bayes
Lecture 14: Boosting
Lecture 15: Clustering
Lecture 16: Clustering
Lecture 17: Clustering
Lecture 18: Graphical Models
Lecture 19: Graphical Models
Lecture 20: Graphical Models
Lecture 21: Graphical Models
Lecture 22: Dimension Reduction
Lecture 23: Dimension Reduction & Random Matrix Theory
Lecture 24: Random Matrix Theory & Differential Privacy