116  Dirichlet Process

Bayesian Statistics - Nonparametric Methods

Dirichlet Processes Tutorial
Author

Oren Bochman

Published

July 2, 2025

Keywords

Dirichlet Processes

Credit

The following material is based on the Gaussian Processes for Regression tutorial and code by Tamara Broderick. Note that the code is under the MIT license.

116.1 Nonparametric Bayes

  • Bayesian statistics that is not parametric

  • Bayesian: \mathbb{P}r(parameters \mid data) \propto \mathbb{P}r(data \mid parameters) \mathbb{P}r(parameters)

  • Not parametric: the number of parameters is not fixed and finite; it is unbounded, growing with the data, or infinite

  • Examples:

    • Wikipedia articles
    • Species
    • density estimation [Escobar, West 1995; Ghosal et al 1999]
    • survival analysis curves
    • Fitness exercises [Fox et al 2014]
    • Genetics [Ewens 1972; Hartl, Clark 2003]
    • Newborn babies [Saria et al 2010]
    • Social networks [Lloyd et al 2012; Miller et al 2010]
    • Images [Sudderth, Jordan 2009]

116.2 Nonparametric Bayes

  • A theoretical motivation: De Finetti’s Theorem
  • A data sequence is infinitely exchangeable if the distribution of any N data points doesn’t change when permuted:

p(X_1, \ldots , X_N ) = p(X_{\sigma(1)} , \ldots , X_{\sigma(N)} )

  • De Finetti’s Theorem (roughly): A sequence is infinitely exchangeable \iff, for all N and some distribution P,

p(X_1, \ldots , X_N ) = \int_\theta \prod_{n=1}^N p(X_n\mid\theta)P(d\theta)

  • Motivates:

    • Parameters and likelihoods
    • Priors
    • “Nonparametric Bayesian” priors
  • Note: De Finetti proved his theorem in 1931, but only for binary (0/1-valued) sequences.

  • Hewitt and Savage (1955) extended the result to more general spaces.

  • There were also a few related results by Diaconis and Freedman in the 1970s.

  • Aldous (1983) proved a more general result for arrays.
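The mixture representation in De Finetti's theorem can be checked numerically on a small case. The sketch below assumes a Bernoulli likelihood for p(X_n \mid \theta) and a Beta(2, 3) distribution for P; both are illustrative choices, not taken from the tutorial. It shows that the marginal probability of a binary sequence is the same for every ordering, even though the observations are not independent once \theta is integrated out.

```python
import itertools
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def marginal_prob(xs, a=2.0, b=3.0):
    """Closed form of the integral of prod_n theta^x_n (1-theta)^(1-x_n)
    against a Beta(a, b) density (an assumed choice of P)."""
    k, n = sum(xs), len(xs)
    return math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))


def marginal_prob_numeric(xs, a=2.0, b=3.0, grid=20000):
    """Same integral, approximated with a midpoint rule on (0, 1)."""
    total = 0.0
    for i in range(grid):
        theta = (i + 0.5) / grid
        lik = 1.0
        for x in xs:
            lik *= theta if x == 1 else 1.0 - theta
        prior = math.exp((a - 1) * math.log(theta)
                         + (b - 1) * math.log(1.0 - theta)
                         - log_beta(a, b))
        total += lik * prior / grid
    return total


xs = (1, 0, 0, 1, 1)
# Probability of every distinct reordering of the same sequence.
perm_probs = [marginal_prob(p) for p in set(itertools.permutations(xs))]

# Exchangeable does not mean independent: compare p(1, 1) with p(1)^2.
p_one = marginal_prob((1,))
p_two = marginal_prob((1, 1))

print("spread across permutations:", max(perm_probs) - min(perm_probs))
print("closed form vs numeric:", marginal_prob(xs), marginal_prob_numeric(xs))
print("p(1,1) =", p_two, " p(1)^2 =", p_one ** 2)
```

The marginal depends on the data only through the count of ones, which is why permuting the sequence leaves it unchanged.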

116.3 Generative Model

\mathbb{P}r(parameters \mid data) \propto \mathbb{P}r(data \mid parameters) \mathbb{P}r(parameters)

  • Finite Gaussian mixture model (K=2 clusters):

z_n \stackrel{iid}{\sim} \text{Categorical}(\rho_1, \rho_2)

x_n \stackrel{indep}{\sim} \mathcal{N}(\mu_{z_n}, \Sigma)

  • Don’t know \mu_1 , \mu_2:

\mu_k \stackrel{iid}{\sim} \mathcal{N}(\mu_0, \sigma_0^2) \quad k=1,2

  • Don’t know \rho_1 , \rho_2:

\rho_1 \sim \text{Beta}(\alpha_0, \beta_0)

\rho_2 = 1 - \rho_1

Inference goal: assignments of data points to clusters, cluster parameters
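The generative model above can be simulated by sampling top-down: first the mixing proportions, then the cluster means, then an assignment and an observation for each data point. The following is a minimal sketch for one-dimensional data using only the standard library. The function name `sample_mixture`, the default hyperparameter values, and the use of a scalar standard deviation `sigma` in place of \Sigma are assumptions made for illustration.

```python
import random


def sample_mixture(n, mu0=0.0, sigma0=3.0, sigma=1.0,
                   alpha0=1.0, beta0=1.0, seed=0):
    """Draw n points from a K=2 Gaussian mixture with unknown means and weights.

    sigma0 and sigma are standard deviations (the model is written in terms
    of variances); all default values are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Mixing proportions: rho_1 ~ Beta(alpha0, beta0), rho_2 = 1 - rho_1.
    rho1 = rng.betavariate(alpha0, beta0)
    rho = (rho1, 1.0 - rho1)

    # Cluster means: mu_k ~ N(mu0, sigma0^2), k = 1, 2.
    mu = [rng.gauss(mu0, sigma0) for _ in range(2)]

    # Assignments: z_n ~ Categorical(rho_1, rho_2), with labels 1 and 2.
    z = [1 if rng.random() < rho1 else 2 for _ in range(n)]

    # Observations: x_n ~ N(mu_{z_n}, sigma^2).
    x = [rng.gauss(mu[zn - 1], sigma) for zn in z]
    return rho, mu, z, x


rho, mu, z, x = sample_mixture(n=200, seed=1)
print("rho:", rho)
print("mu:", mu)
print("points in cluster 1:", z.count(1), "cluster 2:", z.count(2))
```

Inference runs in the opposite direction: given only `x`, recover plausible values of `z`, `mu`, and `rho`.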

116.4 Beta distribution review

\text{Beta}(\rho \mid \alpha_1, \alpha_2) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)} \rho^{\alpha_1 - 1} (1 - \rho)^{\alpha_2 - 1}

  • \alpha_1, \alpha_2 > 0

  • \rho \in [0,1]

  • Gamma function \Gamma

  • integer m: \Gamma(m+1) = m!

  • for x > 0: \Gamma(x+1) = x\Gamma(x)

  • What happens?

    • \alpha = \alpha_1 = \alpha_2 \to 0
    • \alpha = \alpha_1 = \alpha_2 \to \infty
    • \alpha_1 > \alpha_2
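  • Beta is conjugate to Cat: with a Beta prior on \rho_1 and a Categorical likelihood for z, the posterior of \rho_1 is again a Beta distribution.

\rho_1 \sim \text{Beta}(\alpha_1, \alpha_2),\qquad \rho_2 = 1 - \rho_1,\qquad z \sim \text{Cat}(\rho_1, \rho_2)

p(\rho_1 , z) \propto \rho_1^{\mathbb{1}_{z=1}} (1 - \rho_1 )^{\mathbb{1}_{z=2}} \rho_1^{\alpha_1 -1} (1 - \rho_1 )^{\alpha_2 -1}

p(\rho_1 \mid z) \propto \rho_1^{\alpha_1 + \mathbb{1}_{z=1} - 1} (1 - \rho_1 )^{\alpha_2 + \mathbb{1}_{z=2} - 1} \propto \text{Beta}(\rho_1 \mid \alpha_1 + \mathbb{1}_{z=1}, \alpha_2 + \mathbb{1}_{z=2})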
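The conjugate update can be verified numerically: multiply the prior density by the likelihood of one observation on a grid, normalize, and compare with the Beta density whose parameters were incremented by the indicator counts. The sketch below uses only the standard library; the helper names `beta_pdf`, `posterior_params`, and `grid_posterior`, and the prior values 2 and 5, are illustrative assumptions.

```python
import math


def beta_pdf(rho, a1, a2):
    """Beta density at rho in (0, 1), computed in log space for stability."""
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    return math.exp(log_norm + (a1 - 1) * math.log(rho)
                    + (a2 - 1) * math.log(1.0 - rho))


def posterior_params(a1, a2, zs):
    """Conjugate update: add the count of z=1 to a1 and of z=2 to a2."""
    n1 = sum(1 for z in zs if z == 1)
    n2 = sum(1 for z in zs if z == 2)
    return a1 + n1, a2 + n2


def grid_posterior(a1, a2, z, grid=2000):
    """Prior times likelihood of a single z, normalized by a midpoint rule."""
    rhos = [(i + 0.5) / grid for i in range(grid)]
    unnorm = [(r if z == 1 else 1.0 - r) * beta_pdf(r, a1, a2) for r in rhos]
    norm = sum(unnorm) / grid
    return rhos, [u / norm for u in unnorm]


a1, a2 = 2.0, 5.0  # assumed prior parameters
max_err = 0.0
for z in (1, 2):
    post_a1, post_a2 = posterior_params(a1, a2, [z])
    rhos, numeric = grid_posterior(a1, a2, z)
    for r, dens in zip(rhos, numeric):
        max_err = max(max_err, abs(dens - beta_pdf(r, post_a1, post_a2)))

print("largest gap between grid posterior and Beta posterior:", max_err)
print("posterior after observing z = 1, 1, 2:",
      posterior_params(a1, a2, [1, 1, 2]))
```

Because each observation only increments one of the two parameters, a whole data set can be absorbed by counting assignments, as `posterior_params` does for a list of labels.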