117  Dirichlet Process

Bayesian Statistics - Nonparametric Methods

Dirichlet Processes Tutorial
Author

Oren Bochman

Published

July 2, 2025

Keywords

Dirichlet Processes

117.1 Dirichlet distribution Review

Although we have already met the Dirichlet distribution in the context of Bayesian mixture models, we will now dive deeper into the distribution and its properties. The Dirichlet distribution is the conjugate prior for the categorical distribution, and to a large extent it is the basic building block of Bayesian nonparametric (BNP) methods.

Once you understand the Dirichlet distribution, you should be able to follow a large share, perhaps half, of the nonparametric Bayesian methods used in research today. For me, this lesson plays the role that early differential equations played in my undergraduate studies: it is a key concept that opens the door to many others. We will see that the Dirichlet distribution is a generalization of the Beta distribution and that it is also a stepping stone to the Chinese Restaurant Process.

We start with the Dirichlet distribution, which is a distribution over a simplex. A simplex generalizes the triangle to arbitrary dimensions: a 2-dimensional simplex is a triangle and a 3-dimensional simplex is a tetrahedron. Why a simplex? Because we want to model probability vectors, whose entries must be non-negative and sum to 1. The set of all such vectors with K entries is exactly a simplex of dimension K - 1; for example, the probability vectors over three outcomes form a triangle.
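To make the constraint concrete, here is a minimal sketch of a membership check for the probability simplex. The function name `on_simplex` and the tolerance `tol` are our own choices for illustration, not part of any library.

```python
def on_simplex(rho, tol=1e-9):
    """Return True if rho is a probability vector.

    A vector lies on the probability simplex when every entry is
    non-negative and the entries sum to 1 (up to floating-point tolerance).
    """
    return all(r >= 0 for r in rho) and abs(sum(rho) - 1.0) <= tol


print(on_simplex([0.2, 0.3, 0.5]))    # True: a point inside the triangle
print(on_simplex([0.6, 0.6, -0.2]))   # False: one entry is negative
print(on_simplex([0.5, 0.4]))         # False: the entries sum to 0.9
```

Note that a vector with K = 3 entries has only two free coordinates, since the third is determined by the sum constraint. This is why the corresponding simplex is a 2-dimensional triangle.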

Here is the pdf for the Dirichlet distribution:

\text{Dirichlet}(\rho \mid \alpha_1, \ldots , \alpha_K) = \frac{\Gamma(\sum_{k=1}^K \alpha_k)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K \rho_k^{\alpha_k - 1} \qquad \alpha_k>0 \tag{117.1}

  • where \rho_k is the k-th component of the vector \rho and
  • \alpha_k is the concentration parameter for the k-th component.
  • \Gamma is the gamma function, which is a generalization of the factorial function. Recall that \Gamma(n) = (n-1)! for positive integers n.
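The density in Equation 117.1 can be evaluated directly. The sketch below works on the log scale with `math.lgamma` to avoid overflow in the gamma functions. The function names are ours, and restricting \rho to the interior of the simplex (all entries strictly positive) is a simplifying assumption so that the logarithms are defined.

```python
import math


def dirichlet_logpdf(rho, alpha):
    """Log of Equation 117.1 for a point rho in the interior of the simplex."""
    if len(rho) != len(alpha):
        raise ValueError("rho and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("every alpha_k must be positive")
    if any(r <= 0 for r in rho) or abs(sum(rho) - 1.0) > 1e-9:
        raise ValueError("rho must lie in the interior of the simplex")
    # log of Gamma(sum alpha_k) / prod Gamma(alpha_k)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # log of prod rho_k^(alpha_k - 1)
    log_kernel = sum((a - 1.0) * math.log(r) for a, r in zip(alpha, rho))
    return log_norm + log_kernel


def dirichlet_pdf(rho, alpha):
    return math.exp(dirichlet_logpdf(rho, alpha))


# alpha = (1, 1, 1): the density is flat and equals Gamma(3) / Gamma(1)^3 = 2.
flat = dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
# K = 2: the formula reduces to the Beta(2, 3) density 12 * x * (1 - x)^2.
beta_case = dirichlet_pdf([0.25, 0.75], [2.0, 3.0])
print(flat, beta_case)
```

The second call illustrates the claim that the Dirichlet generalizes the Beta distribution: with two components, \rho_1 = x and \rho_2 = 1 - x, and the density matches Beta(\alpha_1, \alpha_2).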

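The concentration parameters control the shape of the distribution: the mean of \rho_k is \alpha_k / \sum_{j=1}^K \alpha_j, and the larger the total \sum_{j=1}^K \alpha_j, the more tightly the distribution concentrates around that mean.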

Now we should develop some intuition about the Dirichlet distribution with respect to its parameters.

  • What happens when \alpha = \alpha_1 = \cdots = \alpha_K = 1?
  • What happens when \alpha = \alpha_1 = \cdots = \alpha_K \to 0?
  • What happens when \alpha = \alpha_1 = \cdots = \alpha_K \to \infty?
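One way to explore these three regimes is by simulation. The sketch below draws from a symmetric Dirichlet by normalizing independent Gamma(\alpha, 1) variables, which is a standard construction, and reports the average size of the largest component. The values 0.1 and 100 stand in for the limits \alpha \to 0 and \alpha \to \infty, and K = 3, the number of draws, and the seed are arbitrary choices for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing Gamma(alpha_k, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def mean_largest_component(a, K=3, n_draws=5000, seed=0):
    """Monte Carlo estimate of E[max_k rho_k] under a symmetric Dirichlet(a, ..., a)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(n_draws):
        total += max(sample_dirichlet([a] * K, rng))
    return total / n_draws


results = {a: mean_largest_component(a) for a in (0.1, 1.0, 100.0)}
for a, value in results.items():
    print(f"alpha = {a:>5}: average largest component = {value:.3f}")
```

With a small \alpha the largest component should be close to 1, meaning the draws sit near the corners of the simplex and are nearly one-hot. With \alpha = 1 the draws are uniform over the simplex. With a large \alpha the largest component should approach 1/K, meaning the draws cluster around the uniform probability vector.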

\rho_{1:K} \sim \textrm{Dirichlet}(\alpha_{1:K}) \quad z \mid \rho_{1:K} \sim \textrm{Cat}(\rho_{1:K})

\rho_{1:K} \mid z \sim \textrm{Dirichlet}(\alpha'_{1:K}) \quad \alpha'_k = \alpha_k + \mathbb{1}_{\{z=k\}}
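The conjugate update above amounts to adding 1 to the concentration parameter of the observed category. A minimal sketch follows; the function names are ours, and categories are indexed from 0 as is usual in Python, whereas the text indexes them from 1. Applying the single-observation update once per data point, as `posterior_after` does, is an extension of the one-observation rule stated above and relies on the observations being conditionally independent given \rho.

```python
def posterior_alpha(alpha, z):
    """Posterior concentration parameters after observing category z (0-based)."""
    if not 0 <= z < len(alpha):
        raise ValueError("z must index one of the K categories")
    # Add the indicator 1{z = k} to each prior parameter.
    return [a + (1.0 if k == z else 0.0) for k, a in enumerate(alpha)]


def posterior_after(alpha, observations):
    """Apply the single-observation update once per observation."""
    for z in observations:
        alpha = posterior_alpha(alpha, z)
    return alpha


prior = [1.0, 1.0, 1.0]
one_obs = posterior_alpha(prior, z=1)
many_obs = posterior_after(prior, [1, 1, 0, 2, 1])
print(one_obs)    # [1.0, 2.0, 1.0]
print(many_obs)   # [2.0, 4.0, 2.0]
```

After several observations each parameter equals its prior value plus the count of its category, so the prior parameters can be read as pseudo-counts.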

  • What happens when \alpha_1 > \alpha_2?
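For unequal parameters, the closed-form moments of the Dirichlet distribution give a quick answer. The sketch below uses the standard formulas E[\rho_k] = \alpha_k / \alpha_0 and Var[\rho_k] = \alpha_k (\alpha_0 - \alpha_k) / (\alpha_0^2 (\alpha_0 + 1)), where \alpha_0 = \sum_k \alpha_k. The function name and the example parameter values are our own.

```python
def dirichlet_moments(alpha):
    """Closed-form mean and variance of each component of Dirichlet(alpha)."""
    a0 = sum(alpha)
    mean = [a / a0 for a in alpha]
    var = [a * (a0 - a) / (a0 ** 2 * (a0 + 1.0)) for a in alpha]
    return mean, var


# alpha_1 > alpha_2: the mean shifts toward the first component.
mean, var = dirichlet_moments([4.0, 1.0, 1.0])
# Same proportions, ten times the total: same mean, smaller variance.
mean_big, var_big = dirichlet_moments([40.0, 10.0, 10.0])

print("mean:", [round(m, 3) for m in mean])
print("var: ", [round(v, 4) for v in var])
print("var with larger total:", [round(v, 4) for v in var_big])
```

So when \alpha_1 > \alpha_2 the distribution puts more mass, on average, on the first component than on the second. The ratios of the parameters set the mean, while their total sets how tightly the draws concentrate around it.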