---
title : 'Old Faithful eruptions density estimation with the MCMC algorithms - M4L7HW2'
subtitle : 'Bayesian Statistics: Mixture Models'
author : 'Oren Bochman'
categories:
- Bayesian Statistics
keywords:
- Mixture Models
- Homework
- Density Estimation
- MCMC algorithm
- Location and scale of 2 Gaussians
---
::::: {.content-visible unless-profile="HC"}
::: {.callout-caution}
Section omitted to comply with the Honor Code
:::
:::::
::::: {.content-hidden unless-profile="HC"}
In this exercise, we implement the MCMC algorithm to fit a location-and-scale mixture of 2 univariate Gaussian distributions to the duration of eruptions of the Old Faithful geyser. We will then use the fitted model to generate a density estimate of the marginal distribution of the duration of the eruptions.
::: {.callout-note}
### Instructions
- The R dataset faithful contains data on waiting time between eruptions (the column named waiting) and the duration of the eruption (the column named eruptions) for the famous Old Faithful geyser in Yellowstone National Park, Wyoming, USA.
- In this case, we are asked to modify the MCMC algorithm provided in "Sample code for density estimation problems" (rather than the EM algorithm used in the previous peer assignment) to provide a (Bayesian) density estimate of the marginal distribution of the duration of the eruptions, using a location-and-scale mixture of 2 univariate Gaussian distributions (rather than the location mixture of 6 univariate Gaussians used for the galaxies dataset).
Assume that the priors are $w \sim \mathrm{Beta}(1,1)$, $\mu_k \sim \mathcal{N}(\eta,\tau^2)$ and $1/\sigma^2_k \sim \mathrm{Gamma}(d,q)$, where $\eta$ is the mean of the data, $\tau^2$ is the variance of the data, $d$ is the degrees of freedom and $q$ is the scale parameter.
The hyperparameters $\eta$, $\tau^2$, $d$ and $q$ are selected using an empirical Bayes approach similar to the one we used in "Sample code for density estimation problems".
:::
::: {.callout-note}
### Grading overview
Reviewers will check whether the code has been modified correctly, and whether the density estimate you generate appears correct. Please remember that you are being asked to use a **location-and-scale mixture** to generate the density estimate, so the "Sample code for density estimation problems" cannot be used directly and requires some modification. Before submitting your answer, it might be useful to compare the density estimate generated by your algorithm against a kernel density estimate generated by the R function density(). While they should not be identical, they should be similar.
:::
The full conditional for the indicators $c_i$ in a location-and-scale mixture of 2 Gaussians is given by:
$$
Pr(c_i=1 \mid x_i, \theta) = \frac{w_1 \cdot \phi(x_i \mid \mu_1, \sigma^2_1)}{w_1 \cdot \phi(x_i \mid \mu_1, \sigma^2_1) + w_2 \cdot \phi(x_i \mid \mu_2, \sigma^2_2)}
$$ {#eq-complete-conditional-c}
where $\phi(x_i \mid \mu_k, \sigma^2_k)$ is the density of the normal distribution with mean $\mu_k$ and variance $\sigma^2_k$, given by:
$$
\phi(x_i \mid \mu_k, \sigma^2_k) = \frac{1}{\sqrt{2\pi\sigma^2_k}} \cdot \exp\left(-\frac{(x_i - \mu_k)^2}{2\sigma^2_k}\right)
$$ {#eq-complete-conditional-phi}
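In R, the membership probability in the first equation above is best computed on the log scale to avoid underflow when the densities are tiny. A minimal sketch for a single observation (the argument names `w`, `mu`, `sigma2` mirror the sampler below; the numeric values in the call are illustrative):

```r
# Stable computation of Pr(c_i = k | x_i, theta) for one observation xi
post_prob = function(xi, w, mu, sigma2) {
  v = log(w) + dnorm(xi, mu, sqrt(sigma2), log = TRUE) # log w_k + log phi(xi | mu_k, sigma2_k)
  v = exp(v - max(v))                                  # subtract the max before exponentiating
  v / sum(v)                                           # normalize to probabilities
}
post_prob(3.5, w = c(0.4, 0.6), mu = c(2, 4.3), sigma2 = c(0.1, 0.2))
```

Subtracting `max(v)` before exponentiating leaves the normalized probabilities unchanged but keeps `exp()` from underflowing to zero for observations far from every component.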
The full conditional of $\omega$ is given by:
$$
\omega \mid c, x \sim \mathrm{Beta}\left(a_1 + \sum_{i=1}^n \mathbb{1}_{(c_i=1)},\ a_2 + \sum_{i=1}^n \mathbb{1}_{(c_i=2)}\right)
$$ {#eq-complete-conditional-omega}
where $a_1$ and $a_2$ are the parameters of the Beta distribution, and $\mathbb{1}_{(c_i=k)}$ is an indicator function that is equal to 1 if $c_i=k$ and 0 otherwise.
The full conditional of $\mu_k$ is given by:
$$
\mu_k \mid c, x, \sigma^2_k \sim \mathcal{N}\left( \left[ \frac{n_k}{\sigma_k^2} + \frac{1}{\tau^2} \right]^{-1}\left[ \frac{\sum_{i:c_i=k} x_i}{\sigma_k^2} + \frac{\eta}{\tau^2} \right],\ \left[ \frac{n_k}{\sigma_k^2} + \frac{1}{\tau^2} \right]^{-1}\right)
$$ {#eq-complete-conditional-mu}
where $n_k$ is the number of observations assigned to component $k$, $\eta$ is the mean of the data, and $\tau^2$ is the variance of the data.
The full conditional of $\sigma^2_k$ is given by:
$$
1/\sigma^2_k \mid c, x, \mu_k \sim \mathrm{Gamma}\left( d + \frac{n_k}{2},\ q + \frac{1}{2} \sum_{i:c_i=k} (x_i - \mu_k)^2 \right)
$$ {#eq-complete-conditional-sigma}
where $d$ is the degrees of freedom and $q$ is the scale parameter.
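Putting the full conditionals for $\mu_k$ and $1/\sigma^2_k$ together, one Gibbs update of the component-$k$ parameters given the current assignments can be sketched as a small helper (an illustration with the same names as the sampler below, not the course code):

```r
# One Gibbs update for component k, given assignments cc and data x
update_component = function(k, x, cc, sigma2k, eta, tau2, dd, qq) {
  nk = sum(cc == k)                      # n_k: points currently assigned to k
  xk = x[cc == k]
  # mu_k | c, x, sigma2_k ~ Normal (see @eq-complete-conditional-mu)
  v  = 1/(nk/sigma2k + 1/tau2)           # posterior variance
  m  = v * (sum(xk)/sigma2k + eta/tau2)  # posterior mean
  mu_k = rnorm(1, m, sqrt(v))
  # 1/sigma2_k | c, x, mu_k ~ Gamma (see @eq-complete-conditional-sigma)
  sigma2_k = 1/rgamma(1, dd + nk/2, qq + sum((xk - mu_k)^2)/2)
  list(mu = mu_k, sigma2 = sigma2_k)
}
```

Note that the mean is updated first and the variance draw then uses the freshly sampled `mu_k`, matching the order used in the sampler below.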
The changes relative to the sample code:

1. [x] the number of components, from 6 to 2
2. [x] the data, from `galaxies` to `faithful$eruptions`
3. [x] the prior for the weights stays $w \sim \mathrm{Beta}(1,1)$
4. [x] the prior for the means stays $\mu_k \sim \mathcal{N}(\eta,\tau^2)$
5. [x] the prior for the precisions stays $1/\sigma^2_k \sim \mathrm{Gamma}(d,q)$
6. [x] the sampling model, from a shared variance $x_i \mid c_i=k \sim \mathcal{N}(\mu_k,\sigma^2)$ to a component-specific variance $x_i \mid c_i=k \sim \mathcal{N}(\mu_k,\sigma^2_k)$ - [only this one changes the structure of the sampler]{.marked}
7. [x] the hyperparameters $\eta$, $\tau^2$, $d$ and $q$ are selected as before
8. [x] the complete conditionals as outlined above in [@eq-complete-conditional-c], [@eq-complete-conditional-phi], [@eq-complete-conditional-omega], [@eq-complete-conditional-mu] and [@eq-complete-conditional-sigma]
9. [x] use the empirical Bayes approach to select the hyperparameters $\eta$, $\tau^2$, $d$ and $q$ - this is more or less what we have been doing all along, but now we justify it as empirical Bayes
```{r}
#| label: lbl-location-scale-mixture

###### Setup data
rm(list = ls())
set.seed(81196)          # So that results are reproducible
library(MASS)
library(MCMCpack)

data(faithful)
KK = 2                   # Number of components (was 6 for the galaxies data)
x  = faithful$eruptions  # (was x = galaxies)
n  = length(x)

set.seed(781209)

## Empirical Bayes approach to select the hyperparameters
eta  = mean(x)    # Prior mean for the mu_k: the data mean
tau2 = var(x)     # Prior variance for the mu_k: the data variance
dd   = 2          # Degrees of freedom for the Gamma distribution
qq   = var(x)/KK  # Scale parameter for the Gamma distribution

# Priors
aa = rep(1, KK)

# Initial values
w      = rep(1, KK)/KK             # Equal weight for each component
mu     = rnorm(KK, mean(x), sd(x)) # Random cluster centers
sigma2 = rep(var(x), KK)           # Cluster variances initialized at the data variance
sigma  = sqrt(sigma2)
cc     = sample(1:KK, n, replace = TRUE, prob = w) # Random initial cluster assignments

# MCMC settings
iterations = 12000
burn       = 2000

# Storage
cc.out    = array(0, dim = c(iterations, n))
w.out     = array(0, dim = c(iterations, KK))
mu.out    = array(0, dim = c(iterations, KK))
sigma.out = array(0, dim = c(iterations, KK))

for(s in 1:iterations){
  # Sample latent indicators
  for(i in 1:n){
    # This form is easier to extend to KK components than writing out
    #   v[1] = log(w)   + dnorm(x[i], mu[1], sigma[1], log=TRUE)
    #   v[2] = log(1-w) + dnorm(x[i], mu[2], sigma[2], log=TRUE)
    v = sapply(1:KK, function(k) log(w[k]) + dnorm(x[i], mu[k], sqrt(sigma2[k]), log = TRUE))
    v = exp(v - max(v))  # Stabilize the computation
    v = v / sum(v)       # Normalize
    cc[i] = sample(1:KK, size = 1, prob = v)
  }

  # Sample weights; for KK = 2 this Dirichlet draw is equivalent to
  #   w = rbeta(1, aa[1] + sum(cc==1), aa[2] + sum(cc==2))
  w = as.vector(rdirichlet(1, aa + tabulate(cc, nbins = KK)))

  # Sample means and variances
  for(k in 1:KK){
    nk = sum(cc == k)
    xk = x[cc == k]
    # Mean
    tau2_hat = 1/(nk/sigma2[k] + 1/tau2)
    mu_hat   = tau2_hat * (sum(xk)/sigma2[k] + eta/tau2)
    mu[k]    = rnorm(1, mu_hat, sqrt(tau2_hat))
    # Variance
    dd_star   = dd + nk/2
    qq_star   = qq + sum((xk - mu[k])^2)/2
    sigma2[k] = 1/rgamma(1, dd_star, qq_star)
  }
  sigma = sqrt(sigma2)

  # Store samples
  cc.out[s,]    = cc
  w.out[s,]     = w
  mu.out[s,]    = mu
  sigma.out[s,] = sigma
  if(s %% 1000 == 0) cat("Iteration", s, "\n")
}
```
Provide code to generate the density estimate on a grid consisting of 150 points in the interval $[0,7]$.
```{r}
#| label: lbl-posterior-density-estimation
xx  = seq(0, 7, length = 150)  # Grid of 150 points covering the interval [0, 7]
nxx = length(xx)

density.mcmc = array(0, dim = c(iterations - burn, nxx))
for(s in 1:(iterations - burn)){
  for(k in 1:KK){
    density.mcmc[s,] = density.mcmc[s,] +
      w.out[s + burn, k] * dnorm(xx, mu.out[s + burn, k], sigma.out[s + burn, k])
  }
}

density.mcmc.m  = colMeans(density.mcmc)
density.mcmc.lq = apply(density.mcmc, 2, quantile, 0.025)
density.mcmc.uq = apply(density.mcmc, 2, quantile, 0.975)

# Kernel density estimate for comparison
yy = density(x)
```
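As the grading note suggests, it helps to check that the Bayesian estimate and the kernel estimate are close before submitting. A quick numeric sanity check (assuming the objects created above; `approx` interpolates the KDE onto the MCMC grid):

```r
# Mean absolute gap between the MCMC density mean and the KDE on the same grid
kde.on.grid = approx(yy$x, yy$y, xout = xx, rule = 2)$y
mean(abs(density.mcmc.m - kde.on.grid))
```

The two curves should be similar but not identical, so a small positive gap is expected.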
Provide a graph of the density estimate on the interval $[0,7]$.
```{r}
#| label: fig-visualization-of-density-estimate
#| fig-cap: "Density estimate of the eruption duration using MCMC algorithm"
plot(xx, density.mcmc.m, type = "n", ylim = c(0, max(density.mcmc.uq)), xlim = c(0, 7),
     xlab = "Eruption duration (min)", ylab = "Density")
polygon(c(xx, rev(xx)), c(density.mcmc.lq, rev(density.mcmc.uq)), col = "grey", border = NA)
lines(xx, density.mcmc.m, col = "black", lwd = 2)
lines(yy, col = "red", lwd = 2, lty = 2)
points(x, rep(0, n), pch = 16, cex = 0.5)
legend("topright", c("MCMC mean", "95% band", "KDE"),
       col = c("black", "grey", "red"), lty = c(1, NA, 2), lwd = 2,
       pch = c(NA, 15, NA), pt.cex = 2, bty = "n")
```
::: {.callout-note collapse="true"}
## solution code from the course
```{r}
#| label: lbl-location-scale-mixture-solution
x = faithful$eruptions
n = length(x)
plot(density(x))
points(x, rep(0,n))

## Priors set up using an "empirical Bayes" approach
aa  = rep(1,2)
eta = mean(x)
tau = sqrt(var(x))
dd  = 2
qq  = var(x)/2

## Initialize the parameters
w     = 1/2
mu    = rnorm(2, mean(x), sd(x))
sigma = rep(sd(x)/2, 2)
cc    = sample(1:2, n, replace=TRUE, prob=c(1/2,1/2))

## Number of iterations of the sampler
iterations = 12000
burn       = 2000

## Storing the samples
cc.out    = array(0, dim=c(iterations, n))
w.out     = rep(0, iterations)
mu.out    = array(0, dim=c(iterations, 2))
sigma.out = array(0, dim=c(iterations, 2))
logpost   = rep(0, iterations)

for(s in 1:iterations){
  # Sample the indicators
  for(i in 1:n){
    v = rep(0,2)
    v[1] = log(w)   + dnorm(x[i], mu[1], sigma[1], log=TRUE)  # Compute the log of the weights
    v[2] = log(1-w) + dnorm(x[i], mu[2], sigma[2], log=TRUE)  # Compute the log of the weights
    v = exp(v - max(v))/sum(exp(v - max(v)))
    cc[i] = sample(1:2, 1, replace=TRUE, prob=v)
  }

  # Sample the weights
  w = rbeta(1, aa[1] + sum(cc==1), aa[2] + sum(cc==2))

  # Sample the parameters of the components
  for(k in 1:2){
    # Sample the means
    nk       = sum(cc==k)     # How many points assigned to component k
    xsumk    = sum(x[cc==k])  # Sum of the points assigned to component k
    tau2.hat = 1/(nk/sigma[k]^2 + 1/tau^2)  # tau star squared
    mu.hat   = tau2.hat*(xsumk/sigma[k]^2 + eta/tau^2)
    mu[k]    = rnorm(1, mu.hat, sqrt(tau2.hat))
    # Sample the variances
    dd.star  = dd + nk/2
    qq.star  = qq + sum((x[cc==k] - mu[k])^2)/2
    sigma[k] = sqrt(1/rgamma(1, dd.star, qq.star))
  }

  # Store samples
  cc.out[s,]    = cc
  w.out[s]      = w
  mu.out[s,]    = mu
  sigma.out[s,] = sigma

  ## Compute the log-posterior
  logpost[s] = 0
  for(i in 1:n){
    if(cc[i] == 1){
      logpost[s] = logpost[s] + log(w)   + dnorm(x[i], mu[1], sigma[1], log=TRUE)
    }else{
      logpost[s] = logpost[s] + log(1-w) + dnorm(x[i], mu[2], sigma[2], log=TRUE)
    }
  }
  logpost[s] = logpost[s] + dbeta(w, aa[1], aa[2], log=TRUE)
  for(k in 1:2){
    # Prior on the precision 1/sigma^2, evaluated on the log scale
    logpost[s] = logpost[s] + dnorm(mu[k], eta, tau, log=TRUE) +
      dgamma(1/sigma[k]^2, dd, qq, log=TRUE)
  }
  if(s %% 500 == 0){
    print(paste("s =", s))
  }
}

## Compute the samples of the density over a dense grid
xx = seq(0, 7, length=150)
density.mcmc = array(0, dim=c(iterations-burn, length(xx)))
for(s in 1:(iterations-burn)){
  density.mcmc[s,] = density.mcmc[s,] +
    w.out[s+burn]*dnorm(xx, mu.out[s+burn,1], sigma.out[s+burn,1]) +
    (1-w.out[s+burn])*dnorm(xx, mu.out[s+burn,2], sigma.out[s+burn,2])
}
density.mcmc.m = apply(density.mcmc, 2, mean)

## Plot Bayesian estimate with pointwise credible bands
density.mcmc.lq = apply(density.mcmc, 2, quantile, 0.025)
density.mcmc.uq = apply(density.mcmc, 2, quantile, 0.975)
plot(xx, density.mcmc.m, type="n", ylim=c(0, max(density.mcmc.uq)),
     xlab="Eruptions", ylab="Density")
polygon(c(xx, rev(xx)), c(density.mcmc.lq, rev(density.mcmc.uq)), col="grey", border="grey")
lines(xx, density.mcmc.m, col="black", lwd=2)
points(x, rep(0,n))
```
:::
:::::