---
title : 'Old Faithful eruptions density estimation with the MCMC algorithms - M4L7HW2'
subtitle : 'Bayesian Statistics: Mixture Models'
author : 'Oren Bochman'
categories:
- Bayesian Statistics
keywords:
- Mixture Models
- Homework
- Density Estimation
- MCMC algorithm
- Location and scale of 2 Gaussians
---
::::: {.content-visible unless-profile="HC"}
::: {.callout-caution}
Section omitted to comply with the Honor Code
:::
:::::
::::: {.content-hidden unless-profile="HC"}
In this exercise, we implement the MCMC algorithm to fit a location-and-scale mixture of 2 univariate Gaussian distributions to the duration of eruptions of the Old Faithful geyser. We will then use the fitted model to generate a density estimate of the marginal distribution of the duration of the eruptions.
::: {.callout-note}
### Instructions
- The R dataset faithful contains data on waiting time between eruptions (the column named waiting) and the duration of the eruption (the column named eruptions) for the famous Old Faithful geyser in Yellowstone National Park, Wyoming, USA.
- In this case, we are asked to modify the MCMC algorithm provided in "Sample code for density estimation problems" (rather than the EM algorithm used in the previous peer assignment) to provide a (Bayesian) density estimate of the marginal distribution of the duration of the eruptions, using a location-and-scale mixture of 2 univariate Gaussian distributions (rather than the location mixture of 6 univariate Gaussians used for the galaxies dataset).
Assume that the priors are $w \sim \mathrm{Beta}(1,1)$, $\mu_k \sim \mathcal{N}(\eta,\tau^2)$ and $1/\sigma^2_k \sim \mathrm{Gamma}(d,q)$, where $\eta$ is the mean of the data, $\tau^2$ is the variance of the data, $d$ is the degrees of freedom and $q$ is the scale parameter.
The hyperparameters $\eta$, $\tau^2$, $d$ and $q$ are selected using an empirical Bayes approach similar to the one we used in "Sample code for density estimation problems".
:::
::: {.callout-note}
### Grading overview
Reviewers will check whether the code has been modified correctly, and whether the density estimate you generate appears correct. Please remember that you are being asked to use a **location-and-scale mixture** to generate the density estimate, so the "Sample code for density estimation problems" cannot be used directly and requires some modification. Before submitting your answer, it might be useful to compare the density estimate generated by your algorithm against a kernel density estimate generated by the R function density(). While they should not be identical, they should be similar.
:::
The full conditional for the indicators $c_i$ in a location-and-scale mixture of 2 Gaussians is given by:
$$
Pr(c_i=1 \mid x_i, \theta) = \frac{w_1 \cdot \phi(x_i \mid \mu_1, \sigma^2_1)}{w_1 \cdot \phi(x_i \mid \mu_1, \sigma^2_1) + w_2 \cdot \phi(x_i \mid \mu_2, \sigma^2_2)}
$$ {#eq-complete-conditional-c}
where $\phi(x_i \mid \mu_k, \sigma^2_k)$ is the density of the normal distribution with mean $\mu_k$ and variance $\sigma^2_k$, given by:
$$
\phi(x_i \mid \mu_k, \sigma^2_k) = \frac{1}{\sqrt{2\pi\sigma^2_k}} \cdot \exp\left(-\frac{(x_i - \mu_k)^2}{2\sigma^2_k}\right)
$$ {#eq-complete-conditional-phi}
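In R, the membership probability in the first equation above is best computed on the log scale to avoid underflow when the densities are tiny. A minimal sketch for a single observation (the argument names `w`, `mu`, `sigma2` mirror the sampler below; the numeric values in the call are illustrative):

```r
# Stable computation of Pr(c_i = k | x_i, theta) for one observation xi
post_prob = function(xi, w, mu, sigma2) {
  v = log(w) + dnorm(xi, mu, sqrt(sigma2), log = TRUE) # log w_k + log phi(xi | mu_k, sigma2_k)
  v = exp(v - max(v))                                  # subtract the max before exponentiating
  v / sum(v)                                           # normalize to probabilities
}
post_prob(3.5, w = c(0.4, 0.6), mu = c(2, 4.3), sigma2 = c(0.1, 0.2))
```

Subtracting `max(v)` before exponentiating leaves the normalized probabilities unchanged but keeps `exp()` from underflowing to zero for observations far from every component.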
The full conditional of $\omega$ is given by:
$$
\omega \mid c, x \sim \mathrm{Beta}\left(a_1 + \sum_{i=1}^n \mathbb{1}_{(c_i=1)},\ a_2 + \sum_{i=1}^n \mathbb{1}_{(c_i=2)}\right)
$$ {#eq-complete-conditional-omega}
where $a_1$ and $a_2$ are the parameters of the Beta distribution, and $\mathbb{1}_{(c_i=k)}$ is an indicator function that is equal to 1 if $c_i=k$ and 0 otherwise.
The full conditional of $\mu_k$ is given by:
$$
\mu_k \mid c, x, \sigma^2_k \sim \mathcal{N}\left( \left[ \frac{n_k}{\sigma_k^2} + \frac{1}{\tau^2} \right]^{-1}\left[ \frac{\sum_{i:c_i=k} x_i}{\sigma_k^2} + \frac{\eta}{\tau^2} \right],\ \left[ \frac{n_k}{\sigma_k^2} + \frac{1}{\tau^2} \right]^{-1}\right)
$$ {#eq-complete-conditional-mu}
where $n_k$ is the number of observations assigned to component $k$, $\eta$ is the mean of the data, and $\tau^2$ is the variance of the data.
The full conditional of $\sigma^2_k$ is given by:
$$
1/\sigma^2_k \mid c, x, \mu_k \sim \mathrm{Gamma}\left( d + \frac{n_k}{2},\ q + \frac{1}{2} \sum_{i:c_i=k} (x_i - \mu_k)^2 \right)
$$ {#eq-complete-conditional-sigma}
where $d$ is the degrees of freedom and $q$ is the scale parameter.
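Putting the full conditionals for $\mu_k$ and $1/\sigma^2_k$ together, one Gibbs update of the component-$k$ parameters given the current assignments can be sketched as a small helper (an illustration with the same names as the sampler below, not the course code):

```r
# One Gibbs update for component k, given assignments cc and data x
update_component = function(k, x, cc, sigma2k, eta, tau2, dd, qq) {
  nk = sum(cc == k)                      # n_k: points currently assigned to k
  xk = x[cc == k]
  # mu_k | c, x, sigma2_k ~ Normal (see @eq-complete-conditional-mu)
  v  = 1/(nk/sigma2k + 1/tau2)           # posterior variance
  m  = v * (sum(xk)/sigma2k + eta/tau2)  # posterior mean
  mu_k = rnorm(1, m, sqrt(v))
  # 1/sigma2_k | c, x, mu_k ~ Gamma (see @eq-complete-conditional-sigma)
  sigma2_k = 1/rgamma(1, dd + nk/2, qq + sum((xk - mu_k)^2)/2)
  list(mu = mu_k, sigma2 = sigma2_k)
}
```

Note that the mean is updated first and the variance draw then uses the freshly sampled `mu_k`, matching the order used in the sampler below.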
The changes relative to the sample code:

1. [x] the number of components, from 6 to 2
2. [x] the data, from `galaxies` to `faithful$eruptions`
3. [x] the prior for the weights stays $w \sim \mathrm{Beta}(1,1)$
4. [x] the prior for the means stays $\mu_k \sim \mathcal{N}(\eta,\tau^2)$
5. [x] the prior for the precisions stays $1/\sigma^2_k \sim \mathrm{Gamma}(d,q)$
6. [x] the sampling model, from a shared variance $x_i \mid c_i=k \sim \mathcal{N}(\mu_k,\sigma^2)$ to a component-specific variance $x_i \mid c_i=k \sim \mathcal{N}(\mu_k,\sigma^2_k)$ - [only this one changes the structure of the sampler]{.marked}
7. [x] the hyperparameters $\eta$, $\tau^2$, $d$ and $q$ are selected as before
8. [x] the complete conditionals as outlined above in [@eq-complete-conditional-c], [@eq-complete-conditional-phi], [@eq-complete-conditional-omega], [@eq-complete-conditional-mu] and [@eq-complete-conditional-sigma]
9. [x] use the empirical Bayes approach to select the hyperparameters $\eta$, $\tau^2$, $d$ and $q$ - this is more or less what we have been doing all along, but now we justify it as empirical Bayes
```{r}
#| label: lbl-location-scale-mixture

###### Setup data
rm(list = ls())
set.seed(81196)          # So that results are reproducible
library(MASS)
library(MCMCpack)

data(faithful)
KK = 2                   # Number of components (was 6 for the galaxies data)
x  = faithful$eruptions  # (was x = galaxies)
n  = length(x)

set.seed(781209)

## Empirical Bayes approach to select the hyperparameters
eta  = mean(x)    # Prior mean for the mu_k: the data mean
tau2 = var(x)     # Prior variance for the mu_k: the data variance
dd   = 2          # Degrees of freedom for the Gamma distribution
qq   = var(x)/KK  # Scale parameter for the Gamma distribution

# Priors
aa = rep(1, KK)

# Initial values
w      = rep(1, KK)/KK             # Equal weight for each component
mu     = rnorm(KK, mean(x), sd(x)) # Random cluster centers
sigma2 = rep(var(x), KK)           # Cluster variances initialized at the data variance
sigma  = sqrt(sigma2)
cc     = sample(1:KK, n, replace = TRUE, prob = w) # Random initial cluster assignments

# MCMC settings
iterations = 12000
burn       = 2000

# Storage
cc.out    = array(0, dim = c(iterations, n))
w.out     = array(0, dim = c(iterations, KK))
mu.out    = array(0, dim = c(iterations, KK))
sigma.out = array(0, dim = c(iterations, KK))

for(s in 1:iterations){
  # Sample latent indicators
  for(i in 1:n){
    # This form is easier to extend to KK components than writing out
    #   v[1] = log(w)   + dnorm(x[i], mu[1], sigma[1], log=TRUE)
    #   v[2] = log(1-w) + dnorm(x[i], mu[2], sigma[2], log=TRUE)
    v = sapply(1:KK, function(k) log(w[k]) + dnorm(x[i], mu[k], sqrt(sigma2[k]), log = TRUE))
    v = exp(v - max(v))  # Stabilize the computation
    v = v / sum(v)       # Normalize
    cc[i] = sample(1:KK, size = 1, prob = v)
  }

  # Sample weights; for KK = 2 this Dirichlet draw is equivalent to
  #   w = rbeta(1, aa[1] + sum(cc==1), aa[2] + sum(cc==2))
  w = as.vector(rdirichlet(1, aa + tabulate(cc, nbins = KK)))

  # Sample means and variances
  for(k in 1:KK){
    nk = sum(cc == k)
    xk = x[cc == k]
    # Mean
    tau2_hat = 1/(nk/sigma2[k] + 1/tau2)
    mu_hat   = tau2_hat * (sum(xk)/sigma2[k] + eta/tau2)
    mu[k]    = rnorm(1, mu_hat, sqrt(tau2_hat))
    # Variance
    dd_star   = dd + nk/2
    qq_star   = qq + sum((xk - mu[k])^2)/2
    sigma2[k] = 1/rgamma(1, dd_star, qq_star)
  }
  sigma = sqrt(sigma2)

  # Store samples
  cc.out[s,]    = cc
  w.out[s,]     = w
  mu.out[s,]    = mu
  sigma.out[s,] = sigma
  if(s %% 1000 == 0) cat("Iteration", s, "\n")
}
```
Provide code to generate the density estimate on a grid consisting of 150 points in the interval $[0,7]$.
```{r}
#| label: lbl-posterior-density-estimation
xx  = seq(0, 7, length = 150)  # Grid of 150 points covering the interval [0, 7]
nxx = length(xx)

density.mcmc = array(0, dim = c(iterations - burn, nxx))
for(s in 1:(iterations - burn)){
  for(k in 1:KK){
    density.mcmc[s,] = density.mcmc[s,] +
      w.out[s + burn, k] * dnorm(xx, mu.out[s + burn, k], sigma.out[s + burn, k])
  }
}

density.mcmc.m  = colMeans(density.mcmc)
density.mcmc.lq = apply(density.mcmc, 2, quantile, 0.025)
density.mcmc.uq = apply(density.mcmc, 2, quantile, 0.975)

# Kernel density estimate for comparison
yy = density(x)
```
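As the grading note suggests, it helps to check that the Bayesian estimate and the kernel estimate are close before submitting. A quick numeric sanity check (assuming the objects created above; `approx` interpolates the KDE onto the MCMC grid):

```r
# Mean absolute gap between the MCMC density mean and the KDE on the same grid
kde.on.grid = approx(yy$x, yy$y, xout = xx, rule = 2)$y
mean(abs(density.mcmc.m - kde.on.grid))
```

The two curves should be similar but not identical, so a small positive gap is expected.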
Provide a graph of the density estimate on the interval $[0,7]$.
```{r}
#| label: fig-visualization-of-density-estimate
#| fig-cap: "Density estimate of the eruption duration using MCMC algorithm"
plot(xx, density.mcmc.m, type = "n", ylim = c(0, max(density.mcmc.uq)), xlim = c(0, 7),
     xlab = "Eruption duration (min)", ylab = "Density")
polygon(c(xx, rev(xx)), c(density.mcmc.lq, rev(density.mcmc.uq)), col = "grey", border = NA)
lines(xx, density.mcmc.m, col = "black", lwd = 2)
lines(yy, col = "red", lwd = 2, lty = 2)
points(x, rep(0, n), pch = 16, cex = 0.5)
legend("topright", c("MCMC mean", "95% band", "KDE"),
       col = c("black", "grey", "red"), lty = c(1, NA, 2), lwd = 2,
       pch = c(NA, 15, NA), pt.cex = 2, bty = "n")
```
::: {.callout-note collapse="true"}
## solution code from the course
```{r}
#| label: lbl-location-scale-mixture-solution
x = faithful$eruptions
n = length(x)
plot(density(x))
points(x, rep(0,n))

## Priors set up using an "empirical Bayes" approach
aa  = rep(1,2)
eta = mean(x)
tau = sqrt(var(x))
dd  = 2
qq  = var(x)/2

## Initialize the parameters
w     = 1/2
mu    = rnorm(2, mean(x), sd(x))
sigma = rep(sd(x)/2, 2)
cc    = sample(1:2, n, replace=TRUE, prob=c(1/2,1/2))

## Number of iterations of the sampler
iterations = 12000
burn       = 2000

## Storing the samples
cc.out    = array(0, dim=c(iterations, n))
w.out     = rep(0, iterations)
mu.out    = array(0, dim=c(iterations, 2))
sigma.out = array(0, dim=c(iterations, 2))
logpost   = rep(0, iterations)

for(s in 1:iterations){
  # Sample the indicators
  for(i in 1:n){
    v = rep(0,2)
    v[1] = log(w)   + dnorm(x[i], mu[1], sigma[1], log=TRUE)  # Compute the log of the weights
    v[2] = log(1-w) + dnorm(x[i], mu[2], sigma[2], log=TRUE)  # Compute the log of the weights
    v = exp(v - max(v))/sum(exp(v - max(v)))
    cc[i] = sample(1:2, 1, replace=TRUE, prob=v)
  }

  # Sample the weights
  w = rbeta(1, aa[1] + sum(cc==1), aa[2] + sum(cc==2))

  # Sample the parameters of the components
  for(k in 1:2){
    # Sample the means
    nk       = sum(cc==k)     # How many points assigned to component k
    xsumk    = sum(x[cc==k])  # Sum of the points assigned to component k
    tau2.hat = 1/(nk/sigma[k]^2 + 1/tau^2)  # tau star squared
    mu.hat   = tau2.hat*(xsumk/sigma[k]^2 + eta/tau^2)
    mu[k]    = rnorm(1, mu.hat, sqrt(tau2.hat))
    # Sample the variances
    dd.star  = dd + nk/2
    qq.star  = qq + sum((x[cc==k] - mu[k])^2)/2
    sigma[k] = sqrt(1/rgamma(1, dd.star, qq.star))
  }

  # Store samples
  cc.out[s,]    = cc
  w.out[s]      = w
  mu.out[s,]    = mu
  sigma.out[s,] = sigma

  ## Compute the log-posterior
  logpost[s] = 0
  for(i in 1:n){
    if(cc[i] == 1){
      logpost[s] = logpost[s] + log(w)   + dnorm(x[i], mu[1], sigma[1], log=TRUE)
    }else{
      logpost[s] = logpost[s] + log(1-w) + dnorm(x[i], mu[2], sigma[2], log=TRUE)
    }
  }
  logpost[s] = logpost[s] + dbeta(w, aa[1], aa[2], log=TRUE)
  for(k in 1:2){
    # Prior on the precision 1/sigma^2, evaluated on the log scale
    logpost[s] = logpost[s] + dnorm(mu[k], eta, tau, log=TRUE) +
      dgamma(1/sigma[k]^2, dd, qq, log=TRUE)
  }
  if(s %% 500 == 0){
    print(paste("s =", s))
  }
}

## Compute the samples of the density over a dense grid
xx = seq(0, 7, length=150)
density.mcmc = array(0, dim=c(iterations-burn, length(xx)))
for(s in 1:(iterations-burn)){
  density.mcmc[s,] = density.mcmc[s,] +
    w.out[s+burn]*dnorm(xx, mu.out[s+burn,1], sigma.out[s+burn,1]) +
    (1-w.out[s+burn])*dnorm(xx, mu.out[s+burn,2], sigma.out[s+burn,2])
}
density.mcmc.m = apply(density.mcmc, 2, mean)

## Plot Bayesian estimate with pointwise credible bands
density.mcmc.lq = apply(density.mcmc, 2, quantile, 0.025)
density.mcmc.uq = apply(density.mcmc, 2, quantile, 0.975)
plot(xx, density.mcmc.m, type="n", ylim=c(0, max(density.mcmc.uq)),
     xlab="Eruptions", ylab="Density")
polygon(c(xx, rev(xx)), c(density.mcmc.lq, rev(density.mcmc.uq)), col="grey", border="grey")
lines(xx, density.mcmc.m, col="black", lwd=2)
points(x, rep(0,n))
```
:::
:::::