---
title : 'Homework on BIC for zero-inflated mixtures - M5L09HW4'
subtitle : 'Bayesian Statistics: Advanced BIC'
categories:
- Bayesian Statistics
keywords:
- Mixture Models
- Honors Homework
- Zero-Inflated Data
- Zero-Inflated Poisson
- MCMC
- unit tests
---
::::: {.content-visible unless-profile="HC"}
::: {.callout-caution}
Section omitted to comply with the Honor Code
:::
:::::
::::: {.content-hidden unless-profile="HC"}
## HW - Simulation of Poisson mixture model
In the lessons we mentioned that Zero-Inflated Poisson (ZIP) models arise naturally in biology when analyzing nest-size data, since many birds fail to mate or lay eggs and therefore have zero eggs in their nests.
A plain Poisson model tends to underestimate the number of empty nests in such data and to overestimate the number of nests with one or two eggs; for instance, a Poisson with rate $\lambda \approx 3$ assigns probability $e^{-3} \approx 0.05$ to a zero count, while the ZIP fit below attributes a weight of about $0.40$ to the point mass at zero. A negative binomial can mitigate this problem by adding a tunable parameter that controls the dispersion of the count data; however, it is not necessarily a good fix for zero-inflated data.
In this exercise, we use the EM algorithm to fit a ZIP mixture model to a dataset of nest sizes and compute its BIC.
::: {.callout-note}
### Instructions
\index{dataset!nest size}
- In week 2, you considered the problem faced by a biologist interested in characterizing the number of eggs laid by a particular bird species. The data consist of a sample of $n = 300$ nests from a site in Southern California, contained in the file `data/nestsize.csv`.
At the time, we visually compared the empirical distribution of the data against a Poisson distribution whose parameter was set to its maximum likelihood estimate, as a justification for using a mixture model of the form (a simulation sketch follows this callout)
$$
f(x) = w \delta_0(x) + (1-w) \frac{e^{-\lambda} \lambda^x}{x!}, \quad x \in \{0, 1, 2, \ldots\}
$$ {#eq-zip-mixture}
where $w$ is the weight associated with the point mass at zero, $\lambda$ is the parameter of the Poisson distribution, and $\delta_0(x)$ represents the degenerate distribution placing all of its mass at zero.
- You are asked to build on the EM algorithm you constructed in week 2 to compute the BIC associated with this model and contrast it against the BIC for the simpler Poisson model.
:::
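Before fitting anything, it helps to see how data like this arise. The following is a minimal sketch that simulates from the ZIP mixture in @eq-zip-mixture; the seed and the values of `w_true` and `lambda_true` are hypothetical choices for illustration only.

```{r}
#| label: lbl-zip-simulation-sketch
set.seed(42)                  # Hypothetical seed, illustration only
n_sim <- 300                  # Same sample size as the nest data
w_true <- 0.4                 # Assumed zero-inflation weight
lambda_true <- 3              # Assumed Poisson rate
z <- rbinom(n_sim, 1, w_true) # 1 = structural zero, 0 = Poisson component
x_sim <- ifelse(z == 1, 0, rpois(n_sim, lambda_true))
table(x_sim)                  # Note the excess zeros relative to a plain Poisson
```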
## Full conditional distributions {#sec-zip-mixture-fc-4}
The full conditional distributions for the indicators of the ZIP model are given by:
$$
\mathbb{P}r(c_i = 1 \mid \cdots) \propto
\begin{cases} w & x_i=0 \\
0 & \mbox{otherwise}
\end{cases}
$$ {#eq-zip-mixture-fc-c1}
$$
\mathbb{P}r(c_i = 2 \mid \cdots) \propto
\begin{cases} (1-w)
\frac{\lambda^{x_i} e^{-\lambda}}{x_i!} & x_i=0 \\
1 & \mbox{otherwise}
\end{cases}
$$ {#eq-zip-mixture-fc-c2}
where $c_i$ is the latent indicator for observation $i$, and $x_i$ is the observed data.
The full conditional distributions for the weights are given by:
$$
w \mid \cdots \sim \mbox{Beta}\left(m(\mathbf{c})+1, n-m(\mathbf{c})+1\right)
$$ {#eq-zip-mixture-fc-w}
where $m(\mathbf{c})$ is the number of observations with $c_i=1$.
The full conditional for the rate $\lambda$ is
$$
\lambda \mid \cdots \sim \mbox{Gamma}\left( 1 + \sum_{i : c_i = 2} x_i , 1 + n-m(\mathbf{c}) \right)
$$ {#eq-zip-mixture-fc-lambda}
where the sum runs over the observations assigned to the Poisson component, and $n - m(\mathbf{c})$ is the number of observations with $c_i = 2$.
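These conditionals translate directly into a Gibbs sampler. Below is a minimal sketch, assuming the Uniform(0,1) prior on $w$ and the Gamma(1,1) prior on $\lambda$ implied by @eq-zip-mixture-fc-w and @eq-zip-mixture-fc-lambda; the function name `gibbs_zip` and the iteration count are hypothetical.

```{r}
#| label: lbl-gibbs-zip-sketch
gibbs_zip <- function(x, n_iter = 5000) {
  n <- length(x)
  w <- 0.5                # Initial weight for the point mass at zero
  lambda <- mean(x)       # Initial Poisson rate
  out <- matrix(NA, n_iter, 2, dimnames = list(NULL, c("w", "lambda")))
  for (s in 1:n_iter) {
    # Sample the indicators c_i: only x_i = 0 can come from the point mass
    p1 <- ifelse(x == 0, w / (w + (1 - w) * dpois(0, lambda)), 0)
    c1 <- rbinom(n, 1, p1)          # c1[i] = 1 corresponds to c_i = 1
    m  <- sum(c1)                   # m(c): observations at the point mass
    # Sample w from its Beta full conditional
    w <- rbeta(1, m + 1, n - m + 1)
    # Sample lambda from its Gamma full conditional
    lambda <- rgamma(1, 1 + sum(x[c1 == 0]), 1 + n - m)
    out[s, ] <- c(w, lambda)
  }
  out
}
```

For example, `samples <- gibbs_zip(x)` followed by `colMeans(samples[-(1:1000), ])` would give posterior means after discarding a burn-in.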
## My Solution Code
Derive the formula for the BIC associated with the Poisson model
$$
f(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x \in \{0, 1, 2, \ldots\}
$$ {#eq-poisson-model}
(Note that you can think of this model as a mixture with a single component.)
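For a model with $k$ free parameters and maximized log-likelihood $\log L(\hat\theta)$, the BIC is $-2 \log L(\hat\theta) + k \log n$. The Poisson model has the single parameter $\lambda$ with MLE $\hat\lambda = \bar{x}$, so

$$
\mathrm{BIC}_1 = -2 \sum_{i=1}^{n} \log \frac{e^{-\hat\lambda} \hat\lambda^{x_i}}{x_i!} + \log n,
$$

while the ZIP mixture has $k = 2$ free parameters ($w$ and $\lambda$), giving $\mathrm{BIC}_2 = -2 \log L(\hat{w}, \hat\lambda) + 2 \log n$.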
Then, provide code to compute the BIC and evaluate it on the dataset `data/nestsize.csv`:
```{r}
rm(list=ls())
set.seed(81196)  # So that results are reproducible (same simulated data every time)
nestsize <- read.csv("data/nestsize.csv", header=FALSE)
colnames(nestsize) <- c("n")
x <- nestsize$n
n <- length(x)        # Number of observations
lambdahat <- mean(x)  # MLE of the Poisson rate
twicenegloglik <- -2*sum(dpois(x, lambdahat, log=TRUE))
BIC1 <- twicenegloglik + log(n)  # k = 1 parameter
BIC1  # 1272.175
```
The observed-data likelihood that the EM algorithm maximizes for the ZIP mixture is
$$
L(w, \lambda) = \prod_{i=1}^{n} \left[ w\, \delta_0(x_i) + (1-w)\, \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} \right]
$$ {#eq-zip-likelihood}
```{r}
## EM algorithm for the ZIP mixture: maximizes L(w, lambda) above
## Setup controls for the algorithm
s = 0             # Iteration counter
sw = FALSE        # Convergence flag
QQ = -Inf         # Q value from the previous iteration
QQ.out = NULL     # Trace of Q values
epsilon = 10^(-5)
## Initialize parameters
w = 0.1           # Initial weight for the point mass at zero
lambda = mean(x)  # Initialize the Poisson rate at the sample mean
## Repeat E and M steps until convergence
while(!sw){
  ## E step: compute responsibilities v[i,] for the two components
  v = array(0, dim=c(n,2))
  for(i in 1:n){
    if(x[i]==0){
      v[i,1] = log(w)
      v[i,2] = log(1-w) + dpois(x[i], lambda, log=TRUE)
      v[i,] = exp(v[i,] - max(v[i,]))/sum(exp(v[i,] - max(v[i,])))
    }else{
      v[i,1] = 0   # A positive count can only come from the Poisson component
      v[i,2] = 1
    }
  }
  ## M step
  w = mean(v[,1])             # Update the weight of the point mass at zero
  lambda = sum(x)/sum(v[,2])  # Update the Poisson rate
  ## Check convergence via the expected complete-data log-likelihood Q
  QQn = 0
  for(i in 1:n){
    if(x[i]==0){
      QQn = QQn + v[i,1]*log(w) + v[i,2]*(log(1-w) + dpois(x[i], lambda, log=TRUE))
    }else{
      QQn = QQn + v[i,2]*(log(1-w) + dpois(x[i], lambda, log=TRUE))
    }
  }
  if(abs(QQn-QQ)/abs(QQn)<epsilon){
    sw = TRUE
  }
  QQ = QQn
  QQ.out = c(QQ.out, QQ)
  s = s + 1
  print(paste(s, QQn))
}
print(paste("lambda: ",lambda)) #3.07
print(paste("w: ",w)) #.40
print(paste("1-w: ",1-w))
## Compute twice the negative log-likelihood for the model
twicenegloglik = 0
for(i in 1:n){
  if(x[i]==0){
    # log=TRUE cannot be used here: we need the log of the full mixture
    # density w + (1-w)*dpois(0, lambda), i.e. the log of a sum, not a sum of logs
    twicenegloglik = twicenegloglik - 2*log(w + (1-w)*dpois(x[i], lambda))
  }else{
    twicenegloglik = twicenegloglik - 2*( log(1-w) + dpois(x[i], lambda, log=TRUE))
  }
}
BIC2 = twicenegloglik + 2*log(n)  # k = 2 parameters: w and lambda
BIC2 # 1056.85
```
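Since $\mathrm{BIC}_2 \approx 1056.85$ is substantially lower than $\mathrm{BIC}_1 \approx 1272.18$ (a difference of roughly 215), the BIC strongly favors the zero-inflated Poisson model over the plain Poisson for the nest-size data.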
```{r}
#| label: lbl-test-posterior-means
#| error: true
library(testthat)
testthat::test_that("EM estimates and BIC values are correct", {
  expect_equal(round(w, 2), 0.40)
  expect_equal(round(lambda, 2), 3.07)
  expect_equal(round(BIC1, 3), 1272.175)
  expect_equal(round(BIC2, 3), 1056.845)
})
```
:::::