---
title: "Homework on the Gibbs-Sampling algorithm - M2L5HW1"
subtitle: "Bayesian Statistics: Techniques and Models"
categories:
- Bayesian Statistics
keywords:
- Gibbs Sampling
- Homework
---
::::: {.content-visible unless-profile="HC"}
::: {.callout-caution}
Section omitted to comply with the Honor Code
:::
:::::
::::: {.content-hidden unless-profile="HC"}
::: {#exr-mcmc-diags-1}
[convergence]{.column-margin} Why is it important to check your MCMC output for convergence before using the samples for inference?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] Convergence diagnostics provide a guarantee that your inferences are accurate.
- [ ] You can cut your Monte Carlo error by a factor of two if you strategically select which samples to retain.
- [x] If the chain has not reached its stationary distribution (the target/posterior), your samples will not reflect that distribution.
- [ ] Pre-convergence MCMC samples are useless.
MCMC is based on a process that guarantees eventual convergence to a stationary distribution, but until the chain has actually reached it, the samples do not reflect the target (posterior) distribution.
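To see why, here is a minimal sketch in Python (a toy AR(1) Markov chain standing in for an MCMC sampler; all numbers are invented for illustration): a chain started far from its stationary distribution produces early samples that reflect the starting value rather than the target.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy Markov chain: an AR(1) process x_t = 0.95 * x_{t-1} + N(0, 1).
# Its stationary distribution is N(0, 1 / (1 - 0.95**2)), centered at 0.
phi, n_iter = 0.95, 5_000
x = np.empty(n_iter)
x[0] = 50.0                     # deliberately start far from the stationary mean
for t in range(1, n_iter):
    x[t] = phi * x[t - 1] + rng.normal()

print("mean of first 100 draws     :", round(x[:100].mean(), 2))   # still pulled toward the start
print("mean after discarding 1,000 :", round(x[1000:].mean(), 2))  # close to the true mean, 0
```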
:::
::: {#exr-mcmc-diags-2}
[convergence]{.column-margin} Which of the following trace plots illustrates a chain that appears to have converged?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] A
- [ ] B
- [ ] C
- [x] D
Chain A seems to switch between states around -4 and 0. Chain B is clearly still trending downward. Chain C shows long-term dependence (high autocorrelation) and appears divergent. Chain D looks like roughly IID samples at $-4.0 \pm 1$.
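The answer options were trace-plot images; as a rough, illustrative sketch (Python, with invented dynamics), chains with the four behaviors described above could look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 1_000
t = np.arange(n)

# A: switches between states near -4 and 0, staying in each regime for a while
regime = np.zeros(n, dtype=int)
for i in range(1, n):
    regime[i] = regime[i - 1] if rng.random() < 0.99 else 1 - regime[i - 1]
a = np.where(regime == 0, -4.0, 0.0) + 0.3 * rng.normal(size=n)

# B: still trending downward, has not reached a stationary level
b = 5.0 - 0.01 * t + 0.3 * rng.normal(size=n)

# C: random walk -- highly autocorrelated and divergent (no stationary distribution)
c = np.cumsum(0.3 * rng.normal(size=n))

# D: roughly IID draws around -4 (what a converged, well-mixing chain looks like)
d = -4.0 + 0.5 * rng.normal(size=n)

fig, axes = plt.subplots(4, 1, figsize=(7, 8), sharex=True)
for ax, trace, label in zip(axes, (a, b, c, d), "ABCD"):
    ax.plot(t, trace, lw=0.7)
    ax.set_ylabel(label)
axes[-1].set_xlabel("iteration")
plt.tight_layout()
plt.show()
```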
:::
::: {#exr-mcmc-diags-3}
[convergence]{.column-margin} The trace plot below was generated by a *random walk Metropolis sampler*, where candidates were drawn from a *normal proposal distribution* with mean equal to the previous iteration's value, and a fixed variance. Based on this result, what action would you recommend taking next?

:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] The step size of the proposals is too small. Decrease the variance of the normal proposal distribution and re-run the chain.
- [x] The step size of the proposals is too small. Increase the variance of the normal proposal distribution and re-run the chain.
- [ ] The step size of the proposals is too large. Increase the variance of the normal proposal distribution and re-run the chain.
- [ ] The step size of the proposals is too large. Decrease the variance of the normal proposal distribution and re-run the chain.
**Step size** is not an explicit parameter of the random-walk Metropolis-Hastings algorithm we learned, but it is implicit in the proposal variance, so the question and answers are well posed. We are also not told which parameter this trace plot is for, but the mean seems a safe assumption. Since each candidate is drawn from a normal distribution centered at the previous value, the typical distance between consecutive values plays the role of a step size, and it is governed by the fixed proposal variance.
We discussed this type of trace as showing long-term dependence (clumping). We want to increase the step size, by increasing the proposal variance, so that new samples are less dependent on the previous value.
In other words, it takes too long for the chain to explore the posterior distribution. This is less of a problem if you run a very long chain, but it is best to use a more efficient proposal distribution if possible.
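As a small illustrative sketch (Python rather than the course's R/JAGS, with a standard normal target chosen for convenience): with a tiny proposal standard deviation almost every candidate is accepted, but the chain barely moves and the lag-1 autocorrelation is close to 1; a larger proposal standard deviation explores much more efficiently.

```python
import numpy as np

def rw_metropolis(n_iter, prop_sd, log_target, x0=0.0, seed=0):
    """Random-walk Metropolis with a normal proposal of fixed standard deviation."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_iter)
    x[0], accepted = x0, 0
    for t in range(1, n_iter):
        cand = x[t - 1] + prop_sd * rng.normal()   # candidate centered at the current value
        if np.log(rng.random()) < log_target(cand) - log_target(x[t - 1]):
            x[t], accepted = cand, accepted + 1
        else:
            x[t] = x[t - 1]
    return x, accepted / (n_iter - 1)

log_std_normal = lambda x: -0.5 * x**2             # target: N(0, 1), up to a constant

for sd in (0.05, 2.5):
    draws, acc = rw_metropolis(20_000, sd, log_std_normal)
    lag1 = np.corrcoef(draws[:-1], draws[1:])[0, 1]
    print(f"proposal sd {sd:>4}: acceptance {acc:.2f}, lag-1 autocorrelation {lag1:.3f}")
```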
:::
::: {#exr-mcmc-diags-4}
[convergence]{.column-margin} Suppose you have multiple MCMC chains from multiple initial values and they appear to traverse the same general area back and forth, but struggle from moderate (or high) autocorrelation. Suppose also that adjusting the proposal distribution *q* is not an option. Which of the following strategies is likely to help increase confidence in your Monte Carlo estimates?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] Discard fewer burn-in samples to increase your Monte Carlo effective sample size.
- [ ] Retain only the 80% of samples closest to the maximum likelihood estimate.
- [x] Run the chains for *many* more iterations and check for convergence on the larger time scale.
- [ ] Add more chains from more initial values to see if that reduces autocorrelation.
Proper MCMC algorithms come with a theoretical guarantee of eventual convergence to the target distribution. Chains with very high autocorrelation may require an impractical number of iterations, but it is worth checking to see if a longer chain yields acceptable results.
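A hedged sketch of why running longer helps (Python, with a crude effective-sample-size estimate and a strongly autocorrelated toy AR(1) chain; all settings are invented): the effective sample size grows roughly in proportion to the number of iterations.

```python
import numpy as np

def ess(x, max_lag=200):
    """Crude effective sample size: n / (1 + 2 * sum of positive-lag autocorrelations)."""
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    var = np.dot(x, x) / n
    acf_sum = 0.0
    for k in range(1, max_lag):
        rho = np.dot(x[:-k], x[k:]) / (n * var)
        if rho <= 0:            # truncate once the estimated autocorrelation dies out
            break
        acf_sum += rho
    return n / (1 + 2 * acf_sum)

rng = np.random.default_rng(1)
phi = 0.95                       # strongly autocorrelated AR(1) chain
for n_iter in (1_000, 10_000, 50_000):
    x = np.empty(n_iter)
    x[0] = 0.0
    for t in range(1, n_iter):
        x[t] = phi * x[t - 1] + rng.normal()
    print(f"{n_iter:>6} iterations -> effective sample size ~ {ess(x):.0f}")
```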
:::
::: {#exr-mcmc-diags-5}
[convergence]{.column-margin} Each of the following plots reports estimated autocorrelation from an MCMC chain with 10,000 iterations. Which will yield the lowest Monte Carlo effective sample size?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [x] A
- [ ] B
- [ ] C
- [x] D
Higher autocorrelation means each draw carries less independent information, so the chain whose autocorrelation decays most slowly yields the lowest effective sample size.
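A small sketch of the relationship (assuming, purely for illustration, that the autocorrelation decays geometrically from its lag-1 value): the slower the decay, the smaller the effective sample size for the same 10,000 iterations.

```python
import numpy as np

n = 10_000                                   # iterations, as in the question
for rho1 in (0.1, 0.5, 0.9, 0.99):           # lag-1 autocorrelation, geometric decay assumed
    lags = np.arange(1, 1_000)
    acf = rho1 ** lags                       # rho_k = rho_1 ** k
    ess = n / (1 + 2 * acf.sum())            # effective sample size formula
    print(f"lag-1 autocorrelation {rho1:>4}: effective sample size ~ {ess:,.0f}")
```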
:::
::: {#exr-mcmc-diags-6}
[convergence]{.column-margin}
The following trace plot shows four chains with distinct initial values. Of the choices given, what is the lowest number of samples you would comfortably recommend to discard as burn-in?

:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] A: 50 iterations.
- [ ] B: 150 iterations.
- [x] C: 400 iterations.
- [ ] D: 700 iterations.
Burn-in should cover all iterations before every chain has clearly settled into the common region of the target distribution; keeping pre-convergence samples would bias the estimates.
:::
::: {#exr-mcmc-diags-7}
[convergence]{.column-margin}
Suppose the Gelman and Rubin diagnostic computed from multiple chains reports a scale reduction factor much higher than 1.0, say 8.0. What is the recommended action?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] Thin the chain by discarding every eighth sample.
- [x] Continue running the chain for *many* more iterations.
- [ ] Discontinue use of the model, since there is little hope of reaching the stationary distribution.
- [ ] Use the samples for inference as this high scale reduction factor indicates convergence.
Proper MCMC algorithms are guaranteed to converge to the stationary distribution eventually; a scale reduction factor far above 1.0 indicates the chains have not yet mixed, so the remedy is to run them for many more iterations.
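For reference, a sketch of the (non-split) Gelman and Rubin statistic in Python; the chains and their offsets are invented purely to show how between-chain disagreement inflates the scale reduction factor.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for a list of equal-length chains (one parameter)."""
    chains = np.asarray(chains)              # shape (m chains, n iterations)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    B = n * chain_means.var(ddof=1)          # between-chain variance (scaled by n)
    var_hat = (n - 1) / n * W + B / n        # pooled estimate of the posterior variance
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(2)

# Chains that have mixed: all four sample the same N(0, 1) target.
mixed = rng.normal(size=(4, 1_000))
# Chains that have not mixed: each is stuck near its own starting region.
stuck = rng.normal(size=(4, 1_000)) + np.array([[-10.0], [0.0], [5.0], [20.0]])

print("mixed chains   R-hat:", round(gelman_rubin(mixed), 2))   # ~ 1.0
print("unmixed chains R-hat:", round(gelman_rubin(stuck), 2))   # much larger than 1
```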
:::
::: {#exr-mcmc-diags-8}
[convergence]{.column-margin} Which of the following Monte Carlo statistics would require the largest MCMC effective sample size to estimate reliably? Assume the target distribution is unimodal (has only one peak).
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [x] 97.5th percentile of the target distribution.
- [ ] Median of the target distribution.
- [ ] Mean of the target distribution.
- [ ] 15th percentile of the target distribution.
The outer edges of the distribution are sampled less frequently and are therefore more susceptible to variation between simulations. The **Raftery and Lewis** diagnostic can help you decide how many iterations you need to reliably estimate outer quantiles of the target distribution.
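A quick sketch of why (using IID normal draws as a stand-in for effectively independent posterior samples; all settings are invented): repeating the simulation many times shows that the 97.5th-percentile estimate has the largest Monte Carlo variability at a fixed sample size.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 2_000, 500                         # draws per run, number of repeated runs

estimates = {"mean": [], "median": [], "15th pct": [], "97.5th pct": []}
for _ in range(reps):
    x = rng.normal(size=n)                   # stand-in for n effectively independent draws
    estimates["mean"].append(x.mean())
    estimates["median"].append(np.median(x))
    estimates["15th pct"].append(np.percentile(x, 15))
    estimates["97.5th pct"].append(np.percentile(x, 97.5))

for name, vals in estimates.items():
    print(f"{name:>11}: Monte Carlo sd ~ {np.std(vals):.4f}")
```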
:::
:::::