---
title: "Homework on the Metropolis-Hastings algorithm - M2L4HW1"
subtitle: "Bayesian Statistics: Techniques and Models"
categories:
- Bayesian Statistics
keywords:
- Metropolis-Hastings
- Homework
---
::::: {.content-visible unless-profile="HC"}
::: {.callout-caution}
Section omitted to comply with the Honor Code
:::
:::::
::::: {.content-hidden unless-profile="HC"}
::: {#exr-mh-1}
[M-H]{.column-margin} In which situation would we choose to use a Metropolis-Hastings (or any MCMC) sampler rather than straightforward Monte Carlo sampling?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] Monte Carlo estimation is easier than calculating the integral required to obtain the mean of the target distribution.
- [ ] The target distribution follows a Markov chain.
- [ ] The data (likelihood) come from a Markov chain.
- [x] There is no easy way to simulate independent draws from the target distribution.
:::
::: {#exr-mh-2}
Which of the following candidate-generating distributions would be best for an independent Metropolis-Hastings algorithm to sample the target distribution whose PDF is shown below?
**Note**: In *independent Metropolis-Hastings*, the *candidate-generating distribution* $q$ does not depend on the previous iteration of the chain.

:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] Candidate distribution A (plot not reproduced)
- [x] Candidate distribution B (plot not reproduced)
- [ ] Candidate distribution C (plot not reproduced)
- [ ] Candidate distribution D (plot not reproduced)

Recall that in *independent M-H*, the candidate-generating distribution $q$ should be similar to the target distribution $g$.
:::
::: {#exr-mh-3}
[M-H]{.column-margin} If we employed an independent Metropolis-Hastings algorithm (in which the candidate-generating distribution q does not depend on the previous iteration of the chain), what would happen if we skipped the acceptance ratio step and always accepted candidate draws?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [x] The resulting sample would be a Monte Carlo simulation from q instead of from the target distribution.
- [ ] Each draw could be considered as a sample from the target distribution.
- [ ] The chain would explore the posterior distribution very slowly, requiring more samples.
- [ ] The sampler would become more efficient because we are no longer discarding draws.
Accepting all candidates just means we are simulating from the candidate-generating distribution. The acceptance step in the algorithm acts as a correction, so that the samples reflect the target distribution more than the candidate-generating distribution.
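A minimal sketch of this point (the $\mathcal{N}(0, 10^2)$ proposal, the number of iterations, and the variable names are illustrative assumptions, not part of the question): if we never reject, the correction toward $g$ disappears and the output is simply an iid sample from $q$.

```r
# Sketch: skipping the accept/reject step leaves plain Monte Carlo draws from q.
# The N(0, 10^2) proposal here is an arbitrary illustrative choice.
set.seed(7)
n_iter = 5000
draws = numeric(n_iter)
theta_now = 0.0
for (i in 1:n_iter) {
  theta_cand = rnorm(1, mean=0.0, sd=10.0)  # independent proposal from q
  theta_now = theta_cand                    # always "accept": no correction toward g
  draws[i] = theta_now
}
# draws is an iid sample from q, not from the target distribution g
```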
:::
::: {#exr-mh-4}
[M-H]{.column-margin}
If the target distribution $\mathbb{P}r(\theta) \propto g(\theta)$ is for a positive-valued random variable, so that $\mathbb{P}r(\theta)$ contains the indicator function $I_{\theta>0}(\theta)$, what would happen if a random walk Metropolis sampler proposed the candidate $\theta^* = -0.3$?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] The candidate would be accepted with probability 0.3 because $g(\theta^*) = \lvert \theta^* \rvert$, yielding an acceptance ratio $\alpha = 0.3$.
- [ ] The candidate would be accepted with probability 1 because $g(\theta^*) = 0$, yielding an acceptance ratio $\alpha = 1$.
- [ ] The candidate would be accepted with probability 1 because $g(\theta^*) = 0$, yielding an acceptance ratio $\alpha = \infty$.
- [x] The candidate would be rejected with probability 1 because $g(\theta^*) = 0$, yielding an acceptance ratio $\alpha = 0$.
Rejecting out-of-support candidates this way usually works, but it can be inefficient when many proposals fall outside the support. Another solution is to draw candidates for the logarithm of $\theta$ (which, of course, has a different target distribution that you must derive) using Normal proposals, as sketched below.
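A minimal sketch of the log-scale idea (the $\text{Gamma}(2, 1)$ target, the proposal standard deviation, and the variable names are illustrative assumptions, not part of the question). If $\theta$ has density $\propto g(\theta)$, then $\varphi = \log\theta$ has density $\propto g(e^{\varphi})\, e^{\varphi}$, which is where the extra Jacobian term below comes from.

```r
# Sketch: random walk Metropolis on phi = log(theta) keeps theta > 0 automatically.
# Illustrative target: Gamma(shape = 2, rate = 1), i.e. g(theta) = theta * exp(-theta).
lg_phi = function(phi) {
  theta = exp(phi)
  # log g(theta) plus the log-Jacobian term phi from the change of variables
  log(theta) - theta + phi
}

set.seed(42)
n_iter = 5000
phi_now = 0.0
phi_keep = numeric(n_iter)
for (i in 1:n_iter) {
  phi_cand = rnorm(1, mean = phi_now, sd = 1.0)    # Normal proposal on the log scale
  lalpha = lg_phi(phi_cand) - lg_phi(phi_now)      # symmetric proposal: q cancels
  if (runif(1) < exp(lalpha)) phi_now = phi_cand   # accept or reject
  phi_keep[i] = phi_now
}
theta_keep = exp(phi_keep)   # transform back; every draw is positive
```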
:::
::: {#exr-mh-5}
[M-H]{.column-margin} Suppose we use a random walk Metropolis sampler with Normal proposals (centered on the current value of the chain) to sample from the target distribution whose PDF is shown below. The chain is currently at $\theta_i = 15.0$. Which of the other points, if used as a candidate $\theta^*$ for the next step, would yield the largest acceptance ratio $\alpha$?

:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] A $\theta^* = 3.1$
- [x] B $\theta^* = 9.8$
- [ ] C $\theta^* = 20.3$
- [ ] D $\theta^* = 26.1$

B is the only point with a target density value (close to 0.09) higher than that of $\theta_i$ (close to 0.04).
Since this is a random walk Metropolis sampler with a symmetric proposal distribution, the expression for calculating the acceptance ratio for iteration $i+1$ is $\alpha = g(\theta^*)/g(\theta_i)$. In this case $\alpha$ would be close to 2, whereas for A, C, and D we have $\alpha < 1$. If point B were proposed, it would be accepted with probability 1, as the quick check below confirms.
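A quick arithmetic check, using the approximate density values read off the plot (0.09 and 0.04 are rough readings, not exact values):

```r
# Rough target density values read from the plot (approximations, not exact)
g_current   = 0.04   # g(theta_i) at theta_i = 15.0
g_candidate = 0.09   # g(theta*) at theta* = 9.8 (point B)

alpha = g_candidate / g_current   # symmetric proposal, so only g enters the ratio
alpha          # about 2.25
min(1, alpha)  # acceptance probability: 1, so point B would be accepted
```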
:::
::: {#exr-mh-6}
[M-H]{.column-margin}
Suppose you are using a *random walk Metropolis* sampler with *Normal* proposals. After sampling the chain for $1000$ iterations, you notice that the *acceptance rate* for the candidate draws is only $0.02$. Which corrective action is most likely to help you approach a better acceptance rate (between $0.23$ and $0.50$)?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [x] Decrease the variance of the normal proposal distribution $q$.
- [ ] Increase the variance of the normal proposal distribution $q$.
- [ ] Replace the normal proposal distribution with a uniform proposal distribution centered on the previous value and with variance equal to that of the old normal proposal distribution.
- [ ] Fix the mean of the normal proposal distribution at the last accepted candidate's value. Use the new mean for all future proposals.
A low acceptance rate in a random walk Metropolis sampler usually indicates that the candidate-generating distribution is too wide and is proposing draws too far away from most of the target mass. Increasing the variance would likely make the problem worse, so we should decrease the variance.
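A hedged sketch of the effect: run short chains with several proposal standard deviations and compare acceptance rates. The standard Normal target, the grid of standard deviations, and the helper name below are illustrative assumptions.

```r
# Sketch: compare acceptance rates across several proposal standard deviations.
# Illustrative target: standard Normal, so lg() returns its log density.
lg = function(theta) dnorm(theta, mean=0.0, sd=1.0, log=TRUE)

mh_accept_rate = function(proposal_sd, n_iter=5000) {
  theta_now = 0.0
  accpt = 0
  for (i in 1:n_iter) {
    theta_cand = rnorm(1, mean=theta_now, sd=proposal_sd)
    lalpha = lg(theta_cand) - lg(theta_now)   # symmetric proposal: q cancels
    if (runif(1) < exp(lalpha)) {
      theta_now = theta_cand
      accpt = accpt + 1
    }
  }
  accpt / n_iter
}

set.seed(1)
sapply(c(0.5, 2.0, 20.0), mh_accept_rate)  # acceptance rate falls as the proposal sd grows
```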
:::
::: {#exr-mh-7}
[M-H]{.column-margin}
Suppose we use a random walk Metropolis sampler to sample from the target distribution $\mathbb{P}r(\theta) \propto g(\theta)$ and propose candidates $\theta^*$ using the $\textrm{Unif}(\theta_{i-1}-\varepsilon, \theta_{i-1}+\varepsilon)$ distribution, where $\varepsilon$ is some positive number and $\theta_{i-1}$ is the previous iteration's value of the chain. What is the correct expression for calculating the acceptance ratio $\alpha$ in this scenario?
Hint: Notice that the $\text{Unif}(\theta_{i-1}-\varepsilon, \theta_{i-1}+\varepsilon)$ distribution is centered on the previous value and is symmetric (since the PDF is flat and extends the same distance $\varepsilon$ on either side).
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] $\alpha = \frac{ \text{Unif}(\theta^* \mid \theta_{i-1} - \varepsilon, \, \theta_{i-1} + \varepsilon) }{ \text{Unif}(\theta_{i-1} \mid \theta^* - \varepsilon, \, \theta^* + \varepsilon) }$ Where $\text{Unif}(\theta | a,b)$ represents the PDF of a $\text{Unif}(a,b)$ evaluated at $\theta$.
- [ ] $\alpha = \frac{ g(\theta_{i-1}) }{ g(\theta^*) }$
- [ ] $\alpha = \frac{ \text{Unif}(\theta_{i-1} \mid \theta^* - \varepsilon, \, \theta^* + \varepsilon) }{ \text{Unif}(\theta^* \mid \theta_{i-1} - \varepsilon, \, \theta_{i-1} + \varepsilon) }$ Where $\text{Unif}(\theta \mid a,b)$ represents the PDF of a $\text{Unif}(a,b)$ evaluated at $\theta$.
- [x] $\alpha = \frac{ g(\theta^*) }{ g(\theta_{i-1}) }$
Since the proposal distribution is centered on the previous value and is symmetric, evaluations of $q$ drop from the calculation of $\alpha$.
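A minimal sketch of one iteration with this proposal (assuming `lg()` returns $\log g(\theta)$ and that `theta_now` and `eps` are already defined); note that no Uniform density evaluation appears in the ratio:

```r
# One random walk Metropolis step with a Unif(theta_now - eps, theta_now + eps) proposal.
# lg(), theta_now, and eps are assumed to exist already.
theta_cand = runif(1, min=theta_now - eps, max=theta_now + eps)
lalpha = lg(theta_cand) - lg(theta_now)   # q is symmetric, so it cancels from alpha
if (runif(1) < exp(lalpha)) {
  theta_now = theta_cand                  # accept the candidate
}
```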
:::
::: {#exr-mh-8}
[M-H]{.column-margin}
The following code completes one iteration of an algorithm to simulate a chain whose stationary distribution is $\mathbb{P}r(\theta) \propto g(\theta)$. Which algorithm is employed here?
```r
# draw candidate
theta_cand = rnorm(n=1, mean=0.0, sd=10.0)
# evaluate log of g with the candidate
lg_cand = lg(theta=theta_cand)
# evaluate log of g at the current value
lg_now = lg(theta=theta_now)
# evaluate log of q at candidate
lq_cand = dnorm(theta_cand, mean=0.0, sd=10.0, log=TRUE)
# evaluate log of q at the current value
lq_now = dnorm(theta_now, mean=0.0, sd=10.0, log=TRUE)
# calculate the acceptance ratio
lalpha = lg_cand + lq_now - lg_now - lq_cand
alpha = exp(lalpha)
# draw a uniform variable that will be less than alpha with probability min(1, alpha)
u = runif(1)
if (u < alpha) { # then accept the candidate
  theta_now = theta_cand
  accpt = accpt + 1 # to keep track of acceptance
}
```
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] Random walk Metropolis with Normal proposal
- [ ] Independent Metropolis-Hastings (q does not condition on the previous value of the chain) with Uniform proposal
- [x] Independent Metropolis-Hastings (q does not condition on the previous value of the chain) with Normal proposal
- [ ] Random walk Metropolis with Uniform proposal
Notice that candidates are always drawn from the same $\mathcal{N}(0, 10^2)$ distribution, which is not centered on the previous iteration.
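For contrast, a random walk Metropolis version of the same draw would center the proposal on the current value, so the (symmetric) $q$ terms cancel and only $g$ appears in the ratio. A sketch reusing the variable names from the code above:

```r
# Random walk version: the proposal is centered on the current value theta_now,
# so the symmetric q terms cancel and only g enters the acceptance ratio.
theta_cand = rnorm(n=1, mean=theta_now, sd=10.0)
lalpha = lg(theta=theta_cand) - lg(theta=theta_now)
if (runif(1) < exp(lalpha)) {
  theta_now = theta_cand
  accpt = accpt + 1
}
```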
:::
:::::