---
title: "Homework on Multiple Factor ANOVA - M3L8HW2"
subtitle: "Bayesian Statistics: Techniques and Models"
categories:
- Bayesian Statistics
keywords:
- Conditional Probability
- ANOVA
- Homework
---
::::: {.content-visible unless-profile="HC"}
::: {.callout-caution}
Section omitted to comply with the Honor Code
:::
:::::
::::: {.content-hidden unless-profile="HC"}
\index{dataset!Anscombe}
[Laplace Prior]{.column-margin} For Questions 1 and 2, consider the Anscombe data from the `car` package in R, which we analyzed in the quizzes for Lesson 7.
::: {#exr-MANOVA-1}
[Laplace Prior]{.column-margin} In the original model, we used normal priors for the three regression coefficients. Here we will consider using Laplace priors centered on 0. The parameterization used in JAGS for the Laplace (double exponential) distribution has an inverse scale parameter $\tau$. This is related to the variance $\nu$ of the prior in the following way: $\nu=2/\tau^2$. Suppose we want the Laplace prior to have variance $\nu=2$. What value of $\tau$ should we use in the JAGS code?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
$\tau = 1.0$, since $\nu = 2/\tau^2 = 2$ implies $\tau^2 = 1$.
Alternatively, we could place an informative prior on $\tau$ and learn it along with other parameters.
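As a quick numerical sanity check (not part of the graded answer), we can verify that a double exponential with rate $\tau = 1$ has variance 2 by simulating Laplace draws as the difference of two independent exponentials:

```{r}
#| label: laplace-variance-check
# A Laplace(0, 1/tau) variable can be generated as the difference of two
# independent Exponential(tau) variables; its variance is 2 / tau^2.
tau = 1.0
2 / tau^2                           # target prior variance = 2
set.seed(1)
x = rexp(1e6, rate=tau) - rexp(1e6, rate=tau)
var(x)                              # approximately 2
```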
:::
::: {#exr-MANOVA-2}
[Laplace Prior]{.column-margin} When using an informative variable selection prior like the Laplace, we typically center and scale the data:
```{r}
#| label: setup
library("rjags")
library("car")
data("Anscombe")
head(Anscombe)
#?Anscombe
Xc = scale(Anscombe, center=TRUE, scale=TRUE)
str(Xc)
```
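A quick optional check that the centering and scaling worked: every column of `Xc` should now have mean 0 and standard deviation 1.

```{r}
#| label: check-scaling
round(colMeans(Xc), 12)   # numerically zero
apply(Xc, 2, sd)          # all equal to 1
```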
```{r}
#| label: anscombe-reference-regression
data_jags = as.list(Anscombe)
mod_string = "model {
for (i in 1:length(education)) {
education[i] ~ dnorm(mu[i], prec)
mu[i] = b0 + b[1]*income[i] + b[2]*young[i] + b[3]*urban[i]
}
b0 ~ dnorm(0.0, 1.0/1.0e6)
for (j in 1:3) {
b[j] ~ dnorm(0.0, 1.0/1.0e6)
}
prec ~ dgamma(1.0/2.0, 1.0*1500.0/2.0)
sig2 = 1.0 / prec
sig = sqrt(sig2)
} "
params = c("b","sig")
set.seed(72)
mod = jags.model(textConnection(mod_string), data=data_jags, n.chains=3)
update(mod, 1e4)
mod_sim = coda.samples(model=mod, variable.names=params, n.iter=5e3)
mod_csim = as.mcmc(do.call(rbind, mod_sim))
dic = dic.samples(mod, n.iter=1e3)
par(mar = c(2.5, 1, 2.5, 1))
plot(mod_sim)
```
```{r}
#| label: anscombe-laplace-regression
data_jags = as.list(data.frame(Xc))
mod2_string = " model {
for (i in 1:length(education)) {
education[i] ~ dnorm(mu[i], prec)
mu[i] = b[1]*income[i] + b[2]*young[i] + b[3]*urban[i]
}
for (i in 1:3) {
    b[i] ~ ddexp(0.0, 1.0) # tau = 1 gives prior variance 2/tau^2 = 2
}
prec ~ dgamma(0.5, 0.5)
sig2 = 1.0 / prec
sig = sqrt(sig2)
} "
params2 = c("b","sig")
mod2 = jags.model(textConnection(mod2_string), data=data_jags, n.chains=3)
update(mod2, 1e3)
mod2_sim = coda.samples(model=mod2, variable.names=params2, n.iter=5e3)
mod2_csim = as.mcmc(do.call(rbind, mod2_sim))
dic2 = dic.samples(mod2, n.iter=1e3)
par(mar = c(2.5, 1, 2.5, 1))
plot(mod2_sim)
```
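Before comparing the two fits, it is worth confirming that the chains for the Laplace-prior model mixed well. A minimal set of diagnostics (the same ones used later for the warp breaks model):

```{r}
#| label: anscombe-laplace-diagnostics
gelman.diag(mod2_sim)
autocorr.diag(mod2_sim)
effectiveSize(mod2_sim)
```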
Because we subtracted the mean from all (continuous) variables including the response, this is a rare case where we do not need an intercept. Fit the model in JAGS using the Laplace prior with variance 2 for each of the three coefficients, and an inverse gamma prior for the observation variance with effective sample size 1 and prior guess 1.
How do the inferences for the coefficients compare to the original model fit in the quiz for Lesson 7 (besides that their scale has changed due to scaling the data)?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{r}
#| label: C2-L08-Ex2-4
summary(mod_sim)
summary(mod2_sim)
```
- [ ] The inferences are similar, with one exception. The first two coefficients (for income and percentage youth) are significantly positive and the percent urban coefficient's posterior looks like the Laplace prior, with a spike near 0. This indicates that the percent urban "effect" is very weak.
- [ ] Inexplicably, the signs of all coefficients have changed (from positive to negative and from negative to positive).
- [x] The inferences are essentially unchanged. The first two coefficients (for income and percentage youth) are significantly positive and the percent urban coefficient is still negative.
- [ ] The inferences are vastly different. The marginal posterior for all three coefficients look like their Laplace priors, with a spike near 0. This indicates that the "effect" associated with each covariate is very weak.
Notes:

1. There is a big change in $\sigma$: it is much smaller in the second model, indicating a much narrower posterior. This is largely expected, because the response was also standardized (see the check below).
2. The largest coefficient in the first model is `b[2]`, whereas in the second model it is `b[1]`.
3. The 95% intervals in the first model are \[0.06085, 0.09832\], \[0.47245, 1.09604\] and \[-0.17580, -0.04033\]; in the second model they are \[0.7001, 1.15435\], \[0.2381, 0.57309\] and \[-0.5289, -0.07231\]. In both models none of the intervals contains zero, which is strong evidence that each coefficient is non-zero.
4. I picked that answer because, although the relative strength of the coefficients changes, the marginal posteriors in the second model do not collapse to a spike at zero, though `b[3]` is fairly close to zero.
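An optional check of note 1, using the combined-chain objects created above (remember that the second model's response was standardized, so the two values are on different scales):

```{r}
#| label: sigma-comparison
mean(mod_csim[, "sig"])    # residual sd under the reference model (original units)
mean(mod2_csim[, "sig"])   # residual sd under the Laplace-prior model (scaled data)
```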
:::
::: {#exr-MANOVA-3}
[ANOVA]{.column-margin} Consider an ANOVA model for subjects' responses to three experimental factor variables related to a proposed health supplement: dose, frequency, and physical activity. Dose has two levels: 100mg and 200mg. Frequency has three levels: "daily," "twice-weekly," and "weekly." Physical activity has two levels: "low" and "high." If these are the only covariates available and we assume that responses are iid normally distributed, what is the maximum number of parameters we could potentially use to uniquely describe the mean response?
```{r}
#| label: C2-L08-Ex2-5
#| output: false
library("rjags")
```
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
12
There are 2 × 3 × 2 = 12 treatment combinations, i.e. one cell mean for each. If not every treatment combination is observed, then a model with fewer parameters would be necessary.
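As an illustration (a sketch with hypothetical factor labels matching the question, not data from the course), we can count the cells with `expand.grid`:

```{r}
#| label: count-treatment-cells
dose = factor(c("100mg", "200mg"))
freq = factor(c("daily", "twice-weekly", "weekly"))
phys = factor(c("low", "high"))
cells = expand.grid(dose=dose, freq=freq, phys=phys)
nrow(cells)   # 12 treatment combinations, one cell mean each
```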
:::
::: {#exr-MANOVA-4}
[ANOVA]{.column-margin} If we have both categorical and continuous covariates, then it is common to use the linear model parameterization instead of the cell means model. If it is unclear how to set it up, you can use the model.matrix function in R as we have in the lessons.
Suppose that in addition to the experimental factors in the previous question, we have two continuous covariates: weight in kg and resting heart rate in beats per minute. If we use 100mg dose, daily frequency, and low physical activity as the baseline group, which of the following gives the linear model parameterization for an additive model with no interactions?
:::
\index{dataset!ANOVA}
\index{model!baseline}
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] $\text{E}(y_i) = \beta_0 + \beta_1 I_{{\texttt dose}_i=100} + \beta_2 I_{{\texttt freq}_i={\texttt daily}} + \beta_3 I_{{\texttt phys}_i={\texttt low}} + \beta_4 {\texttt weight}_i + \beta_5 {\texttt heart}_i$
- [ ] $\text{E}(y_i) = \mu_{g_i} + \beta_1 {\texttt weight}_i + \beta_2 {\texttt heart}_i$
- [ ] $\text{E}(y_i) = \mu_{g_i} + \beta_1 {\texttt weight}_i + \beta_2 {\texttt heart}_i$
- [x] $\text{E}(y_i) = \beta_0 + \beta_1 I_{{\texttt dose}_i=200} + \beta_2 I_{{\texttt freq}_i={\texttt twice\ weekly}} + \beta_3 I_{{\texttt freq}_i={\texttt weekly}} + \beta_4 I_{{\texttt phys}_i={\texttt high}} + \beta_5 {\texttt weight}_i + \beta_6 {\texttt heart}_i$
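As a sketch (with made-up data, not from the course), `model.matrix` with the default treatment contrasts reproduces exactly this parameterization when the baseline levels (100mg, daily, low) are listed first in each factor:

```{r}
#| label: additive-design-sketch
set.seed(1)
dat = expand.grid(
  dose = factor(c("100mg", "200mg"), levels=c("100mg", "200mg")),
  freq = factor(c("daily", "twice-weekly", "weekly"),
                levels=c("daily", "twice-weekly", "weekly")),
  phys = factor(c("low", "high"), levels=c("low", "high"))
)
dat$weight = rnorm(nrow(dat), mean=70, sd=10)   # hypothetical continuous covariates
dat$heart  = rnorm(nrow(dat), mean=65, sd=8)
# columns: intercept, dose200mg, freqtwice-weekly, freqweekly, physhigh, weight, heart
head(model.matrix(~ dose + freq + phys + weight + heart, data=dat))
```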
:::
::: {#exr-MANOVA-5}
[ANOVA]{.column-margin} The reading in this honors section describes an analysis of the warp breaks data. Of the models fit, we concluded that the full cell means model was most appropriate. However, we did not assess whether constant observation variance across all groups was appropriate. Re-fit the model with a separate variance for each group. For each variance, use an Inverse-Gamma(1/2, 1/2) prior, corresponding to prior sample size 1 and prior guess 1 for each variance.
\index{dataset!warp breaks}
\index{model selection!DIC}
\index{model!cell means}
Report the DIC value for this model, rounded to the nearest whole number.
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{r}
#| label: C2-L08-Ex2-6
library("rjags")
data("warpbreaks")
#?warpbreaks
#head(warpbreaks)
X = model.matrix( ~ wool + tension, data=warpbreaks)
#head(X)
mod3_string = " model {
for( i in 1:length(y)) {
y[i] ~ dnorm(mu[woolGrp[i], tensGrp[i]], prec[woolGrp[i], tensGrp[i]])
}
for (j in 1:max(woolGrp)) {
for (k in 1:max(tensGrp)) {
mu[j,k] ~ dnorm(0.0, 1.0/1.0e6)
prec[j,k] ~ dgamma(0.5, 0.5)
sig[j,k] = sqrt(1.0 / prec[j,k])
}
}
} "
str(warpbreaks)
data3_jags = list(y=log(warpbreaks$breaks), woolGrp=as.numeric(warpbreaks$wool), tensGrp=as.numeric(warpbreaks$tension))
params3 = c("mu", "sig")
mod3 = jags.model(textConnection(mod3_string), data=data3_jags, n.chains=3)
update(mod3, 1e3)
mod3_sim = coda.samples(model=mod3,
variable.names=params3,
n.iter=5e3)
mod3_csim = as.mcmc(do.call(rbind, mod3_sim))
#plot(mod3_sim, ask=TRUE)
## convergence diagnostics
gelman.diag(mod3_sim)
autocorr.diag(mod3_sim)
effectiveSize(mod3_sim)
raftery.diag(mod3_sim)
(dic3 = dic.samples(mod3, n.iter=1e3))
```
\index{model selection!DIC}
\index{MCMC!Raftery and Lewis diagnostic}
\index{MCMC!effective sample size}
\index{MCMC!autocorrelation diagnostics}
\index{MCMC!Gelman diagnostics}
The **DIC** for this model is much higher than for the other models, suggesting that allowing a separate variance for each group does not improve predictive accuracy for these data. However, the variance for the last group does appear to be smaller than the others, which may affect our inferences for that group's mean.
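To see the group-specific standard deviations behind that last remark (an optional check using `mod3_csim` from above):

```{r}
#| label: group-sds
# posterior means of the six group standard deviations sig[j,k]
post = as.matrix(mod3_csim)
colMeans(post[, grep("^sig", colnames(post))])
```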
:::
:::::