---
title: "Homework on Linear Regression Model Part 1 - M2L5HW1"
subtitle: "Bayesian Statistics: Techniques and Models"
categories:
- Bayesian Statistics
keywords:
- Linear Regression
- Homework
---
::::: {.content-visible unless-profile="HC"}
::: {.callout-caution}
Section omitted to comply with the Honor Code
:::
:::::
::::: {.content-hidden unless-profile="HC"}
::: {#exr-linear-reggression-part1-1}
[Linear Regression]{.column-margin}
In a normal linear regression model with
$$
E(y_i)=\beta_0+\beta_1x_{1,i}+\beta_2x_{2,i}+\beta_3x_{3,i}
$$
which of the following gives the correct interpretation of $\beta_2$?
:::
::: {.solution .callout-tip collapse="true"}
## Solution:
- [ ] While holding $x_{2,i}$ constant, the expectation of $y_{i}$ is $\beta_{2}$.
- [ ] While holding $x_{1,i}$ and $x_{3,i}$ constant, a one unit change in $x_{2,i}$ results in a $\beta_2$ change in $y_i$.
- [x] While holding $x_{1,i}$ and $x_{3,i}$ constant, a one unit change in $x_{2,i}$ results in a $\beta_2$ change in the expectation of $y_i$.
- [ ] When $x_{2,i}=0$, a one unit change in $x_{1,i}$ and $x_{3,i}$ results in a $\beta_2$ change in $y_i$.
This is how much the expected response increases when we increase $x_{2,i}$ by one unit while controlling for the other predictors.
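Equivalently, holding $x_{1,i}$ and $x_{3,i}$ fixed and increasing $x_{2,i}$ by one unit, the difference in expectations is

$$
\left[\beta_0+\beta_1x_{1,i}+\beta_2(x_{2,i}+1)+\beta_3x_{3,i}\right]-\left[\beta_0+\beta_1x_{1,i}+\beta_2x_{2,i}+\beta_3x_{3,i}\right]=\beta_2 .
$$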
:::
::: {#exr-linear-reggression-part1-2}
[Linear Regression]{.column-margin}
Which of the following model specifications for $E(y_i)$ is not a valid linear model?
:::
::: {.solution .callout-tip collapse="true"}
## Solution:
- [ ] $\beta_0 + \beta_1 x_{1,i} + \beta_2 ( x_{1,i} / x_{2,i} )$
- [ ] $\beta_0 + \beta_1 \sin( 2 \pi x_{1,i}) + \beta_2 x_{2,i}$
- [ ] $\beta_0 + \beta_1 \log (x_{1,i}) + \beta_2 x_{2,i}^2$
- [x] $\beta_0 + \exp(\beta_1 x_{1,i} ) + \beta_2 x_{2,i}^2$
This model is not linear in the coefficients: $\beta_1$ appears inside the exponential. We are free to transform the predictors and the response, but $E(y_i)$ must remain a linear function of the coefficients.
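In general, a specification counts as a linear model if it can be written as a linear combination of known functions of the predictors, with the coefficients entering only as multipliers:

$$
E(y_i) = \beta_0 + \beta_1\, g_1(x_i) + \dots + \beta_k\, g_k(x_i).
$$

The first three options have this form (with $g$'s such as $x_{1,i}/x_{2,i}$, $\sin(2\pi x_{1,i})$, $\log(x_{1,i})$, and $x_{2,i}^2$), whereas $\exp(\beta_1 x_{1,i})$ cannot be rewritten this way.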
:::
::: {#exr-linear-reggression-part1-3}
[Linear Regression]{.column-margin}
Consider the Anscombe data set in R which can be accessed with the following code:
\index{dataset!Anscombe}
```{r}
#| label: C2-L07-Ex1-1
library("car") # load the 'car' package
data("Anscombe") # load the data set
?Anscombe # read a description of the data
```
```{r}
#| label: C2-L07-Ex1-2
head(Anscombe) # look at the first few lines of the data
```
```{r}
#| label: fig-anscombe-pair-plot
#| fig-cap: "Anscombe Pair plot"
pairs(Anscombe) # scatter plots for each pair of variables
```
Suppose we are interested in relating per-capita education expenditures to the other three variables. Which variable appears to have the strongest linear relationship with per-capita education expenditures?
:::
::: {.solution .callout-tip collapse="true"}
## Solution
- [ ] Proportion of population that is urban
- [ ] None of these variables appears to have a linear relationship with education expenditures.
- [ ] Proportion of population under age 18
- [x] Per-capita income
It appears that increases in income generally co-occur with increases in education expenditures.
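As a quick numerical check of the pair plot (a small addition using base R's `cor`), the strongest correlation with `education` is the one with `income`:

```{r}
#| label: C2-L07-Ex1-3-cor
round(cor(Anscombe), 2) # education correlates most strongly with income
```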
:::
::: {#exr-linear-reggression-part1-4}
[Linear Regression]{.column-margin}
Fit a reference (non-informative) Bayesian linear model to the Anscombe data with education expenditures as the response variable and include all three other variables as predictors. Use the `lm` function in R.
What is the posterior mean estimate of the intercept in this model? Round your answer to one decimal place.
:::
```{r}
#| label: C2-L07-Ex1-4
mod_0 = lm(education ~ income + young + urban, data = Anscombe)
summary(mod_0)
```
```{r}
#| label: fig-exr-linear-reggression-part1-4
#| fig-cap: "Diagnostic Charts"
#| fig-subcap:
#| - "Residuals vs Fitted"
#| - "Q-Q Residuals"
#| - "Scale-Location"
#| - "Cook's distance"
#| - "Residuals vs Leverage"
#| - "Cook's distance vs Leverage"
#| layout-ncol: 2
## Allow room for printing the model formula in the outer margin:
par(mar = c(3.5, 3.5, 2, 1), oma = c(0, 0, 2, 0), mgp = c(2.4, 0.8, 0))
for (i in 1:6) {
  plot(mod_0, which = i) # one diagnostic plot per sub-figure
}
```
::: {.solution .callout-tip collapse="true"}
## Solution:
$$
-286.84
$$
Under the reference (non-informative) prior, the posterior means of the coefficients coincide with the least-squares estimates, so the intercept estimate can be read directly from the `summary(mod_0)` output above.
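As a quick check (a sketch reusing the `mod_0` fit from above), the estimate can be read off the coefficient vector directly:

```{r}
#| label: C2-L07-Ex1-4-intercept
round(coef(mod_0)["(Intercept)"], 1) # posterior mean of the intercept under the reference prior
```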
:::
::: {#exr-linear-reggression-part1-5}
[Linear Regression]{.column-margin} In our reference analysis of the Anscombe data, the intercept is estimated to be negative. Does this parameter have a meaningful interpretation?
:::
::: {.solution .callout-tip collapse="true"}
## Solution:
- [ ] Yes, it represents expected expenditures in a state with average income, average percent youth, and average percent urban.
- [ ] No, there must be something wrong with the model because expenditures can never be negative.
- [ ] No, this model should not have an intercept term at all.
- [x] No, it represents expected expenditures in a state with zero per-capita income, 0% youth, and 0% urban population, which doesn't exist.
Although this parameter is **not very interpretable** in this particular example, **it is necessary** to the model.
[One strategy for making the intercept more interpretable would be to subtract the average of each predictor from its values (i.e., $x_{1,i}^* = x_{1,i} - \bar{x}_1$), as sketched below. Then the intercept would represent expected expenditures in a state with average income, average percent youth, and average percent urban.]{.mark}
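A minimal sketch of this centering strategy, assuming we simply refit the reference model with `lm` (the objects `Anscombe_c` and `mod_c` are introduced here for illustration only):

```{r}
#| label: C2-L07-Ex1-5-centered
# Centre each predictor so the intercept refers to a state with average
# income, average percent youth, and average percent urban.
Anscombe_c = transform(Anscombe,
                       income = income - mean(income),
                       young  = young  - mean(young),
                       urban  = urban  - mean(urban))
mod_c = lm(education ~ income + young + urban, data = Anscombe_c)
coef(mod_c)["(Intercept)"] # for a least-squares fit this equals mean(Anscombe$education)
```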
:::
::: {#exr-linear-reggression-part1-6}
[Linear Regression]{.column-margin}
Use the code below to fit a linear regression model to the Anscombe data in JAGS. You will need to finish setting up and running the model.
```{r}
#| label: exr-linear-regression-part1-6-1
#| warning: FALSE
library("rjags")
mod_string = " model {
for (i in 1:length(education)) {
education[i] ~ dnorm(mu[i], prec) # <1>
mu[i] = b0 + b[1] * income[i] + b[2] * young[i] + b[3] * urban[i] # <2>
}
b0 ~ dnorm(0.0, 1.0 / 1.0e6) # <3>
for (i in 1:3) {
b[i] ~ dnorm(0.0, 1.0 / 1.0e6) # <4>
}
prec ~ dgamma(1.0 / 2.0, 1.0 * 1500.0 / 2.0) # <5>
sig2 = 1.0 / prec
sig = sqrt(sig2)
} "
data_jags = as.list(Anscombe)
params = c("b", "sig")
inits = function() {
  # the model has three regression coefficients, so we need three initial values;
  # note rnorm() takes a standard deviation, whereas JAGS dnorm() uses a precision
  list("b" = rnorm(3, 0.0, 100.0),
       "prec" = rgamma(1, 1.0 / 2.0, 1.0 * 1500.0 / 2.0))
}
set.seed(72)
mod = jags.model(textConnection(mod_string), data=data_jags, inits=inits, n.chains=3)
update(mod, 10000) # burn-in
mod_sim = coda.samples(model=mod, variable.names=params, n.iter=10000)
gelman.diag(mod_sim)
autocorr.diag(mod_sim)
#autocorr.plot(mod_sim)
# we should combine the chains before running this plot
```
1. The likelihood; in JAGS, `dnorm` is parameterized by the mean and the precision `prec`, not the variance.
2. The linear part of the model.
3. Prior for the intercept.
4. Priors for the coefficients.
5. Prior for the precision: the initial guess of the variance (1500) is based on the overall variance of the `education` variable, and the prior effective sample size is low (see the note below). Technically this is not a true 'prior' because the guess uses the data, but it is not very informative.
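As a side note on annotation 5 (interpretive notation only; $\nu_0$ and $\sigma_0^2$ do not appear in the code), this gamma prior on the precision is equivalent to an inverse-gamma prior on the variance with prior effective sample size $\nu_0$ and prior guess $\sigma_0^2$:

$$
\text{prec} \sim \text{Gamma}\!\left(\frac{\nu_0}{2},\ \frac{\nu_0\,\sigma_0^2}{2}\right)
\iff
\sigma^2=\frac{1}{\text{prec}} \sim \text{Inverse-Gamma}\!\left(\frac{\nu_0}{2},\ \frac{\nu_0\,\sigma_0^2}{2}\right),
\qquad \nu_0 = 1,\ \sigma_0^2 = 1500.
$$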
```{r}
#| label: fig-linear-regression-part1-6-2
#| fig-cap: "Autocorrelation plots for the combined chains"
mod_csim = do.call(rbind, mod_sim) # combine multiple chains
#autocorr.plot(mod_csim)
nvar <- ncol(as.matrix(mod_csim))
par(mfrow = c(nvar, 1))
par(oma = c(1, 1, 1, 1), mar = c(2, 2, 2, 1)) # tight margins
autocorr.plot(mod_csim, ask = FALSE, auto.layout = FALSE)
```
```{r}
#| label: fig-linear-regression-part1-6-3
#| fig-cap: "Trace and density plots of the sampled parameters"
par(mar = c(2.5, 1, 2.5, 1))
plot(mod_sim)
```
```{r}
#| label: C2-L07-Ex1-6-4
effectiveSize(mod_sim)
summary(mod_sim)
```
Before proceeding to inference, we should check our model.
The first step is to check our MCMC chains. Do there appear to be any problems with the chains?
:::
::: {.solution .callout-tip collapse="true"}
## Solution:
- [ ] Yes, scale reduction factors are well above 1.0. The chains are not exploring the same distribution.
- [ ] Yes, there is very high autocorrelation for sig. We should help the chain for sig by fixing the initial value.
- [ ] No, a few thousand iterations will be sufficient for these chains.
- [x] Yes, there is very high autocorrelation among the coefficients. It would be good to run the chain for 100,000+ iterations to get reliable estimates.
Proper MCMC algorithms come with a theoretical guarantee of eventual convergence to the target distribution, but chains with very high autocorrelation, as we see here for the coefficients, yield low effective sample sizes and may require an impractical number of iterations. It is worth checking whether a longer chain (100,000+ iterations) yields acceptable results.
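A minimal sketch of that check, reusing the `mod` and `params` objects defined above (not evaluated here to keep the build fast):

```{r}
#| label: C2-L07-Ex1-6-longrun
#| eval: false
# Draw a much longer chain and re-check the diagnostics before trusting
# the posterior summaries.
mod_sim_long = coda.samples(model = mod, variable.names = params, n.iter = 1e5)
effectiveSize(mod_sim_long) # should be far larger than for the short run
gelman.diag(mod_sim_long)   # potential scale reduction factors should be near 1
```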
:::
::: {#exr-linear-reggression-part1-7}
[Linear Regression]{.column-margin} Which of the following is **not** a condition we can check using a residual plot with predicted values on the x-axis and residuals on the y-axis?
:::
::: {.solution .callout-tip collapse="true"}
## Solution:
- [ ] Linearity of the relationship between predictors and the response
- [ ] Presence of outliers
- [ ] Constant error variance
- [x] Independence of the observations
One way to check this assumption is to plot the residuals against the data index, as sketched below. In the Anscombe data, we could also check whether residuals are more similar for states that are geographically close than for states that are not; if so, there may be spatial correlation in the data.
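A minimal sketch of the index plot, reusing the `mod_0` reference fit from above:

```{r}
#| label: fig-C2-L07-Ex1-7-resid-index
#| fig-cap: "Residuals plotted against observation index"
# Runs or trends across the index would suggest the observations are not
# independent (e.g., serial or spatial correlation).
plot(resid(mod_0), xlab = "observation index", ylab = "residual")
abline(h = 0, lty = 2)
```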
:::
::: {#exr-linear-reggression-part1-8}
[Linear Regression]{.column-margin} Check the residual plot described in Question 7 for the Anscombe data. Since the estimates of the coefficients in the reference model are very close to those in the JAGS model, we will just look at the residuals of the reference model. This plot is the first that appears when you run the following code:
```{r}
#| label: fig-exr-linear-reggression-part1-8
#| fig-cap: "LM Residual Plot"
plot(mod_0, which = 1) # mod_0 is the object saved when we ran lm() above
```
:::
::: {.solution .callout-tip collapse="true"}
## Solution:
- [ ] Yes, the observations appear not to be independent.
- [ ] Yes, there is a curved pattern or shape to the residuals, indicating a nonlinear relationship between the variables.
- [x] Yes, the error variability appears to increase as predicted values increase.
- [ ] No, this plot raises no concerns.
- [ ] Yes, there are a few extreme outliers.
There is no obvious curved pattern to these points, but the spread of the residuals increases as the fitted values increase. There are alternative versions of linear models that address this issue, but we will not pursue them in this course.
:::
:::::