---
title: "Homework on Logistic Regression - M3L9HW1"
subtitle: "Bayesian Statistics: Techniques and Models"
author: "Oren Bochman"
categories:
- Bayesian Statistics
keywords:
- Logistic Regression
- Homework
---
::::: {.content-visible unless-profile="HC"}
::: {.callout-caution}
Section omitted to comply with the Honor Code
:::
:::::
::::: {.content-hidden unless-profile="HC"}
```{r}
#| label: 'Setup'
#| lst-label: lst-Logistic-Regression-0
#| lst-cap: 'Setup'
#| attr-source: "id=lst-Logistic-Regression-0 lst-cap='setup'"
#| echo: false
#| output: false
library("rjags")
```
::: {#exr-Logistic-Regression-1}
[**Logistic Regression**]{.column-margin} What is the advantage of using a link function such as the logit transform for logistic regression?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [x] It ensures that the success probability ($\mathbb{E}[y]$ if $y$ is Bernoulli) is between 0 and 1 without requiring any constraints on the $x$ variables or the $\beta$ coefficients.
- [ ] It ensures that $\beta_0+\beta_1x_1+\ldots+\beta_kx_k$ is between 0 and 1 using log transformations of the $\beta$ coefficients.
- [ ] It makes the $\beta$ coefficients interpretable directly as probabilities.
- [ ] It ensures that the $\beta$ coefficients lie between 0 and 1 for all values of predictors x.
The logit link maps the linear predictor $\beta_0+\beta_1x_1+\ldots+\beta_kx_k$, which can take any real value, onto the interval $(0, 1)$, so no constraints on the predictors or the coefficients are needed.
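As a quick numerical illustration (a sketch added here, not part of the original solution), the inverse logit squashes any real-valued linear predictor into $(0, 1)$:
```{r}
#| label: 'lst-Logistic-Regression-1-invlogit'
# inverse logit: any real-valued linear predictor becomes a probability in (0, 1)
inv_logit = function(xb) 1.0 / (1.0 + exp(-xb))
inv_logit(c(-10, -1, 0, 1, 10))
```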
:::
::: {#exr-Logistic-Regression-2}
[**Logistic Regression**]{.column-margin} Logistic regression works with binomial likelihoods in addition to Bernoulli likelihoods. If the response $y_i$ is a number of successes in $n_i$ independent trials each with $\phi_i$ success probability, we can still model $\phi_i$ with a linear model using the logit transformation.
As an example, consider the OME data in the MASS package in R. The data consist of experimental results from tests of auditory perception in children. Under varying conditions and for multiple trials under each condition, children either correctly or incorrectly identified the source of changing signals.
Although the independence of the trials and results is questionable, we'll try fitting a logistic regression to these data. First, we'll explore the relationships briefly with the following code:
\index{dataset!OME}
```{r}
#| label: 'lst-Logistic-Regression-2'
#| lst-label: 'lst-Logistic-Regression-2'
#| attr-source: "id=lst-Logistic-Regression-2 lst-cap='EDA of the OME dataset'"
#| fig-cap: 'EDA of the OME dataset'
#| fig-subcap:
#| - "age vs success"
#| - "OME vs success"
#| - "Loud vs success"
#| - "Noise vs success"
#| lightbox:
#| group: q2
#| column: margin
library("MASS")
data("OME")
?OME # background on the data
head(OME)
any(is.na(OME)) # check for missing values
dat = subset(OME, OME != "N/A") # manually remove OME missing values identified with "N/A"
dat$OME = factor(dat$OME)
str(dat)
plot(dat$Age, dat$Correct / dat$Trials )
plot(dat$OME, dat$Correct / dat$Trials )
plot(dat$Loud, dat$Correct / dat$Trials )
plot(dat$Noise, dat$Correct / dat$Trials )
```
We are interested in how these variables relate to the probability of successfully identifying the source of changes in sound. Of these four variables, which appears to have the weakest association with the probability of success?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] Age in months
- [x] OME: degree of otitis media with effusion (low or high)
- [ ] Loudness of stimulus in decibels
- [ ] Noise: stimulus type (coherent or incoherent)
With a Bernoulli likelihood, $\mathbb{E}(y)$ is the probability of success, which should be a proper probability. If the likelihood is binomial, then the expected value of $y$ is the number of trials times the success probability. Here we would still use the `logit` link on the success probability.
:::
::: {#exr-Logistic-Regression-3}
[**Logistic Regression**]{.column-margin} Next, we'll fit a reference logistic regression model with a non-informative prior in R. We can do this with the `glm` function, providing the model formula as with the usual `lm`, except now the response is the observed proportion of correct responses. We must also indicate how many trials were run for each experiment using the `weights` argument.
```{r}
#| label: 'lst-Logistic-Regression-3'
#| lst-label: 'lst-Logistic-Regression-3'
#| lst-cap: 'OME GLM'
#| attr-source: "id=lst-Logistic-Regression-3 lst-cap='OME GLM'"
mod_glm = glm(Correct/Trials ~ Age + OME + Loud + Noise, data=dat, weights=Trials, family="binomial")
summary(mod_glm)
```
To get an idea of how the model fits, we can create residual plots (using deviance residuals, a special type of residual for non-normal likelihoods) and in-sample prediction plots.
```{r}
#| label: 'fig-Logistic-Regression-3-res'
#| fig-cap:
#| - 'residuals plot'
#| - 'in-sample prediction plots'
#| lightbox:
#| group: q3
#| layout-ncol: 2
#| column: margin
plot(residuals(mod_glm, type="deviance"))
plot(fitted(mod_glm), dat$Correct/dat$Trials)
```
It appears from the second plot that the model is not very precise (some model predictions were far from the observed proportion of correct responses). Nevertheless, it can be informative about the relationships among the variables.
Report the posterior mode estimate of the coefficient for low OME.
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
-0.24
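One quick way to read this off the reference `glm` fit (with a flat prior, the posterior mode coincides with the maximum-likelihood estimate; the coefficient name `OMElow` matches the dummy coding used above):
```{r}
#| label: 'lst-Logistic-Regression-3-mode'
# posterior mode under a non-informative (flat) prior = MLE reported by glm
coef(mod_glm)["OMElow"]
```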
:::
::: {#exr-Logistic-Regression-4}
[**Logistic Regression**]{.column-margin} Next, we will fit a similar model in JAGS. To make the results comparable to those of the reference model, we will use the same configuration of covariates. We can extract this information from the reference model using `model.matrix`.
```{r}
#| label: C2-L09-Ex1-5
X = model.matrix(mod_glm)[,-1] # -1 removes the column of 1s for the intercept
head(X)
```
The data include categorical covariates which R codes as dummy variables (as with ANOVA). Hence we have an indicator variable for whether OME is at its low level and another indicating whether the Noise is incoherent. The intercept is then associated with this baseline group. Ignoring the continuous variables Age and Loud, what are the characteristics of this baseline group?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] Children with low OME exposed to incoherent sound.
- [ ] Children with high OME exposed to incoherent sound.
- [ ] Children with low OME exposed to coherent sound.
- [x] Children with high OME exposed to coherent sound.
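A small check (not part of the original solution): R takes the first level of each factor as the baseline, so the reference group can be read directly from the factor levels.
```{r}
#| label: 'lst-Logistic-Regression-4-levels'
levels(dat$OME)   # the first level (high) is the baseline
levels(dat$Noise) # the first level (coherent) is the baseline
```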
:::
::: {#exr-Logistic-Regression-5}
[**Raftery and Lewis diagnostic**]{.column-margin} Now complete the following code (as well as the code from previous questions) to fit the JAGS model with the fairly non-informative priors given. Use three chains with at least 5,000 iterations in each.
```{r}
#| label: 'lst-Logistic-Regression-5'
#| lst-cap: 'Jags logistic regression model'
#| attr-source: "id=lst-Logistic-Regression-5 lst-cap='Jags logistic regression model'"
mod1_string = " model {
for (i in 1:length(y)) {
y[i] ~ dbin(phi[i], n[i])
logit(phi[i]) = b0 + b[1]*Age[i] + b[2]*OMElow[i] + b[3]*Loud[i] + b[4]*Noiseincoherent[i]
}
b0 ~ dnorm(0.0, 1.0/5.0^2)
for (j in 1:4) {
b[j] ~ dnorm(0.0, 1.0/4.0^2)
}
} "
data_jags = as.list(as.data.frame(X))
data_jags$y = dat$Correct # <1>
data_jags$n = dat$Trials
str(data_jags) # <2>
params = c("b0", "b")
mod1 = jags.model(textConnection(mod1_string), data=data_jags, n.chains=3)
update(mod1, 1e3) # <3>
mod1_sim = coda.samples(model=mod1, variable.names=params, n.iter=5e3) #<4>
mod1_csim = as.mcmc(do.call(rbind, mod1_sim))
```
1. This will not work if there are missing values in `dat` (because they would be ignored by `model.matrix`). Always make sure that the data are accurately pre-processed for JAGS.
2. Ensure all variables have the same number of observations (712).
3. Burn in for 1,000 iterations.
4. Simulate at least 5,000 iterations.
```{r}
#| label: 'lst-Logistic-Regression-5-2'
#| fig-cap: 'Trace plots'
#| fig-subcap:
#| - 'b[1] trace - age '
#| - 'b[1] density - age'
#| - 'b[2] trace - OME'
#| - 'b[2] density - OME'
#| - 'b[3] trace - Loud'
#| - 'b[3] density - Loud'
#| - 'b[4] trace - incoherent'
#| - 'b[4] density - incoherent'
#| - 'b[0] trace - intercept'
#| - 'b[0] density - intercept'
#| fig-height: 4
#| column: margin
#| layout-ncol: 2
#| lightbox:
#| group: q4
## convergence diagnostics
plot(mod1_sim,auto.layout = FALSE)
```
```{r}
#| label: 'lst-Logistic-Regression-5-3'
gelman.diag(mod1_sim)
autocorr.diag(mod1_sim)
```
```{r}
#| label: 'lst-Logistic-Regression-5-4'
#| fig-cap: 'MCMC auto-correlation'
#| fig-subcap:
#| - 'b[1] ac - age '
#| - 'b[2] ac - OME'
#| - 'b[3] ac - loud'
#| - 'b[4] ac - incoherent'
#| - 'b[0] ac - intercept'
#| fig-height: 4
#| column: margin
#| layout-ncol: 2
#| lightbox:
#| group: q4
autocorr.plot(mod1_csim,auto.layout = FALSE)
```
```{r}
#| label: 'lst-Logistic-Regression-5-5'
effectiveSize(mod1_sim)
dic1 = dic.samples(mod1, n.iter=1e3) #<1>
```
1. calculate DIC
Because there are many data points, the MCMC will take some time to run.
\index{model selection!DIC} \index{MCMC!Raftery and Lewis diagnostic}
Before analyzing the results, perform some MCMC diagnostic checks. What does the Raftery and Lewis diagnostic (`raftery.diag()`) suggest about these chains?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{r}
#| label: C2-L09-Ex1-11
library("coda")
raftery.diag(mod1_csim) # the diagnostic is computed on the MCMC samples, not the model object
```
- [ ] The dependence factor for many of the variables is large ($>5.0$), indicating weak autocorrelation in the chains. We would not require a large number of iterations to reliably produce 95% probability intervals for the parameters.
- [ ] The scale reduction factor for many variables is large ($>5.0$), indicating that the different chains are exploring the same space. We have used a sufficient burn-in time.
- [x] The dependence factor for many of the variables is large ($>5.0$), indicating strong autocorrelation in the chains. We would require a large number of iterations to reliably produce 95% probability intervals for the parameters.
- [ ] The scale reduction factor for many variables is large ($>5.0$), indicating that the different chains are not exploring the same space yet. We need to run a longer burn-in period.
\index{diagnostic!Raftery Lewis}
The Raftery and Lewis diagnostic estimates how many iterations of the current chain would be required to reliably estimate the outer quantiles of the posterior.
:::
::: {#exr-Logistic-Regression-6}
[**Logistic Regression**]{.column-margin} Although `OMElow` is the predictor with the weakest statistical association with the probability of correct responses, the posterior probability that its coefficient $\beta_2$ is negative is still greater than $0.9$. How do we interpret this (most likely) negative coefficient in the context of our model?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
- [ ] While holding all other predictors constant, low OME is associated with a decrease of magnitude $|\beta_2|$ in the probability of correct responses while high OME is associated with an increase of $|\beta_2|$.
- [ ] While holding all other predictors constant, low OME is associated with a higher probability of correct responses than high OME.
- [x] While holding all other predictors constant, low OME is associated with a lower probability of correct responses than high OME.
- [ ] While holding all other predictors constant, low OME is associated with an increase of magnitude $|\beta_2|$ in the probability of correct responses while high OME is associated with a decrease of $|\beta_2|$.
Since low `OME` is coded with a one and has a negative coefficient, low `OME` is associated with lower log-odds and consequently lower probability.
It may also be interesting to try a model that includes interaction terms to see if, for example, the effect of low/high OME is different for different Age groups.
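As a rough sketch of that follow-up (not required by the assignment), the reference model could be refit with an Age-by-OME interaction:
```{r}
#| label: 'lst-Logistic-Regression-6-interaction'
#| eval: false
# hypothetical extension: allow the effect of OME to vary with Age
mod_glm_int = glm(Correct/Trials ~ Age * OME + Loud + Noise,
                  data=dat, weights=Trials, family="binomial")
summary(mod_glm_int)
```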
:::
::: {#exr-Logistic-Regression-7}
[**Logistic Regression**]{.column-margin} Using the posterior mean estimates of the model coefficients, create a point estimate of the probability of correct responses for a child of age 60 months, with high OME, using a coherent stimulus of 50 decibels. Round your answer to two decimal places.
:::: {.callout-note}
#### Hint:
First calculate the linear part by multiplying the variables by the coefficients and adding them up (call this `xb`). Once you have that, apply the inverse of the link function to transform it into a probability estimate. Recall that the inverse of the logit transformation is $\phi=\frac{1}{1+e^{-xb}}$.
::::
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{r}
#| label: C2-L09-Ex1-12
# logit(phi[i]) = b0 + b[1]*Age[i] + b[2]*OMElow[i] + b[3]*Loud[i] + b[4]*Noiseincoherent[i]
A=X[,1]==60 # <1>
B=X[,2]==0 # <2>
C=X[,3]==50 # <3>
D=X[,4]==0 # <4>
ABCD = X[A & B & C & D, c(1,2,3,4)] # <5>
pm_coef = colMeans(mod1_csim) # <6>
pm_Xb = pm_coef["b0"] + ABCD %*% pm_coef[1:4] # <7>
phat = 1.0 / (1.0 + exp(-pm_Xb)) # <8>
mean(phat) # <9>
```
1. Child of age 60 months.
2. With high OME (the `OMElow` indicator equals 0).
3. Stimulus of 50 decibels.
4. Using a coherent stimulus (the `Noiseincoherent` indicator equals 0).
5. Select the design-matrix rows matching all four conditions.
6. Posterior mean estimates of the coefficients from the MCMC samples.
7. Multiply the variables by the coefficients and add the intercept.
8. Apply the inverse of the link function.
9. Point estimate of the probability of a correct response.
:::
::: {#exr-Logistic-Regression-8}
[**Logistic Regression**]{.column-margin} Use the posterior mean estimates of the model coefficients to create point estimates of the probability of correct responses for each observation in the original data. To do this, follow the steps outlined in the lesson to create a vector of these probabilities called `phat` (using our notation from this quiz, it would be $\hat\phi$).
Once you have `phat`, calculate the proportion of in-sample observations that are correctly classified according to the following criterion: the model prediction and observed correct response rate are either both higher than 0.7 or both lower than 0.7. Round your answer to two decimal places.
:::: {.callout-note}
#### Hint:
Use the following code:
```{r}
#| label: C2-L09-Ex1-13
# recall: logit(phi[i]) = b0 + b[1]*Age[i] + b[2]*OMElow[i] + b[3]*Loud[i] + b[4]*Noiseincoherent[i]
(pm_coef = colMeans(mod1_csim))
```
::::
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{r}
#| label: C2-L09-Ex1-14
pm_Xb = pm_coef["b0"] + X[,c(1,2,3,4)] %*% pm_coef[1:4]
phat = 1.0 / (1.0 + exp(-pm_Xb))
head(phat)
(tab0.7 = table(phat > 0.7, (dat$Correct / dat$Trials) > 0.7))
sum(diag(tab0.7)) / sum(tab0.7)
```
0.84
It appears that the cases with a high probability of correct responses are well captured by the model.
In this exercise, we obtained a point estimate of the coefficients and used that to obtain a point estimate of the probabilities. If we want posterior distributions for the probabilities, we could apply the inverse link transformation to each iteration of the coefficients.
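A minimal sketch of that computation (assuming the coefficient column names `b0` and `b[1]` through `b[4]` produced by `coda.samples` above):
```{r}
#| label: 'lst-Logistic-Regression-8-posterior-phi'
#| eval: false
# apply the inverse link to every saved coefficient draw:
# rows of phi_draws are MCMC iterations, columns are observations
Xb_draws  = mod1_csim[, "b0"] +
  mod1_csim[, c("b[1]", "b[2]", "b[3]", "b[4]")] %*% t(X[, 1:4])
phi_draws = 1.0 / (1.0 + exp(-Xb_draws))
# posterior mean and 95% interval for the first observation's success probability
mean(phi_draws[, 1])
quantile(phi_draws[, 1], c(0.025, 0.975))
```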
:::
:::::