46  Logistic regression - M3L9

Bayesian Statistics: Techniques and Models

An overview of logistic regression in the context of Bayesian statistics.
Author

Oren Bochman

Keywords

logistic regression, Bayesian statistics, R programming, statistical modeling, classification, binary outcomes

Logistic regression is the preferred model when the response variable is binary, for example a classification label or the outcome of a Bernoulli trial. In this setting the traditional least-squares fit suffers from a number of shortcomings. The main idea is to work on a log scale; however, a naïve log transform of the response runs into trouble with zero-valued inputs, since \log(0)=-\infty. Instead, we model the success probability through the log-odds (logit) transform.
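Concretely, for a binary response y_i with success probability p_i, the model links p_i to the covariates through the logit function, which is exactly the link used in the JAGS code below:

y_i \sim \text{Bernoulli}(p_i)

\text{logit}(p_i) = \log \frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}

Inverting the transform gives p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}, where \eta_i is the linear predictor, so the fitted probabilities always lie in (0, 1).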

46.1 Introduction to Logistic Regression 🎥

Figure 46.1: Introduction to logistic regression

46.1.1 Data

For an example of logistic regression, we'll use the urine data set from the boot package in R. The response variable is r, which takes on values of 0 or 1. We will remove the rows of the data set that contain missing values.

library("boot")
data("urine")
?urine
head(urine)
  r gravity   ph osmo cond urea calc
1 0   1.021 4.91  725   NA  443 2.45
2 0   1.017 5.74  577 20.0  296 4.49
3 0   1.008 7.20  321 14.9  101 2.36
4 0   1.011 5.51  408 12.6  224 2.15
5 0   1.005 6.52  187  7.5   91 1.16
6 0   1.020 5.27  668 25.3  252 3.34
dat = na.omit(urine) # drop missing values

Let’s look at pairwise scatter plots of the seven variables.

pairs(dat)

One thing that stands out is that several of these variables are strongly correlated with one another. For example, gravity and osmo appear to have a very close linear relationship. Collinearity between x variables in linear regression models can cause trouble for statistical inference: two correlated variables will compete for the ability to predict the response variable, leading to unstable estimates. This is not a problem if prediction of the response is the end goal of the model, but if our objective is to discover how the variables relate to the response, we should avoid collinearity.

Important: Collinearity and Multicollinearity

When two covariates are highly correlated, we call this relation collinearity. When one covariate in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy, we call this relation multicollinearity. A group of covariates can be multicollinear even when no pair among them is strongly correlated.

In both cases the design matrix becomes nearly singular, and near-singular matrices are a strong source of instability in numerical calculations. Statistically, this tends to produce inflated standard errors compared to a model that keeps only a subset of variables that are neither collinear nor multicollinear. A consequence is a drop in the apparent statistical significance of these variables, which makes the model harder to interpret.

We have seen a few strategies for dealing with these issues:

  1. Include pair plots in the exploratory data analysis phase.
  2. Fit models on different subsets of covariates and compare their DIC.
  3. Perform variable selection using double exponential priors.
  4. Use PCA to construct uncorrelated covariates of lower dimension, at the cost of interpretability. See (Johnson and Wichern 2001, 386), (Belsley, Kuh, and Welsch 1980, 85–191), (Härdle and Simar 2019).
  5. Eliminate features based on variance inflation factors (VIF) (Sheather 2009, 203); a sketch of this check is given after the list.
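As a rough illustration of the VIF idea (not part of the original analysis): the VIF of a covariate is 1/(1 - R^2), where R^2 comes from regressing that covariate on the remaining covariates, and values much larger than about 5–10 are commonly read as a warning sign. A minimal base-R sketch, assuming dat has been created as above (the helper name vif_check is made up here; packages such as car provide a ready-made vif() function):

vif_check = function(df) {
  sapply(colnames(df), function(v) {
    # regress covariate v on all of the other covariates
    fit = lm(reformulate(setdiff(colnames(df), v), response=v), data=df)
    1 / (1 - summary(fit)$r.squared) # VIF = 1 / (1 - R^2)
  })
}
vif_check(dat[,-1]) # exclude the binary response r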

We can more formally estimate the correlation among these variables using the corrplot package.

library("corrplot")
corrplot 0.95 loaded
Cor = cor(dat)
corrplot(Cor, type="upper", method="ellipse", tl.pos="d")
corrplot(Cor, type="lower", method="number", col="black", 
         add=TRUE, diag=FALSE, tl.pos="n", cl.pos="n")

46.1.2 Variable selection

One primary goal of this analysis is to find out which variables are related to the presence of calcium oxalate crystals. This objective is often called “variable selection.” We have already seen one way to do this: fit several models that include different sets of variables and see which one has the best DIC. Another way to do this is to use a linear model where the priors for the \beta coefficients favor values near 0 (indicating a weak relationship). This way, the burden of establishing association lies with the data. If there is not a strong signal, we assume it doesn’t exist.

Rather than tailoring a prior for each individual \beta based on the scale on which its covariate takes values, it is customary to subtract the mean and divide by the standard deviation of each variable.

X = scale(dat[,-1], center=TRUE, scale=TRUE)
head(X[,"gravity"])
         2          3          4          5          6          7 
-0.1403037 -1.3710690 -0.9608139 -1.7813240  0.2699514 -0.8240622 
colMeans(X)
      gravity            ph          osmo          cond          urea 
-9.861143e-15  8.511409e-17  1.515743e-16 -1.829852e-16  7.335402e-17 
         calc 
-1.689666e-18 
apply(X, 2, sd)
gravity      ph    osmo    cond    urea    calc 
      1       1       1       1       1       1 

46.1.3 Model

Our prior for the \beta coefficients (which we'll call b in the model) will be the double exponential (or Laplace) distribution which, as the name implies, looks like the exponential distribution with tails extending in the positive as well as the negative direction, and with a sharp peak at 0. We can read more about it in the JAGS manual. The distribution looks like this:

ddexp = function(x, mu, tau) {
  0.5*tau*exp(-tau*abs(x-mu)) 
}
curve(ddexp(x, mu=0.0, tau=1.0), from=-5.0, to=5.0, 
      ylab="density", 
      main="Double exponential\ndistribution") # double exponential distribution
curve(dnorm(x, mean=0.0, sd=1.0), from=-5.0, to=5.0, 
      lty=2, add=TRUE) # normal distribution
legend("topright", 
      legend=c("double exponential", "normal"), 
      lty=c(1,2), bty="n")

library("rjags")
Loading required package: coda
Linked to JAGS 4.3.2
Loaded modules: basemod,bugs
mod1_string = 
  " model {
    for (i in 1:length(y)) {
        y[i] ~ dbern(p[i])
        logit(p[i]) = int + b[1]*gravity[i] + 
                            b[2]*ph[i] + 
                            b[3]*osmo[i] + 
                            b[4]*cond[i] + 
                            b[5]*urea[i] + 
                            b[6]*calc[i]
    }
    int ~ dnorm(0.0, 1.0/25.0)
    for (j in 1:6) {
        b[j] ~ ddexp(0.0, sqrt(2.0)) # has var 1.0
    }
} "

set.seed(92)
head(X)
     gravity         ph       osmo       cond        urea        calc
2 -0.1403037 -0.4163725 -0.1528785 -0.1130908  0.25747827  0.09997564
3 -1.3710690  1.6055972 -1.2218894 -0.7502609 -1.23693077 -0.54608444
4 -0.9608139 -0.7349020 -0.8585927 -1.0376121 -0.29430353 -0.60978050
5 -1.7813240  0.6638579 -1.7814497 -1.6747822 -1.31356713 -0.91006194
6  0.2699514 -1.0672806  0.2271214  0.5490664 -0.07972172 -0.24883614
7 -0.8240622 -0.5825618 -0.6372741 -0.4379226 -0.51654898 -0.83726644
data_jags = list(y=dat$r, 
                 gravity=X[,"gravity"], 
                 ph=X[,"ph"], 
                 osmo=X[,"osmo"], 
                 cond=X[,"cond"], 
                 urea=X[,"urea"], 
                 calc=X[,"calc"])
params = c("int", "b")

mod1 = jags.model(textConnection(mod1_string), data=data_jags, n.chains=3)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 77
   Unobserved stochastic nodes: 7
   Total graph size: 1085

Initializing model
update(mod1, 1e3)

mod1_sim = coda.samples(model=mod1,
                        variable.names=params,
                        n.iter=5e3)
mod1_csim = as.mcmc(do.call(rbind, mod1_sim))

## convergence diagnostics
par(mar = c(2.5, 1, 2.5, 1))
plot(mod1_sim, ask=TRUE)

gelman.diag(mod1_sim)
Potential scale reduction factors:

     Point est. Upper C.I.
b[1]          1       1.01
b[2]          1       1.00
b[3]          1       1.01
b[4]          1       1.01
b[5]          1       1.01
b[6]          1       1.00
int           1       1.00

Multivariate psrf

1
autocorr.diag(mod1_sim)
              b[1]         b[2]       b[3]       b[4]       b[5]       b[6]
Lag 0   1.00000000  1.000000000 1.00000000 1.00000000 1.00000000 1.00000000
Lag 1   0.83139527  0.282140123 0.89937587 0.75981398 0.79178141 0.48729555
Lag 5   0.40470384 -0.006327089 0.58956682 0.36469706 0.36883565 0.06484881
Lag 10  0.17463933  0.005297018 0.34396080 0.19057429 0.16280890 0.01297326
Lag 50 -0.02257132  0.005922485 0.03869131 0.02883672 0.01546907 0.01094675
                int
Lag 0   1.000000000
Lag 1   0.292404424
Lag 5   0.027938430
Lag 10 -0.005587174
Lag 50 -0.024954679
autocorr.plot(mod1_sim)

effectiveSize(mod1_sim)
     b[1]      b[2]      b[3]      b[4]      b[5]      b[6]       int 
1381.6646 8406.5088  781.1091 1460.5169 1467.1893 4620.9686 7352.8806 
## calculate DIC
dic1 = dic.samples(mod1, n.iter=1e3)

Let’s look at the results.

summary(mod1_sim)

Iterations = 2001:7000
Thinning interval = 1 
Number of chains = 3 
Sample size per chain = 5000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

        Mean     SD Naive SE Time-series SE
b[1]  1.7152 0.7790 0.006360       0.022276
b[2] -0.1446 0.2875 0.002348       0.003120
b[3] -0.2905 0.8612 0.007032       0.032268
b[4] -0.7802 0.5209 0.004253       0.014612
b[5] -0.6468 0.6256 0.005108       0.017049
b[6]  1.6097 0.4990 0.004074       0.007142
int  -0.1786 0.3096 0.002528       0.003542

2. Quantiles for each variable:

        2.5%     25%     50%      75%  97.5%
b[1]  0.3724  1.1671  1.6583  2.19458 3.4378
b[2] -0.7508 -0.3280 -0.1261  0.04550 0.3916
b[3] -2.1719 -0.7578 -0.1949  0.20984 1.3341
b[4] -1.8543 -1.1241 -0.7660 -0.41575 0.1720
b[5] -2.0291 -1.0392 -0.5778 -0.18852 0.3953
b[6]  0.7134  1.2605  1.5821  1.92323 2.6894
int  -0.7691 -0.3883 -0.1815  0.02756 0.4411
#par(mfrow=c(3,2))
par(mar = c(2.5, 1, 2.5, 1))

densplot(mod1_csim[,1:6], xlim=c(-3.0, 3.0))

colnames(X) # variable names
[1] "gravity" "ph"      "osmo"    "cond"    "urea"    "calc"   

It is clear that the coefficients for the variables gravity, cond (conductivity), and calc (calcium concentration) are not 0. The posterior distribution for the coefficient of osmo (osmolarity) looks like the prior and is still nearly centered on 0, so we'll conclude that osmo is not a strong predictor of calcium oxalate crystals. The same goes for ph.

urea (urea concentration) appears to be a borderline case. However, if we refer back to our correlations among the variables, we see that urea is highly correlated with gravity, so we opt to remove it.
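We can check this directly in the correlation matrix computed earlier (the Cor object from the corrplot step):

Cor["gravity", "urea"]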

Our second model looks like this:

mod2_string = " model {
    for (i in 1:length(y)) {
        y[i] ~ dbern(p[i])
        logit(p[i]) = int + b[1]*gravity[i] + b[2]*cond[i] + b[3]*calc[i]
    }
    int ~ dnorm(0.0, 1.0/25.0)
    for (j in 1:3) {
        b[j] ~ dnorm(0.0, 1.0/25.0) # noninformative for logistic regression
    }
} "

mod2 = jags.model(textConnection(mod2_string), data=data_jags, n.chains=3)
Warning in jags.model(textConnection(mod2_string), data = data_jags, n.chains =
3): Unused variable "ph" in data
Warning in jags.model(textConnection(mod2_string), data = data_jags, n.chains =
3): Unused variable "osmo" in data
Warning in jags.model(textConnection(mod2_string), data = data_jags, n.chains =
3): Unused variable "urea" in data
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 77
   Unobserved stochastic nodes: 4
   Total graph size: 635

Initializing model
update(mod2, 1e3)

mod2_sim = coda.samples(model=mod2,
                        variable.names=params,
                        n.iter=5e3)
mod2_csim = as.mcmc(do.call(rbind, mod2_sim))

par(mar = c(2.5, 1, 2.5, 1))
#plot(mod2_sim, ask=TRUE)
plot(mod2_sim)

gelman.diag(mod2_sim)
Potential scale reduction factors:

     Point est. Upper C.I.
b[1]          1          1
b[2]          1          1
b[3]          1          1
int           1          1

Multivariate psrf

1
autocorr.diag(mod2_sim)
              b[1]         b[2]          b[3]         int
Lag 0  1.000000000 1.0000000000  1.0000000000 1.000000000
Lag 1  0.583627533 0.6720443263  0.4990170960 0.284711480
Lag 5  0.114789188 0.1788950472  0.0546211037 0.002815859
Lag 10 0.010922205 0.0126509977 -0.0005439148 0.000864474
Lag 50 0.001713864 0.0006537317 -0.0011842400 0.004314294
autocorr.plot(mod2_sim)

effectiveSize(mod2_sim)
    b[1]     b[2]     b[3]      int 
3518.752 2665.977 4500.056 8353.729 
dic2 = dic.samples(mod2, n.iter=1e3)

46.1.4 Results

dic1
Mean deviance:  68.53 
penalty 5.573 
Penalized deviance: 74.1 
dic2
Mean deviance:  71.23 
penalty 4.095 
Penalized deviance: 75.33 
summary(mod2_sim)

Iterations = 2001:7000
Thinning interval = 1 
Number of chains = 3 
Sample size per chain = 5000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

        Mean     SD Naive SE Time-series SE
b[1]  1.3960 0.5052 0.004125       0.008532
b[2] -1.3296 0.4619 0.003772       0.008955
b[3]  1.8637 0.5514 0.004502       0.008238
int  -0.1536 0.3217 0.002627       0.003523

2. Quantiles for each variable:

        2.5%     25%    50%      75%   97.5%
b[1]  0.4794  1.0393  1.372  1.72259  2.4580
b[2] -2.2829 -1.6280 -1.314 -1.00698 -0.4713
b[3]  0.8744  1.4785  1.832  2.21421  3.0297
int  -0.7783 -0.3713 -0.156  0.06063  0.4745
HPDinterval(mod2_csim)
          lower      upper
b[1]  0.4008897  2.3673187
b[2] -2.2333234 -0.4259612
b[3]  0.7789707  2.9103415
int  -0.7745084  0.4776574
attr(,"Probability")
[1] 0.95
#par(mfrow=c(3,1))
par(mar = c(2.5, 1, 2.5, 1))
densplot(mod2_csim[,1:3], xlim=c(-3.0, 3.0))

colnames(X)[c(1,4,6)] # variable names
[1] "gravity" "cond"    "calc"   

The DIC is actually better for the first model. Note that we did change the prior between models, and generally we should not use the DIC to choose between priors. Hence comparing DIC between these two models may not be a fair comparison. Nevertheless, they both yield essentially the same conclusions. Higher values of gravity and calc (calcium concentration) are associated with higher probabilities of calcium oxalate crystals, while higher values of cond (conductivity) are associated with lower probabilities of calcium oxalate crystals.
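To make that interpretation concrete, here is a small sketch (not part of the original analysis) that plugs the posterior means of the second model's coefficients into the inverse-logit to get fitted probabilities. It assumes mod2_csim and the standardized matrix X from above; the names pm_coef, eta, and phat are introduced here only for illustration, and because the covariates are standardized the coefficients act on standard-deviation units.

pm_coef = colMeans(mod2_csim) # posterior means of b[1], b[2], b[3], int
eta = pm_coef["int"] + X[,c("gravity","cond","calc")] %*% pm_coef[1:3] # linear predictor
phat = 1.0 / (1.0 + exp(-eta)) # inverse-logit gives fitted probabilities in (0,1)
head(phat)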

There are more modeling options in this scenario, perhaps including transformations of variables, different priors, and interactions between the predictors, but we’ll leave it to you to see if you can improve the model.
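For instance, one hypothetical starting point (written here only as a sketch, not fit in this lesson) would be to add a gravity-by-calc interaction to the second model; only the linear predictor and the number of coefficients change:

mod3_string = " model {
    for (i in 1:length(y)) {
        y[i] ~ dbern(p[i])
        logit(p[i]) = int + b[1]*gravity[i] + b[2]*cond[i] + b[3]*calc[i] +
                            b[4]*gravity[i]*calc[i] # interaction term
    }
    int ~ dnorm(0.0, 1.0/25.0)
    for (j in 1:4) {
        b[j] ~ dnorm(0.0, 1.0/25.0)
    }
} "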