Stat 412/512

Homework 6

Due Mar 4th on Canvas

Conceptual Exercises

(Chap 12, p. 375) 1, 2, 3, 4, 5, 6
(Chap 14, p. 443) 1, 2, 3, 4, 5, 6
(Chap 15, p. 469) 1, 2, 3, 4, 6
(Chap 16, p. 506) 1, 4

Do the above conceptual exercises from the Sleuth but do not hand them in as part of your homework. The answers can be found at the end of the chapter, but make an honest effort to answer them before looking. They are a useful way to gauge how much you understood and can show up on exams.

1 Model Selection

For this question, we will use model selection to identify models for the covariates in the weight-loss data (ex1420) from Data Analysis 2.

  1. The following is the largest model we are interested in considering:
lm(WtLoss24 ~ Sex*(Age + I(Age^2) + BMI + I(BMI^2) + Age:BMI), data = ex1420)

It allows for some non-linearity in the continuous variables, includes the Age:BMI interaction and the interaction of Sex with every other term, and has a total of 12 terms (11 predictors plus the intercept). Use regsubsets to find the 5 best models of each size, up to size 12.

  2. Produce a BIC plot (a plot of BIC against model size). Which model size is best according to BIC?

  3. The BIC for the intercept-only model is 5.61. Does this change your answer to Q2?

  4. What are the best five models according to BIC, and according to AIC? (The AIC for the intercept-only model is 962.66.)

  5. Examine the models of size 2. Why do AIC, BIC and Cp all agree on the best model of size 2? Which of these models follow good practice (i.e., don’t include interactions without the corresponding main effects, and don’t include quadratic terms without the corresponding linear term)?

  6. Of the top 10 models according to AIC, refit those that follow good practice (there are 3), adding the Diet variable. Compare the estimates of the difference in mean weight loss between the Low Carb and Low Fat diets. Are the models consistent in their conclusions?
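Filtering candidate models by the good-practice rules stated above and then refitting them with an extra factor can be scripted. The sketch below is illustrative only: it uses the built-in mtcars data, with factor(am) standing in for Diet and made-up candidate term sets standing in for the regsubsets output. The helper follows_good_practice() is hypothetical, encodes only the two rules stated above, and assumes term labels shaped like model.matrix column names (e.g. I(hp^2), wt:hp).

```r
# Hypothetical helper: TRUE if a set of term labels obeys the two rules
# (every component of an interaction also appears as a main effect, and
# a quadratic term I(X^2) appears only together with its linear term X)
follows_good_practice <- function(terms) {
  for (term in terms) {
    parts <- strsplit(term, ":", fixed = TRUE)[[1]]
    # Interaction without all of its main effects
    if (length(parts) > 1 && !all(parts %in% terms)) return(FALSE)
    for (part in parts) {
      if (grepl("^I\\(.+\\^2\\)$", part)) {
        linear <- sub("^I\\((.+)\\^2\\)$", "\\1", part)
        # Quadratic without its linear term
        if (!(linear %in% terms)) return(FALSE)
      }
    }
  }
  TRUE
}

# Made-up candidate term sets standing in for the top models by AIC
candidates <- list(
  c("wt", "hp"),
  c("wt", "I(hp^2)"),        # quadratic without its linear term
  c("wt", "hp", "wt:hp"),
  c("wt:hp"),                # interaction without main effects
  c("wt", "hp", "I(hp^2)")
)
keep <- Filter(follows_good_practice, candidates)

# Refit each surviving model with the factor of interest added and collect
# its estimated effect with a 95% confidence interval
compare <- do.call(rbind, lapply(keep, function(terms) {
  f <- reformulate(c("factor(am)", terms), response = "mpg")
  fit <- lm(f, data = mtcars)
  ci <- confint(fit)["factor(am)1", ]
  data.frame(model = paste(terms, collapse = " + "),
             estimate = unname(coef(fit)["factor(am)1"]),
             lower = unname(ci[1]),
             upper = unname(ci[2]))
}))
compare
```

For the homework the factor would be Diet, and the coefficient name depends on the factor's level names and reference level; relevel() can set the reference level so that a single coefficient estimates the Low Carb versus Low Fat difference.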

R hints: if models contains the fortified regsubsets models (as in Lab 7), it is just a data.frame, so you can manipulate it with subset, e.g.

subset(models, size == 5)

will return the rows corresponding to models with 5 parameters (four variables plus an intercept). You can pull out the 5 models with the lowest BIC with:

models[order(models$bic)[1:5], ]
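The subset() and order() calls above work on any data.frame with size and bic columns. As a base-R sketch of what such a table contains, the code below runs a brute-force best-subset search (the 5 lowest-RSS models of each size) on simulated data whose hypothetical columns are named like those in ex1420. regsubsets finds the same kind of models far more efficiently, and its bic column may differ from the formula used here by a constant, so only differences between models are meaningful.

```r
set.seed(1)
# Hypothetical stand-in for ex1420: simulated data with the same column names
d <- data.frame(
  Sex = factor(sample(c("Female", "Male"), 100, replace = TRUE)),
  Age = runif(100, 40, 65),
  BMI = runif(100, 25, 40)
)
d$WtLoss24 <- 5 + 0.1 * d$BMI - 0.05 * d$Age + rnorm(100, sd = 3)

# Model matrix of the largest model, dropping the intercept column
X <- model.matrix(~ Sex * (Age + I(Age^2) + BMI + I(BMI^2) + Age:BMI),
                  data = d)[, -1]
y <- d$WtLoss24
n <- nrow(X)
nbest <- 5

results <- list()
for (k in seq_len(ncol(X))) {
  combos <- combn(colnames(X), k, simplify = FALSE)
  # RSS of every model with k predictors (the intercept is always included)
  rss <- vapply(combos, function(vars) {
    sum(lm.fit(cbind(1, X[, vars, drop = FALSE]), y)$residuals^2)
  }, numeric(1))
  # Keep the nbest smallest-RSS models of this size
  for (i in order(rss)[seq_len(min(nbest, length(rss)))]) {
    results[[length(results) + 1]] <- data.frame(
      size = k + 1,  # number of coefficients, counting the intercept
      terms = paste(combos[[i]], collapse = " + "),
      rss = rss[i]
    )
  }
}
models <- do.call(rbind, results)
models$bic <- n * log(models$rss / n) + models$size * log(n)

# The same manipulations as in the hints
subset(models, size == 5)
models[order(models$bic)[1:5], ]
```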

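By default the fortified data frame doesn’t include AIC, but you can add it with

n <- nrow(ex1420)  # number of observations (assumes no rows were dropped for missing values)
models$aic <- n * log(models$rss/n) + 2*(models$size + 1)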
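One way to sanity-check the AIC formula above is to compare it with R's AIC() on an ordinary lm fit. The sketch below uses the built-in mtcars data as an assumed example, and aic_hint() is a hypothetical helper applying the same formula, with size counting the intercept. For a linear model, AIC() equals this value plus n(log(2π) + 1), a constant that cancels whenever models fit to the same data are compared. The intercept-only model (size = 1, RSS equal to the total sum of squares) is handled the same way.

```r
# Hypothetical helper applying the hint's AIC formula to an lm fit
aic_hint <- function(fit) {
  n <- length(residuals(fit))
  rss <- sum(residuals(fit)^2)
  size <- length(coef(fit))          # coefficients, including the intercept
  n * log(rss / n) + 2 * (size + 1)  # the +1 counts the error variance
}

fit0 <- lm(mpg ~ 1, data = mtcars)        # intercept-only model
fit1 <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)

# AIC() adds the constant n * (log(2 * pi) + 1) to the hint's formula
AIC(fit0) - aic_hint(fit0)
AIC(fit1) - aic_hint(fit1)
n * (log(2 * pi) + 1)
```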


2 Serial Correlation

ex1516 contains the annual numbers of firearm deaths and motor vehicle deaths (in thousands) in the United States.

##   Year FirearmDeaths MotorVehicleDeaths
## 1 1968            23                 55
## 2 1969            24                 56
## 3 1970            27                 54
## 4 1971            29                 54
## 5 1972            30                 56
## 6 1973            31                 55

  1. Fit a regression model of the number of MotorVehicleDeaths on Year and retain the residuals.

  2. Create a scatterplot of the residuals against the lag-1 residuals (as on slide 10 of 21-serial-correlation).

  3. Construct a partial autocorrelation plot of the residuals.

  4. Is there any evidence of serial correlation?
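The steps above can be sketched in base R. So that the sketch runs without the Sleuth3 package, it uses a simulated series (a linear trend plus AR(1) errors, an assumption chosen so that serial correlation is present); with the real data you would fit MotorVehicleDeaths ~ Year on ex1516 instead. Here r1 is the lag-1 sample autocorrelation of the residuals, and the dashed bounds in R's pacf() plot sit at about ±1.96/sqrt(n), the approximate 95% limits for a series with no serial correlation.

```r
set.seed(412)
# Simulated stand-in for ex1516: linear trend plus AR(1) errors (phi = 0.6)
year <- 1:40
deaths <- 50 + 0.3 * year + as.numeric(arima.sim(list(ar = 0.6), n = 40))

# 1. Fit the trend and keep the residuals
fit <- lm(deaths ~ year)
res <- residuals(fit)
n <- length(res)

# 2. Residual at time t plotted against the residual at time t - 1
plot(res[-n], res[-1], xlab = "Lag-1 residual", ylab = "Residual")
r1 <- sum(res[-1] * res[-n]) / sum(res^2)  # lag-1 sample autocorrelation
r1

# 3. Partial autocorrelation plot of the residuals
pacf(res, main = "PACF of residuals")
```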