Stat 411/511

Lab 1

Jan 7th

Review of regression diagnostic plots

The results of a simple linear regression are only meaningful when the simple linear regression model is an appropriate description of the data of interest, that is, when the data satisfy all of the assumptions of simple linear regression. We can never prove that the assumptions are satisfied, but we can look for evidence that they are violated. We generally do this by looking at residual plots. (Remember that a residual is the difference between the observed response and the response predicted by the regression model.)
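As a small illustration of that definition, here is a sketch in R on simulated data. The values, the seed, and the names `x`, `y` and `fit` are hypothetical and were chosen only so the example runs on its own. It computes the residuals by hand as observed minus fitted and checks that they match what `lm()` stores.

```r
# Hypothetical simulated data: a straight-line trend plus normal noise.
set.seed(411)
x <- runif(50, min = 0, max = 4)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)

fit <- lm(y ~ x)

# Residual = observed response minus the response predicted by the model.
res_by_hand <- y - fitted(fit)

# residuals() extracts the same quantity from the fitted model object.
all.equal(unname(res_by_hand), unname(residuals(fit)))
```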

We discussed three types of residual plots in ST411/511:

Residuals against fitted values
Residuals against explanatory values
A normal probability plot for residuals
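Below is one way to draw all three plots with base R graphics. It is a sketch on hypothetical simulated data, and the lab materials may use different plotting functions.

```r
# Hypothetical simulated data, used only so the example is self-contained.
set.seed(511)
x <- runif(100, 0, 4)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

# Show the three diagnostic plots side by side.
op <- par(mfrow = c(1, 3))

# 1. Residuals against fitted values
plot(fitted(fit), residuals(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# 2. Residuals against the explanatory variable
plot(x, residuals(fit), xlab = "x", ylab = "Residuals")
abline(h = 0, lty = 2)

# 3. Normal probability (Q-Q) plot of the residuals
qqnorm(residuals(fit))
qqline(residuals(fit))

par(op)  # restore the previous plotting layout
```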

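I have a PDF document that summarises these plots: 1-regression-plots.pdf. In simple linear regression the first two plots look essentially the same. The fitted values are a linear function of the explanatory variable, so only the horizontal axis is rescaled (and reversed if the estimated slope is negative). We’ll see later in the quarter that they can be quite different in multiple regression.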
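The sketch below checks numerically why the first two plots match in simple linear regression. It uses hypothetical simulated data. Each fitted value is b0 + b1 * x, so the fitted values and x are perfectly correlated.

```r
# Hypothetical simulated data with a negative true slope.
set.seed(3)
x <- runif(30, 0, 4)
y <- 5 - 1.5 * x + rnorm(30)
fit <- lm(y ~ x)
b <- coef(fit)

# Each fitted value equals intercept + slope * x ...
all.equal(unname(fitted(fit)), unname(b[1] + b[2] * x))

# ... so the correlation between fitted values and x is exactly +1 or -1
# (its sign matches the sign of the estimated slope).
cor(fitted(fit), x)
```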

You will also find examples of different combinations of violations, as they appear in plots of residuals against fitted values, in the following PDF document: 01-examples.pdf. Your TA will go over these at the start of lab. They were intentionally generated with large sample sizes to make the patterns obvious. With smaller samples it’s harder to detect patterns.
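The effect of sample size can be reproduced with a sketch like the following. It is not the code behind 01-examples.pdf. It simulates one hypothetical violation, an error spread that grows with x (non-constant variance), and draws the residuals-versus-fitted plot once with a large sample and once with a small one. The function name `make_funnel_data` and all parameter values are hypothetical.

```r
# Hypothetical generator: the error SD increases with x, so the
# constant-variance assumption is violated.
make_funnel_data <- function(n) {
  x <- runif(n, 0, 4)
  y <- 1 + 2 * x + rnorm(n, sd = 0.5 + x)  # sd grows with x
  lm(y ~ x)
}

set.seed(2)
fit_big   <- make_funnel_data(1000)  # large n: the funnel should be easy to see
fit_small <- make_funnel_data(20)    # small n: same violation, harder to spot

op <- par(mfrow = c(1, 2))
plot(fitted(fit_big), residuals(fit_big), main = "n = 1000",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(fitted(fit_small), residuals(fit_small), main = "n = 20",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)
```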

The dataset ex0824 from Sleuth3 contains the age and respiratory rate for 618 children.

library(Sleuth3)
head(ex0824)

##   Age Rate
## 1 0.1   53
## 2 0.2   38
## 3 0.3   58
## 4 0.3   52
## 5 0.3   42
## 6 0.4   62

Fit a linear regression model of Rate on Age and examine the three diagnostic plots. Is there evidence that any of the assumptions are violated? Check out ST511 Lab 8 if you need a refresher on how to do this in R.
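Here is a minimal sketch of the fit and the three plots. It assumes the Sleuth3 package is installed and uses base R graphics, which may differ from the approach in ST511 Lab 8. The name `fit_rate` is only an illustrative choice. The sketch draws the plots but does not interpret them.

```r
library(Sleuth3)

# Simple linear regression of respiratory rate on age
fit_rate <- lm(Rate ~ Age, data = ex0824)

op <- par(mfrow = c(1, 3))
# Residuals against fitted values
plot(fitted(fit_rate), residuals(fit_rate),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Residuals against the explanatory variable
plot(ex0824$Age, residuals(fit_rate), xlab = "Age", ylab = "Residuals")
abline(h = 0, lty = 2)
# Normal probability plot of the residuals
qqnorm(residuals(fit_rate))
qqline(residuals(fit_rate))
par(op)
```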

Now try fitting a linear regression of log(Rate) on Age. Do the residual plots look better?
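Only the model formula changes from the previous fit. The sketch below makes the same assumptions about Sleuth3 and base graphics, and the name `fit_log` is again illustrative.

```r
library(Sleuth3)

# Regression of log(Rate) on Age (log() is the natural logarithm in R)
fit_log <- lm(log(Rate) ~ Age, data = ex0824)

op <- par(mfrow = c(1, 3))
plot(fitted(fit_log), residuals(fit_log),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(ex0824$Age, residuals(fit_log), xlab = "Age", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(residuals(fit_log))
qqline(residuals(fit_log))
par(op)
```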

How do you interpret the slope parameter now that the response is log transformed? (Read Section 8.4 in the Sleuth.)
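The following sketch shows the reasoning. It assumes that the distribution of log(Y) at each X is symmetric, so its mean equals its median. Because log is monotone, the median of Y is then exp of the median of log(Y).

```latex
\[
\mu\{\log(Y) \mid X\} = \beta_0 + \beta_1 X
\quad\Longrightarrow\quad
\operatorname{Median}\{Y \mid X\} = \exp(\beta_0 + \beta_1 X)
\]
\[
\frac{\operatorname{Median}\{Y \mid X + 1\}}{\operatorname{Median}\{Y \mid X\}}
= \frac{\exp\bigl(\beta_0 + \beta_1 (X + 1)\bigr)}{\exp(\beta_0 + \beta_1 X)}
= \exp(\beta_1)
\]
```

Under that assumption, a one-unit increase in X (here, Age) is associated with a multiplicative change of exp(β1) in the median of Y (here, Rate).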

Calibrating your eyes

You might find it useful to look at examples of these plots from data known to satisfy the assumptions. Have a look at: http://glimmer.rstudio.com/cwick/regression-plots/.

I’ve generated data known to satisfy the assumptions of regression and made all the residual plots. You’ll notice that even though the data satisfy the assumptions, you still get residual plots that don’t look “perfect”. Hit the “Again! Again!” button to generate a new dataset.

Change the sample size, n, and re-examine the plots. You can also change whether the explanatory variable is discrete (x can only be an integer from 0 to 4) or continuous (x is uniformly distributed between 0 and 4).
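To do the same experiment offline, you could use a sketch like the one below. It is not the app’s source code. The helper `simulate_and_plot` and its defaults (intercept 1, slope 2, error SD 1) are hypothetical. It mirrors the two x settings described above and generates normal, constant-variance errors around a straight line.

```r
# Hypothetical helper: simulate data satisfying the simple linear regression
# assumptions, then draw the three residual plots.
simulate_and_plot <- function(n = 50, discrete = FALSE,
                              beta0 = 1, beta1 = 2, sigma = 1) {
  x <- if (discrete) {
    sample(0:4, n, replace = TRUE)  # integers 0, 1, 2, 3, 4
  } else {
    runif(n, min = 0, max = 4)      # continuous, uniform on [0, 4]
  }
  # Straight-line mean plus independent normal errors with constant SD
  y <- beta0 + beta1 * x + rnorm(n, mean = 0, sd = sigma)
  fit <- lm(y ~ x)

  op <- par(mfrow = c(1, 3))
  on.exit(par(op))  # restore the layout when the function exits
  plot(fitted(fit), residuals(fit), xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
  plot(x, residuals(fit), xlab = "x", ylab = "Residuals")
  abline(h = 0, lty = 2)
  qqnorm(residuals(fit))
  qqline(residuals(fit))
  invisible(fit)
}

set.seed(1)
fit_small <- simulate_and_plot(n = 20)                   # continuous x
fit_large <- simulate_and_plot(n = 500, discrete = TRUE) # discrete x
```

Rerunning the last two lines without `set.seed()` plays the role of the “Again! Again!” button.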