The results of a simple linear regression are only meaningful when the simple linear regression model is an appropriate description of the data of interest, i.e. when the data satisfy all of the assumptions of simple linear regression. We can never prove the assumptions are satisfied, but we can look for evidence that they are violated. We generally do this by looking at residual plots. (Remember, a residual is the difference between the observed response and the response the regression model predicts.)
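As a quick reminder of that definition, here is a minimal sketch in R. The data are simulated purely for illustration (the intercept, slope and sample size are arbitrary choices, not from the course materials); the point is only that residuals computed by hand match what R stores in the fitted model.

```r
# Simulated data for illustration only (arbitrary parameter values)
set.seed(1)
x <- runif(50, 0, 4)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)

fit <- lm(y ~ x)

# Residual = observed response - response predicted by the fitted model
res_by_hand <- y - fitted(fit)

# residuals() extracts the same values from the fitted model
same <- isTRUE(all.equal(unname(res_by_hand), unname(residuals(fit))))
same
```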
We discussed three types of residual plots in ST411/511.
I have a PDF document that summarises these plots: 1-regression-plots.pdf. The first two plots look very similar with simple linear regression, but we’ll see later in the quarter that they can be quite different in multiple regression.
You will also find examples of different combinations of violations, as they appear in plots of fitted values versus residuals, in the following PDF document: 01-examples.pdf. Your TA will go over these at the start of lab. They were intentionally generated with large sample sizes to make the patterns obvious. With smaller samples it’s harder to detect patterns.
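If you want to see the effect of sample size for yourself, the sketch below simulates one kind of violation (error spread that grows with x) at two sample sizes. This is not how the PDF examples were made; `plot_nonconstant` is a hypothetical helper and all parameter values are arbitrary.

```r
set.seed(2)

# Hypothetical helper: simulate data whose error spread grows with x
# (a violation of the constant-variance assumption), fit a line, and
# plot residuals against fitted values.
plot_nonconstant <- function(n) {
  x <- runif(n, 0, 4)
  y <- 1 + 2 * x + rnorm(n, sd = 0.5 + x)  # error sd depends on x
  fit <- lm(y ~ x)
  plot(fitted(fit), residuals(fit),
       xlab = "Fitted values", ylab = "Residuals",
       main = paste("n =", n))
  abline(h = 0, lty = 2)
  invisible(fit)
}

par(mfrow = c(1, 2))
fit_large <- plot_nonconstant(1000)  # large sample
fit_small <- plot_nonconstant(20)    # same violation, small sample
```

Compare how clearly the funnel shape shows up in each panel, and rerun with a different seed to see how much the small-sample plot changes.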
The dataset ex0824 from Sleuth2 contains the age and respiratory rate for 618 children.
Fit a linear regression model of Rate on Age and examine the three diagnostic plots. Is there evidence that any of the assumptions are violated? Check out ST511 Lab 8 if you need a refresher on how to do this in R.
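One possible way to set this up is sketched below. It assumes the Sleuth2 package is installed, and it assumes the three diagnostic plots are residuals versus fitted values, residuals versus the explanatory variable, and a normal probability plot of the residuals; check the summary PDF if your plots from ST411/511 differ.

```r
library(Sleuth2)  # provides the ex0824 data frame

fit <- lm(Rate ~ Age, data = ex0824)

par(mfrow = c(1, 3))

# Assumed plot 1: residuals against fitted values
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Assumed plot 2: residuals against the explanatory variable
plot(ex0824$Age, residuals(fit),
     xlab = "Age", ylab = "Residuals")
abline(h = 0, lty = 2)

# Assumed plot 3: normal probability plot of the residuals
qqnorm(residuals(fit))
qqline(residuals(fit))
```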
Now try fitting a linear regression of log(Rate) on Age. Do the residual plots look better?
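A sketch of the log-response fit, again assuming Sleuth2 is installed. The name `fit_log` is just an illustrative choice, and only the fitted-values plot and the normal probability plot are drawn here; add the plot against Age in the same way if you want all three.

```r
library(Sleuth2)  # provides the ex0824 data frame

# Transform the response inside the model formula
fit_log <- lm(log(Rate) ~ Age, data = ex0824)

par(mfrow = c(1, 2))

# Residuals against fitted values, now on the log scale
plot(fitted(fit_log), residuals(fit_log),
     xlab = "Fitted values (log scale)", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal probability plot of the residuals
qqnorm(residuals(fit_log))
qqline(residuals(fit_log))
```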
How do you interpret the slope parameter now that the response is log transformed? (Read Section 8.4 in Sleuth.)
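As a hedged pointer: when only the response is logged, the usual interpretation is multiplicative. A one-unit increase in Age is associated with the median Rate being multiplied by exp(slope). The sketch below back-transforms the estimate and its confidence interval; it assumes Sleuth2 is installed, and the object names are illustrative.

```r
library(Sleuth2)  # provides the ex0824 data frame

fit_log <- lm(log(Rate) ~ Age, data = ex0824)

# Slope on the log scale
slope <- coef(fit_log)["Age"]

# Back-transform: multiplicative change in median Rate
# associated with a one-unit increase in Age
mult_change <- exp(slope)

# Back-transformed 95% confidence interval for that multiplicative change
ci <- exp(confint(fit_log, "Age"))

mult_change
ci
```

A value of `mult_change` below 1 would mean the median rate decreases as age increases; above 1, that it increases.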
You might find it useful to look at examples of these plots from data known to satisfy the assumptions. Have a look at: http://glimmer.rstudio.com/cwick/regression-plots/.
I’ve generated data known to satisfy the assumptions of regression and made all the residual plots. You’ll notice that even though the data satisfy the assumptions you still get residual plots that don’t look “perfect”. Hit the “Again! Again!” button to generate a new dataset.
Change the sample size, n, and re-examine the plots. You can also change whether the explanatory variable is discrete (x can only be an integer between 0 and 4) or continuous (x is uniformly distributed between 0 and 4).
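If you would rather experiment in your own R session, the sketch below mimics the two settings described above. It is not the code behind the web page: `simulate_ok` is a hypothetical helper, and the intercept, slope and error standard deviation are arbitrary choices.

```r
set.seed(3)

# Hypothetical helper: generate data that satisfy the simple linear
# regression assumptions, with x either discrete (integers 0 to 4)
# or continuous (uniform between 0 and 4).
simulate_ok <- function(n, discrete = FALSE) {
  x <- if (discrete) sample(0:4, n, replace = TRUE) else runif(n, 0, 4)
  y <- 1 + 2 * x + rnorm(n, sd = 1)  # linear mean, constant-variance normal errors
  data.frame(x = x, y = y)
}

dat <- simulate_ok(n = 30, discrete = TRUE)
fit <- lm(y ~ x, data = dat)

par(mfrow = c(1, 2))
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(residuals(fit))
qqline(residuals(fit))
```

Remove the `set.seed()` line and rerun the last block a few times, changing `n` and `discrete`, to get a feel for how much these plots vary when nothing is wrong.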