Stat 411/511

# Review of regression diagnostic plots

The results of a simple linear regression are only meaningful when the simple linear regression model is an appropriate description of the data of interest, i.e. when the data satisfy all of the assumptions of simple linear regression. We can never prove the assumptions are satisfied, but we can look for evidence that they are violated. We generally do this by looking at residual plots. (Remember, a residual is the difference between the observed response and the response the regression model predicts.)
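As a quick reminder of that definition, here is a minimal sketch in R. The data are simulated purely for illustration (the seed, sample size, and coefficients are arbitrary assumptions, not course data).

```r
# Simulated data: every value here is made up for illustration.
set.seed(1)
x <- runif(50, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)

fit <- lm(y ~ x)

# Residual = observed response - response predicted by the fitted model.
res_by_hand <- y - fitted(fit)

# resid() extracts the same values from the fitted model.
all.equal(unname(res_by_hand), unname(resid(fit)))
```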

We discussed three types of residual plots in ST411/511:

• Residuals against fitted values
• Residuals against explanatory values
• A normal probability plot for residuals
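One way to draw the three plots listed above is sketched here with base R graphics. The data are simulated for illustration only, and the variable names (`x`, `y`, `fit`, `res`) are placeholders rather than anything from the course materials.

```r
# Simulated data for illustration only (not from the course datasets).
set.seed(411)
x <- runif(100, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(100, sd = 1.5)
fit <- lm(y ~ x)
res <- resid(fit)

op <- par(mfrow = c(1, 3))  # three panels side by side

# 1. Residuals against fitted values
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# 2. Residuals against the explanatory variable
plot(x, res, xlab = "Explanatory variable", ylab = "Residuals")
abline(h = 0, lty = 2)

# 3. Normal probability plot of the residuals
qqnorm(res)
qqline(res)

par(op)  # restore the previous plotting layout
```

Because these data were simulated to satisfy the model assumptions, the plots give a rough idea of what "no obvious problems" looks like at this sample size.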

I have a PDF document that summarises these plots: 1-regression-plots.pdf. The first two plots look very similar in simple linear regression, because the fitted values are just a linear function of the single explanatory variable, but we’ll see later in the quarter that they can be quite different in multiple regression.

You will also find some examples of different combinations of violations apparent in plots of residuals against fitted values in the following PDF document: 01-examples.pdf. Your TA will go over these at the start of lab. They were intentionally generated with large sample sizes to make the patterns obvious. With smaller samples it’s harder to detect patterns.

The dataset ex0824 from Sleuth2 contains the age and respiratory rate for 618 children.

Fit a linear regression model of Rate on Age and examine the three diagnostic plots. Is there evidence any of the assumptions are violated? Check out ST511 Lab 8 if you need a refresher on how to do this in R.

Now try fitting a linear regression of log(Rate) on Age. Do the residual plots look better?
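If you would like a starting point, the two fits in this exercise could be set up roughly as follows. This sketch assumes the Sleuth2 package is installed and that `ex0824` has columns named `Rate` and `Age`. It uses the built-in `plot()` method for `lm` objects, whose Q-Q panel shows standardized rather than raw residuals. It may differ from the approach in ST511 Lab 8.

```r
library(Sleuth2)  # assumption: the Sleuth2 package is installed

fit_raw <- lm(Rate ~ Age, data = ex0824)       # response on the original scale
fit_log <- lm(log(Rate) ~ Age, data = ex0824)  # log-transformed response

op <- par(mfrow = c(2, 3))  # top row: Rate, bottom row: log(Rate)
for (fit in list(fit_raw, fit_log)) {
  plot(fit, which = 1)  # residuals against fitted values
  plot(ex0824$Age, resid(fit), xlab = "Age", ylab = "Residuals")
  abline(h = 0, lty = 2)
  plot(fit, which = 2)  # normal Q-Q plot of standardized residuals
}
par(op)  # restore the previous plotting layout
```

Putting both models in one figure makes it easier to compare the residual patterns row by row.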

How do you interpret the slope parameter now that the response is log transformed? (Read Section 8.4 in Sleuth.)
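As a reminder of the standard result for a log-transformed response, here is a short derivation. It assumes the distribution of log(Rate) at each Age is symmetric, so that its mean and median coincide. The symbols $\beta_0$ and $\beta_1$ denote the intercept and slope of the log-scale model, and log is the natural log.

$$
\mu\{\log(\text{Rate}) \mid \text{Age}\} = \beta_0 + \beta_1\,\text{Age}
\quad\Longrightarrow\quad
\text{Median}\{\text{Rate} \mid \text{Age}\} = e^{\beta_0}\, e^{\beta_1 \text{Age}}
$$

$$
\frac{\text{Median}\{\text{Rate} \mid \text{Age} + 1\}}{\text{Median}\{\text{Rate} \mid \text{Age}\}} = e^{\beta_1}
$$

So under this model, a one-unit increase in Age is associated with a multiplicative change of $e^{\beta_1}$ in the median Rate. Check this against the wording in Section 8.4 before using it in your write-up.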