library(Sleuth3)
library(ggplot2)
source(url("http://stat512.cwick.co.nz/code/stat_qqline.r"))
## Loading required package: proto

# look at the first few entries
head(ex0824)
##   Age Rate
## 1 0.1   53
## 2 0.2   38
## 3 0.3   58
## 4 0.3   52
## 5 0.3   42
## 6 0.4   62

# Fit the model
fit <- lm(Rate ~ Age, data = ex0824)

# Look at the TWO residual plots to assess linearity and constant spread
qplot(.fitted, .resid, data = fit) + geom_smooth()  #Residuals Vs. fitted values
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

plot of chunk unnamed-chunk-1

qplot(Age, .resid, data = fit) + geom_smooth()  #Residuals Vs. Age (explanatory variable)
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

plot of chunk unnamed-chunk-1

From these two plots it definitely looks like the constant variance assumption is violated: as age increases, the spread (standard deviation) of the residuals decreases. Although it is less evident, the linearity assumption also appears to be violated.
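One rough way to put numbers on the shrinking spread is to compare the residual standard deviation within age bands. The sketch below assumes the Sleuth3 package is installed; the quartile-based bands and the names `fit_raw`, `age_band` and `resid_sd` are choices made here for illustration and are not part of the original analysis.

```r
library(Sleuth3)

# Refit the untransformed model under a separate, illustrative name
fit_raw <- lm(Rate ~ Age, data = ex0824)

# Split Age into four bands at its quartiles (an arbitrary choice)
age_band <- cut(ex0824$Age,
                breaks = quantile(ex0824$Age, probs = seq(0, 1, 0.25)),
                include.lowest = TRUE)

# Standard deviation of the residuals within each age band
resid_sd <- tapply(resid(fit_raw), age_band, sd)
round(resid_sd, 2)
```

If the constant variance assumption held, the four values would be roughly equal; a steady decline from the youngest band to the oldest would agree with what the residual plots suggest.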

# Let's also check the normality
qplot(sample = .resid, data = fit) + stat_qqline()

plot of chunk unnamed-chunk-2
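`qplot()` is deprecated as of ggplot2 3.4.0, and the sourced `stat_qqline.r` helper only works while the remote file is reachable. A rough equivalent of the three diagnostic plots, assuming ggplot2 3.0.0 or later (which provides `stat_qq_line()`), might look like the sketch below. The data frame `diag_df`, its column names, and the name `fit_raw` are introduced here for illustration.

```r
library(Sleuth3)
library(ggplot2)

fit_raw <- lm(Rate ~ Age, data = ex0824)

# Collect the diagnostic quantities by hand instead of passing the lm object as data
diag_df <- data.frame(Age    = ex0824$Age,
                      fitted = fitted(fit_raw),
                      resid  = resid(fit_raw))

# Residuals vs. fitted values, with a loess smooth
p_fitted <- ggplot(diag_df, aes(fitted, resid)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Residuals vs. Age (explanatory variable)
p_age <- ggplot(diag_df, aes(Age, resid)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Normal Q-Q plot with a reference line through the quartiles
p_qq <- ggplot(diag_df, aes(sample = resid)) +
  stat_qq() +
  stat_qq_line()

p_fitted
p_age
p_qq
```

Setting `method` and `formula` explicitly also avoids the `geom_smooth` message about the automatically chosen smoother.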

From this plot it appears there are some outliers in the right tail, so our normality assumption may not hold very well. How will this affect our results? Let's see what the fabulous log transform does here.

fit <- lm(log(Rate) ~ Age, data = ex0824)


# Look at the TWO residual plots to assess linearity and constant spread
qplot(.fitted, .resid, data = fit) + geom_smooth()  #Residuals Vs. fitted values
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

plot of chunk unnamed-chunk-3

qplot(Age, .resid, data = fit) + geom_smooth()  #Residuals Vs. Age (explanatory variable)
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

plot of chunk unnamed-chunk-3


# Let's also check the normality
qplot(sample = .resid, data = fit) + stat_qqline()

plot of chunk unnamed-chunk-3

This time it appears to me that all of our assumptions are met. Now let's look at the interpretation of the slope. Remember that we log-transformed the response, so we first find the estimate and then back-transform it.

summary(fit)  #This tells us the estimate of the slope is -0.019
## 
## Call:
## lm(formula = log(Rate) ~ Age, data = ex0824)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.626 -0.132 -0.004  0.135  0.548 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.845119   0.012628   304.5   <2e-16 ***
## Age         -0.019009   0.000736   -25.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.196 on 616 degrees of freedom
## Multiple R-squared:  0.52,   Adjusted R-squared:  0.519 
## F-statistic:  668 on 1 and 616 DF,  p-value: <2e-16
exp(-0.019)  # this value is equal to 0.9812
## [1] 0.9812
exp(-0.019 - 1.96 * 0.0007357)
## [1] 0.9798
exp(-0.019 + 1.96 * 0.0007357)  #This is the 95% CI and the values are 0.9798 and 0.9826
## [1] 0.9826

Sample evidence indicates that an increase in age of one month (Age is recorded in months in this data set) is associated with a multiplicative change in the median respiratory rate of 0.9812 (95% confidence interval: 0.9798 to 0.9826). In regular English: “As age increases by 1 month, the median respiratory rate decreases by about 2%.”
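The back-transformation can also be done from the fitted object instead of retyping rounded numbers. The following is a sketch, assuming the Sleuth3 package is installed; `fit_log`, `ratio`, `ratio_ci` and `pct_change` are illustrative names. Note that `confint()` uses the t distribution with the residual degrees of freedom (616 here) rather than the 1.96 normal approximation, so the endpoints may differ slightly in the later decimal places.

```r
library(Sleuth3)

fit_log <- lm(log(Rate) ~ Age, data = ex0824)

# Multiplicative change in the median Rate per one-unit increase in Age
ratio <- exp(coef(fit_log)["Age"])

# 95% confidence interval for that ratio, back-transformed from the log scale
ratio_ci <- exp(confint(fit_log, "Age", level = 0.95))

# Percentage change in the median Rate per one-unit increase in Age
pct_change <- 100 * (ratio - 1)

ratio
ratio_ci
pct_change
```

Working from `coef()` and `confint()` keeps the full precision of the estimate and standard error, which avoids small rounding discrepancies between the point estimate and the interval.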