As we’ve seen in lecture, a two-way ANOVA is just a special case of multiple regression. There is nothing new in terms of fitting the saturated model, examining residuals and influence, testing it against the additive model with an F-test, and interpreting parameters. There are however some special operations that are particularly useful for data that fits into a two way table.
As an example we’ll use the Seaweed Grazing experiment from class.
Remember the raw response is the percentage of the rock covered in seaweed. We saw a transformation was appropriate, so let’s create new variable for the log regeneration ratio.
As a useful summary we might be interested in the average regeneration ratio at each combination of our two factors. If our data is balanced this also corresponds to our estimates of the mean response at each combination of factors in the saturated model. The aggregate
function is very useful for this kind of operation. For example, run
It takes a formula just like lm
, on the left hand side of the ~
is the variable contained in the cells of our table, on the right hand side the variables that we want to group by (in the two-way case, our rows and columns). We tell it where those variables live and the final argument, tells aggregate if there is more than one observation in the cell define by Block
and Treat
to aggregate them using the mean function. The result is a data.frame with the mean
of the log regeneration ratio for each combination of Block and Treat. Sometimes it’s easier to display the results in a table. We’ll save the results of the averaging, then use the xtabs
function to display it as a two way table, again the formula notation is used.
For display in a report I might round these numbers too
This also gives us a convenient way to check if the data is balanced. Instead of aggregating over cells using mean
we can aggregate by counting the number of observations using length
Yup, two observations per cell, this data is balanced.
We can also use aggregate to find Treatment and Block averages
Or since the data is balanced, find the same averages as row and column averages of our cell averages:
Compare the column means to the aggregating with just Blocks. They are identical. As an example of this not working let’s do the same thing with the Pygmalion data
Check that the Company averages are different to the average of the columns in the second table.
Going back to our table of averages for the grazing data
Since the data is balanced we can use this table to find the mean response for each cell using the saturated model. In the saturated model the estimated mean response is simply the average response in the cell, so we can read it right off. For example, the estimated mean response for Block 1 with the control treatment is -1.51. We’ll talk about finding the estimated mean using the additive model later.
The the other useful purpose for the cell averages is plotting. We can use our cell_avgs
data.frame,
It might be nicer to reorder the block and join treatments with lines:
And we might want to add the original data back in to keep in mind the variation about these averages
Nothing new here, we fit both models and test with and extra sum of squares F-test
“There is no evidence that the effect of treatment depends on the block (extra sum of squares F-test on interaction, p value = 0.12).”
Given the additive model is adequate, let’s find the estimated mean response for all the cells in our table. The easiest way is to build a new data.frame with all combinations of our levels and use predict
We can present this as a table or plot them
The above code works equally well, if the saturated model had been used:
Estimating particular contrasts of interest can be achieved in two ways: reparameterizing so that the contrasts of interest is captured by a single parameter, or using the multcomp
package to specify the contrast and estimate it (possibly with multiple comparison adjustments).