Stat 411/511

# Goals

• Averaging over cells, rows and columns in two-way tables
• Displaying modeled means

As we’ve seen in lecture, a two-way ANOVA is just a special case of multiple regression. There is nothing new in terms of fitting the saturated model, examining residuals and influence, testing it against the additive model with an F-test, and interpreting parameters. There are however some special operations that are particularly useful for data that fits into a two way table.

As an example we’ll use the Seaweed Grazing experiment from class.

Remember the raw response is the percentage of the rock covered in seaweed. We saw a transformation was appropriate, so let’s create new variable for the log regeneration ratio.

As a useful summary we might be interested in the average regeneration ratio at each combination of our two factors. If our data is balanced this also corresponds to our estimates of the mean response at each combination of factors in the saturated model. The aggregate function is very useful for this kind of operation. For example, run

It takes a formula just like lm, on the left hand side of the ~ is the variable contained in the cells of our table, on the right hand side the variables that we want to group by (in the two-way case, our rows and columns). We tell it where those variables live and the final argument, tells aggregate if there is more than one observation in the cell define by Block and Treat to aggregate them using the mean function. The result is a data.frame with the mean of the log regeneration ratio for each combination of Block and Treat. Sometimes it’s easier to display the results in a table. We’ll save the results of the averaging, then use the xtabs function to display it as a two way table, again the formula notation is used.

For display in a report I might round these numbers too

This also gives us a convenient way to check if the data is balanced. Instead of aggregating over cells using mean we can aggregate by counting the number of observations using length

Yup, two observations per cell, this data is balanced.

We can also use aggregate to find Treatment and Block averages

Or since the data is balanced, find the same averages as row and column averages of our cell averages:

Compare the column means to the aggregating with just Blocks. They are identical. As an example of this not working let’s do the same thing with the Pygmalion data

Check that the Company averages are different to the average of the columns in the second table.

Going back to our table of averages for the grazing data

Since the data is balanced we can use this table to find the mean response for each cell using the saturated model. In the saturated model the estimated mean response is simply the average response in the cell, so we can read it right off. For example, the estimated mean response for Block 1 with the control treatment is -1.51. We’ll talk about finding the estimated mean using the additive model later.

The the other useful purpose for the cell averages is plotting. We can use our cell_avgs data.frame,

It might be nicer to reorder the block and join treatments with lines:

And we might want to add the original data back in to keep in mind the variation about these averages

Given the additive model is adequate, let’s find the estimated mean response for all the cells in our table. The easiest way is to build a new data.frame with all combinations of our levels and use predict
Estimating particular contrasts of interest can be achieved in two ways: reparameterizing so that the contrasts of interest is captured by a single parameter, or using the multcomp package to specify the contrast and estimate it (possibly with multiple comparison adjustments).