Stat 412/512

Data Analysis 1

Due Feb 4th on Canvas

The grading rubric.

Data analyses should be handed in as a Word document or PDF in report style. This means no raw R output and no R code in the report itself. You should also submit a separate file containing your R code. You can see examples of this style on the ST511 website:

Keep your summary comprehensive but concise. Your TA should be able to read this section and know if you have approached the problem correctly and have answered all the questions asked. You can use the methods section to give more details.

Your report should include the following sections:

  • Introduction: Give a brief overview of the data, a little bit of background and the questions of interest. Keep this concise, understandable to someone outside of this class, free of statistical jargon and to the point. You may want to provide a summary graphic of the data involved and some basic summary statistics.

  • Methods/Results: Describe your reasoning for the procedures you have chosen to answer the questions. Explain any changes, transformations or other modifications you make to the data. You should include full specifications of any regression models you use, and how the parameters in the models relate to our questions of interest. The fitted models should be described (estimates and their standard errors, estimate of sigma and R-squared), and include summaries of test output.

  • Summary: Provide a brief non-technical summary of your findings that answers the questions of interest (like the statistical summaries we have been writing). Make sure you include some indication of the scope of inference (Can population inference be made? To what population? Can causal inference be made?)
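The fitted-model quantities the Methods/Results section asks for (estimates, standard errors, sigma and R-squared) can all be read off a fitted `lm` object in R. A minimal sketch using the bdims data and the weight-on-waist-girth regression described below, assuming the current openintro release, where waist girth is stored in the column `wai_gi` (older releases used `wai.gi`); `fit_slr` and `coef_table` are illustrative names:

```r
# Sketch: extracting the numbers the Methods/Results section should report.
# Assumption: current openintro release, waist girth column is wai_gi.
library(openintro)

fit_slr <- lm(wgt ~ wai_gi, data = bdims)  # simple linear regression

# Estimates and their standard errors from the coefficient table
coef_table <- summary(fit_slr)$coefficients[, c("Estimate", "Std. Error")]
coef_table

# Estimate of sigma (residual standard error) and R-squared
sigma(fit_slr)
summary(fit_slr)$r.squared
```

In the report itself these numbers belong in prose or a formatted table, not as pasted R output.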

The data file bdims in the openintro package contains body measurements for 507 physically active individuals. Read the help file for the dataset for more information.

library(openintro)  # provides the bdims data
bdims$sex <- factor(bdims$sex, levels = c(1, 0), labels = c("male", "female"))  # 1 = male, 0 = female

Researchers are interested in predicting an individual's weight in kg (wgt) from just their waist girth in cm, and have been using the following simple linear regression model:

$$\mu\{\text{wgt} \mid \text{waist girth}\} = \beta_0 + \beta_1 \times \text{waist girth}$$

They are interested in whether they should be using different lines for men and women.
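One common way to allow a separate line for each sex is to add an indicator variable and its interaction with waist girth. As a sketch, with a hypothetical indicator female (1 for women, 0 for men, so men are the reference group):

```latex
\begin{aligned}
\mu\{\text{wgt} \mid \text{waist girth}, \text{sex}\}
  &= \beta_0 + \beta_1 \times \text{waist girth} + \beta_2 \times \text{female}
     + \beta_3 \times (\text{waist girth} \times \text{female}) \\
\text{men } (\text{female} = 0)\colon\quad \mu &= \beta_0 + \beta_1 \times \text{waist girth} \\
\text{women } (\text{female} = 1)\colon\quad \mu &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \times \text{waist girth}
\end{aligned}
```

Here β2 is the difference in intercepts and β3 the difference in slopes (women minus men); setting β2 = β3 = 0 gives a single line for everyone.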

To answer this question you should:

  • Suggest a model that allows for separate lines for men and women.
  • Find confidence intervals for the mean weight of someone with a waist girth of 60 cm, 80 cm and 100 cm under the simple linear regression model. Compare those intervals to the confidence intervals for males and females with waist girths of 60 cm, 80 cm and 100 cm under the model you have suggested. Comment on the practical significance of any differences.
  • Interpret the difference in slope between men and women.
  • Compare the models statistically.
  • Provide a figure that summarises the model you think is most appropriate.
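The tasks above map onto a handful of standard R calls. A minimal sketch, assuming the current openintro column name `wai_gi` (older releases: `wai.gi`) and using the interaction model `wgt ~ wai_gi * sex` as one possible separate-lines model; `fit_slr`, `fit_sep` and the colours are illustrative choices. `predict(..., interval = "confidence")` gives intervals for the mean weight, `confint()` gives an interval for the slope difference, and `anova()` on the two nested fits gives the extra-sum-of-squares F-test:

```r
library(openintro)

# Recode sex only if it has not been recoded already (1 = male, 0 = female)
if (!is.factor(bdims$sex)) {
  bdims$sex <- factor(bdims$sex, levels = c(1, 0), labels = c("male", "female"))
}

fit_slr <- lm(wgt ~ wai_gi, data = bdims)        # one line for everyone
fit_sep <- lm(wgt ~ wai_gi * sex, data = bdims)  # separate intercepts and slopes

# 95% confidence intervals for the mean weight at 60, 80 and 100 cm
girths <- c(60, 80, 100)
ci_slr <- predict(fit_slr, newdata = data.frame(wai_gi = girths),
                  interval = "confidence")
new_sep <- expand.grid(wai_gi = girths, sex = levels(bdims$sex))
ci_sep <- cbind(new_sep, predict(fit_sep, newdata = new_sep,
                                 interval = "confidence"))
cbind(wai_gi = girths, ci_slr)
ci_sep

# Difference in slope (female minus male) and its 95% confidence interval
coef(fit_sep)["wai_gi:sexfemale"]
confint(fit_sep)["wai_gi:sexfemale", ]

# Extra-sum-of-squares F-test: one line (reduced) vs separate lines (full)
anova(fit_slr, fit_sep)

# Figure: data coloured by sex, with the fitted line for each sex
cols <- c(male = "steelblue", female = "darkorange")
plot(bdims$wai_gi, bdims$wgt, col = cols[as.character(bdims$sex)], pch = 20,
     xlab = "Waist girth (cm)", ylab = "Weight (kg)")
for (s in levels(bdims$sex)) {
  x_s <- range(bdims$wai_gi[bdims$sex == s])  # draw each line over its own data range
  nd <- data.frame(wai_gi = seq(x_s[1], x_s[2], length.out = 50),
                   sex = factor(s, levels = levels(bdims$sex)))
  lines(nd$wai_gi, predict(fit_sep, newdata = nd), col = cols[[s]], lwd = 2)
}
legend("topleft", legend = names(cols), col = cols, pch = 20, lwd = 2)
```

Because male is the first (reference) level of the recoded factor, the coefficient `wai_gi:sexfemale` is the female slope minus the male slope.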

(You do not need to check the regression assumptions, i.e. examine residual plots, for this data analysis.)

Extra credit: The relationships may have some curvature. Explore the practical and statistical significance of models that allow a quadratic relationship between weight and waist girth.
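A quadratic term can be added to an R model formula with `I(wai_gi^2)`. A sketch under the same column-name assumption (`wai_gi` in the current openintro release), fitting separate quadratic curves for men and women and comparing them with separate straight lines; the model and object names are illustrative, and this is one possible approach rather than the required one:

```r
library(openintro)

# Recode sex only if it has not been recoded already (1 = male, 0 = female)
if (!is.factor(bdims$sex)) {
  bdims$sex <- factor(bdims$sex, levels = c(1, 0), labels = c("male", "female"))
}

fit_sep  <- lm(wgt ~ wai_gi * sex, data = bdims)                  # separate straight lines
fit_quad <- lm(wgt ~ (wai_gi + I(wai_gi^2)) * sex, data = bdims)  # separate quadratics

# Statistical significance: F-test for the two added quadratic terms
anova(fit_sep, fit_quad)

# Practical significance: how much do the fitted mean weights change?
new_pts <- expand.grid(wai_gi = c(60, 80, 100), sex = levels(bdims$sex))
new_pts$straight   <- predict(fit_sep,  newdata = new_pts)
new_pts$quadratic  <- predict(fit_quad, newdata = new_pts)
new_pts$difference <- new_pts$quadratic - new_pts$straight
new_pts
```

A small F-test p-value combined with only tiny changes in the fitted means would point to statistical but not practical significance.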