In dajmcdon/ubc-stat406-labs: Tutorials and labs for UBC Stat 406 in the 2020-2021 online year

Instructions

Rename this document with your student ID (not the 10-digit number, your IU username, e.g. dajmcdon). Include your buddy in the author field if you are working together.
I have given you code to generate data and fit 4 different models to the data. You should run through the code line by line in the console.
Discuss the questions with your neighbors. Write down answers.

Generate data and fit models

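generate.data = function(n, p=3){
  ## n x p design matrix centred at 5; uses p (not a hard-coded 3) so the argument is respected
  X = 5 + matrix(rnorm(p*n), n)
  beta = c(runif(p+1, -1,1)) ## intercept followed by p slopes
  epsilon = rnorm(n)
  Y = exp(beta[1] + X %*% beta[-1] + epsilon) ## NOTE THIS LINE!!
  data.frame(Y,X)
}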
set.seed(20200213)
n = 250
dat = generate.data(n)
formulae = lapply(
  c('Y~.', 
    'log(Y)~.',
    paste0('Y ~', paste(paste0('log(X',1:3,')'),collapse='+')),
    paste0('log(Y) ~', paste(paste0('log(X',1:3,')'),collapse='+'))), 
  as.formula)
all.the.models = lapply(formulae, function(x) lm(x, data=dat))
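The nested paste calls that build the third and fourth formulas are easier to follow when unrolled one step at a time. The sketch below only illustrates that string construction; the intermediate names `terms_log`, `rhs`, and `f` are hypothetical and do not appear in the lab code. It relies on `data.frame(Y, X)` naming the columns of the unnamed matrix `X` as `X1`, `X2`, `X3`.

```r
# Step 1: one log() term per predictor column
terms_log <- paste0("log(X", 1:3, ")")
print(terms_log)

# Step 2: join the terms with "+" to form the right-hand side
rhs <- paste(terms_log, collapse = "+")
print(rhs)

# Step 3: prepend the response and convert the string to a formula
f <- as.formula(paste0("Y ~", rhs))
print(f)

# The variables the formula refers to
print(all.vars(f))
```

Swapping `"Y ~"` for `"log(Y) ~"` in step 3 gives the fourth model's formula.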

Make QQ plots

## Base R version
#par(mfrow = c(2,2))
#for(i in 1:4){
#  qqnorm(residuals(all.the.models[[i]]))
#  qqline(residuals(all.the.models[[i]]))
#}
library(tidyverse)
resids = as_tibble(
  sapply(all.the.models, residuals), .name_repair = ~paste0("model",1:4))
resids %>% pivot_longer(everything()) %>%
  ggplot(aes(sample=value)) + geom_qq() + geom_qq_line() + 
  facet_wrap(~name, 2, scales = 'free_y')

Calculate CV

cv.lm = function(mdl) mean(residuals(mdl)^2 / (1-hatvalues(mdl))^2)
sapply(all.the.models, cv.lm)
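For a linear model fit by least squares, the expression inside cv.lm is the usual shortcut for leave-one-out cross-validation: dividing each residual by one minus its hat value gives the error that would be made if that observation were held out. A minimal sketch that checks this against a brute-force refit, using small made-up data (`toy`, `x`, and `y` are hypothetical, not the lab's `dat`):

```r
# Hypothetical toy data, used only to check the identity
set.seed(1)
n <- 50
toy <- data.frame(x = rnorm(n))
toy$y <- 1 + 2 * toy$x + rnorm(n)

fit <- lm(y ~ x, data = toy)

# Shortcut: a single fit, residuals rescaled by leverage
cv_shortcut <- mean(residuals(fit)^2 / (1 - hatvalues(fit))^2)

# Brute force: refit n times, each time predicting the held-out row
loo_errors <- sapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = toy[-i, ])
  toy$y[i] - predict(fit_i, newdata = toy[i, , drop = FALSE])
})
cv_brute <- mean(loo_errors^2)

# The two agree up to floating-point error
print(all.equal(cv_shortcut, cv_brute))
```

Note that the errors are measured on whatever scale the response in the model formula uses.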

Questions to answer

Which of the 4 models is the correct one?
What do you notice in the Q-Q plots? Which ones look ok? Why?
Examine the hatvalues for the 4 different models. What do you notice?
Consider models 1 and 2. In these two cases, what is residuals(mdl) doing? Think about how the log transformation of Y affects residuals(mdl) in each of the two models.
Is it reasonable to compare the CV values for models 1 and 3 with those of models 2 and 4? Why or why not?
How should we decide which model to use? Note: This is a subtle issue without a correct answer in light of the previous question.
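One possible way to explore the last two questions (a sketch, not the intended answer) is to put the leave-one-out errors of a log-response model back on the scale of Y before averaging. The code below assumes made-up data (`toy`, `x`, `y` are hypothetical) and uses the identity that the leave-one-out prediction equals the observed response minus residual / (1 - hat value). Exponentiating a prediction of log(Y) is itself a simplifying assumption: it targets something closer to the median of Y than its mean.

```r
# Hypothetical toy data with a log-linear truth
set.seed(2)
n <- 100
toy <- data.frame(x = 5 + rnorm(n))
toy$y <- exp(0.5 + 0.3 * toy$x + rnorm(n))

fit_raw <- lm(y ~ x, data = toy)       # response on the original scale
fit_log <- lm(log(y) ~ x, data = toy)  # response on the log scale

# Leave-one-out residuals via the leverage shortcut
loo_res_raw <- residuals(fit_raw) / (1 - hatvalues(fit_raw))
loo_res_log <- residuals(fit_log) / (1 - hatvalues(fit_log))

# CV as computed directly: each on its own response scale
cv_raw <- mean(loo_res_raw^2)
cv_log_scale <- mean(loo_res_log^2)

# Leave-one-out predictions of log(y), back-transformed to the y scale
loo_log_pred <- log(toy$y) - loo_res_log
cv_log_on_y_scale <- mean((toy$y - exp(loo_log_pred))^2)

print(c(raw = cv_raw,
        log_on_log_scale = cv_log_scale,
        log_on_y_scale = cv_log_on_y_scale))
```

Only `cv_raw` and `cv_log_on_y_scale` are in the same units; whether squared error on the scale of Y is the right criterion is a separate judgment.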

dajmcdon/ubc-stat406-labs documentation built on Aug. 18, 2020, 1:23 p.m.
