Instructions

  1. Rename this document with your student ID (not the 10-digit number, your IU username, e.g. dajmcdon). Include your buddy in the author field if you are working together.
  2. I have given you code to generate data and fit 4 different models to the data. You should run through the code line by line in the console.
  3. Discuss the questions with your neighbors. Write down answers.

Generate data and fit models

generate.data = function(n, p=3){
  X = 5 + matrix(rnorm(3*n), n)
  beta = c(runif(p+1, -1,1))
  epsilon = rnorm(n)
  Y = exp(beta[1] + X %*% beta[-1] + epsilon) ## NOTE THIS LINE!!
  data.frame(Y,X)
}
set.seed(20200213)
n = 250
dat = generate.data(n)
formulae = lapply(
  c('Y~.', 
    'log(Y)~.',
    paste0('Y ~', paste(paste0('log(X',1:3,')'),collapse='+')),
    paste0('log(Y) ~', paste(paste0('log(X',1:3,')'),collapse='+'))), 
  as.formula)
all.the.models = lapply(formulae, function(x) lm(x, data=dat))

Make QQ plots

## Base R version
#par(mfrow = c(2,2))
#for(i in 1:4){
#  qqnorm(residuals(all.the.models[[i]]))
#  qqline(residuals(all.the.models[[i]]))
#}
library(tidyverse)
resids = as_tibble(
  sapply(all.the.models, residuals), .name_repair = ~paste0("model",1:4))
resids %>% pivot_longer(everything()) %>%
  ggplot(aes(sample=value)) + geom_qq() + geom_qq_line() + 
  facet_wrap(~name, 2, scales = 'free_y')

Calculate CV

cv.lm = function(mdl) mean(residuals(mdl)^2 / (1-hatvalues(mdl))^2)
sapply(all.the.models, cv.lm)

Questions to answer

  1. Which of the 4 models is the correct one?

  2. What do you notice in the Q-Q plots? Which ones look ok? Why?

  3. Examine the hatvalues for the 4 different models. What do you notice?

  4. Consider models 1 and 2. In these two cases, what is residuals(mdl) doing? Think about how the log transformation affects these two things.

  5. Is it reasonable to compare the CV values for models 1 and 3 with those of models 2 and 4? Why or why not?

  6. How should we decide which model to use? Note: This is a subtle issue without a correct answer in light of the previous question.



dajmcdon/ubc-stat406-labs documentation built on Aug. 18, 2020, 1:23 p.m.