In dajmcdon/ubc-stat406-labs: Tutorials and labs for UBC Stat 406 in the 2020-2021 online year

Instructions

In HW1 you wrote the following function to generate data from a linear model (see my solutions):

generate.data <- function(n, p, sig.epsilon=1){
  ## This function takes 3 inputs (2 mandatory, 1 optional)
  ## n - the number of observations
  ## p - the number of predictors
  ## sig.epsilon - (optional) the sd of the normal noise (default=1 if omitted)
  X = matrix(rnorm(p*n), ncol=p) ## a matrix of standard normal RVs (n x p)
  epsilon = rnorm(n, sd = sig.epsilon) ## noise ~ N(0, sig.epsilon)
  beta = p:1 # p beta coefficients
  beta.0 = 3 # an intercept
  y = beta.0 + X %*% beta + epsilon # the linear model
  df = data.frame(y, X) # put it all in a data frame
  return(df) # output
}

You should complete the code below to see what happens to $R^2$ as you add additional irrelevant predictors. Note that some lines have comments requiring you to add code. Others ask you questions which you should answer in the comments.

Complete the following

library(tidyverse)
p = 3 # the number of true predictors
n = 100 # the total number of observations
pmax = 10 # the maximum number of predictors to examine, feel free to change this
stopifnot(pmax<n, pmax>p) # what does this do?
true.model = # add code here
extra.predictors = data.frame(matrix(rnorm(n*(pmax-p)), nrow=n)) 
names(extra.predictors) = paste0('X',(p+1):pmax) # why am I naming the extra predictors? What were their names before
full.data = bind_cols(true.model,extra.predictors)
rsq = double(pmax) # What do we call this step? We discussed this last week.
for(iter in 1:pmax){
  form = formula(paste('y ~ ', paste(names(full.data)[2:(iter+1)], collapse=' + '))) # annoying
  fit.mdl = lm(form, data=full.data)
  rsq[iter] = summary(fit.mdl)$r.sq # note how easy it is to get this
}
ggplot(data.frame(nvars=1:pmax, rsq),aes(x=nvars,y=rsq)) + 
  geom_path() + geom_point(color='red') + theme_minimal(base_family = 'serif')

Questions to answer:

Does $R^2$ decrease?
Why or why not?

dajmcdon/ubc-stat406-labs documentation built on Aug. 18, 2020, 1:23 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

dajmcdon/ubc-stat406-labs
Tutorials and labs for UBC Stat 406 in the 2020-2021 online year

In dajmcdon/ubc-stat406-labs: Tutorials and labs for UBC Stat 406 in the 2020-2021 online year

Instructions

Complete the following

Questions to answer:

R Package Documentation

Browse R Packages

We want your feedback!

dajmcdon/ubc-stat406-labs Tutorials and labs for UBC Stat 406 in the 2020-2021 online year

In dajmcdon/ubc-stat406-labs: Tutorials and labs for UBC Stat 406 in the 2020-2021 online year

Instructions

Complete the following

Questions to answer:

R Package Documentation

Browse R Packages

We want your feedback!

dajmcdon/ubc-stat406-labs
Tutorials and labs for UBC Stat 406 in the 2020-2021 online year