Instructions

  1. Rename this document with your student ID (not the 10-digit number, your IU username, e.g. dajmcdon). Include your buddy in the author field if you are working together.
  2. Follow the instructions in each section.
set.seed(000012345)
n = 100
p = 3
X = matrix(rnorm(n*(p+1)),nrow=n)
generate.data <- function(X, sig.epsilon=1){
  ## This function takes 3 inputs (2 mandatory, 1 optional)
  ## n - the number of observations
  ## p - the number of predictors
  ## sig.epsilon - (optional) the sd of the normal noise (default=1 if omitted)
  n = nrow(X)
  p = ncol(X)
  epsilon = rnorm(n, sd = sig.epsilon) ## noise ~ N(0, sig.epsilon)
  beta = c(p:2,0) # p beta coefficients
  beta.0 = 3 # an intercept
  y = beta.0 + X %*% beta + epsilon # the linear model
  df = data.frame(y, X) # put it all in a data frame
  return(df) # output
}

Getting the data

Use the function above to generate data from a linear model.

df = 

Estimate the regression

Estimate 2 linear models: (1) on all four predictors, (2) on the first 3 predictors. For each model, calculate the RSS. Find the difference between RSS(small)-RSS(large). Find the degrees of freedom for both models.


Perform the F test two ways

Write a function which calculates the F-statistic given two fitted, nested lm objects. [ F = \frac{(RSS_{small}-RSS_{large})/(df_{small}-df_{large})}{RSS_{large}/df_{large}} ] Calculate the statistic for your above models, and calculate the p-value using pf(Fstat,dfnum,dfdenom,lower.tail=FALSE). Does this F-stat and p-value match the output of the anova applied to the two models?

Fstat <- function(small, large){
  ## Takes in 2 fitted, nested lm objects


}

Simulate the F test

Now, follow the slides and use the code below. Simulate the test statistic 1000 times. Plot the simulated density, the true F-density, and the observed test statistic. Print both p-values.

simulate.from.lm <- function(df, mdl) { # altered to work with any lm output
  yhat <- predict(mdl) 
  newy <- yhat + rnorm(nrow(df), sd = summary(mdl)$sigma)
  df[[names(mdl$model)[1]]] <- newy # the names part, gets the response from the df
  return(df)
}

# Simulate from an estimated linear model, and refit both the large and small lm
# Inputs: data frame with covariates (df), fitted small lm (small), fitted large lm (large)
# Output: the Fstat
Fstat.sim <- function (df, small, large) {
  sim.df <- simulate.from.lm(df, small)
  small.sim <- lm(formula(small),data=sim.df) # used formulas instead
  large.sim = lm(formula(large), data=sim.df)
  return(Fstat(small.sim,large.sim))
}


dajmcdon/ubc-stat406-labs documentation built on Aug. 18, 2020, 1:23 p.m.