dajmcdon
). Include your buddy in the author field if you are working together.set.seed(000012345) n = 100 p = 3 X = matrix(rnorm(n*(p+1)),nrow=n) generate.data <- function(X, sig.epsilon=1){ ## This function takes 3 inputs (2 mandatory, 1 optional) ## n - the number of observations ## p - the number of predictors ## sig.epsilon - (optional) the sd of the normal noise (default=1 if omitted) n = nrow(X) p = ncol(X) epsilon = rnorm(n, sd = sig.epsilon) ## noise ~ N(0, sig.epsilon) beta = c(p:2,0) # p beta coefficients beta.0 = 3 # an intercept y = beta.0 + X %*% beta + epsilon # the linear model df = data.frame(y, X) # put it all in a data frame return(df) # output }
Use the function above to generate data from a linear model.
df =
Estimate 2 linear models: (1) on all four predictors, (2) on the first 3 predictors. For each model, calculate the RSS. Find the difference between RSS(small)-RSS(large). Find the degrees of freedom for both models.
Write a function which calculates the F-statistic given two fitted, nested lm
objects.
[
F = \frac{(RSS_{small}-RSS_{large})/(df_{small}-df_{large})}{RSS_{large}/df_{large}}
]
Calculate the statistic for your above models, and calculate the p-value using pf(Fstat,dfnum,dfdenom,lower.tail=FALSE)
.
Does this F-stat and p-value match the output of the anova
applied to the two models?
Fstat <- function(small, large){ ## Takes in 2 fitted, nested lm objects }
Now, follow the slides and use the code below. Simulate the test statistic 1000 times. Plot the simulated density, the true F-density, and the observed test statistic. Print both p-values.
simulate.from.lm <- function(df, mdl) { # altered to work with any lm output yhat <- predict(mdl) newy <- yhat + rnorm(nrow(df), sd = summary(mdl)$sigma) df[[names(mdl$model)[1]]] <- newy # the names part, gets the response from the df return(df) } # Simulate from an estimated linear model, and refit both the large and small lm # Inputs: data frame with covariates (df), fitted small lm (small), fitted large lm (large) # Output: the Fstat Fstat.sim <- function (df, small, large) { sim.df <- simulate.from.lm(df, small) small.sim <- lm(formula(small),data=sim.df) # used formulas instead large.sim = lm(formula(large), data=sim.df) return(Fstat(small.sim,large.sim)) }
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.