Instructions

  1. Rename this document with your student ID (not the 10-digit number, your IU username, e.g. dajmcdon). Include your buddy if you are working together.
  2. Examine the function below called generate.data.
  3. Answer the question about the marked line by X.
  4. Fill in the missing bits in the intervals function.
  5. In part 3, call the function intervals for 25 different correlations. Each time, save the output.
  6. Repeat the above step 50 times. Find the average width for each correlation over the 50 replications.
  7. Plot the average width of the CI against cors.

Functions to examine and complete

library(ggplot2)
# This function generates Y, X1, and X2
## Inputs: n, the number of data points
##         cor, the correlation between X1 and X2
## Outputs: a data frame
generate.data <- function(n, cor){
  epsilon = rnorm(n)
  X = matrix(rnorm(2*n),nrow=n) %*% 
    chol(matrix(c(1, cor, cor,1), 2, 2))
  ## See the question below about X
  beta = c(3, 2, 1)
  Y = cbind(1, X) %*% beta + epsilon
  df = data.frame(Y, X)
  return(df)
}

## This function finds confidence interval widths 
## for the Linear model
## Inputs: n, the number of data points
##         cor, the correlation between X1 and X2
## Outputs: avg, the average width of the confidence 
##               intervals for the regression
intervals <- function(n, cor){
  df = generate.data(n, cor) 
  mdl = lm(Y~X1+X2, data=df)
  itvals = confint(mdl) ## Get the confidences intervals for the bhats (95%)
  widths = itvals %*% c(-1,1) ## Find the width of each interval
  avg = mean(widths[-1]) ## get the average width, ignore intercept
  return(avg)
}

n = 250
ncors = 25
cors = 1 / (1 + exp(-seq(-5,5,length.out = ncors)))*2 - 1

What is the the marked line in generate.data doing? Why did I multiply by some matrix?

The goal here is to create correlated predictors (covariates) X1 and X2. Just as I can generate $y\sim \mathrm{N}(0,\sigma^2)$ with code like

y = rnorm(1,0,1)
y = sigma*y

that is, multiplying a standard normal by the sd, I can get correlated variables by multiplying standard normals by the square root of the variance matrix that I want. In this case, I'm using the Cholesky decomposition to get the square root.

Running the simulation and plotting

## calculate the average width for each value of cors
## repeat the experiment 50 times
## 
## You could do this in a few loops, but it's much easier with sapply
avg.widths = replicate(50, sapply(cors, intervals, n=n))
avg.widths = rowMeans(avg.widths)

## plot cors (x-axis) vs. average widths (y-axis)
##    label your axes nicely
##    make the y-axis on the log scale
# plot(cors, avg.widths, las=1, log='y', xlab='correlation')
dat = data.frame(correlation=cors, avg.widths=avg.widths)
library(ggplot2)
ggplot(dat, aes(correlation, avg.widths)) + 
  geom_path(color="orange",size=2) +
  geom_point(color="purple", size=2) + 
  scale_y_log10() + theme_minimal()

How does correlation affect confidence intervals?

Larger magnitude correlation leads to wider confidence intervals.



dajmcdon/ubc-stat406-labs documentation built on Aug. 18, 2020, 1:23 p.m.