dajmcdon
). Include your buddy if you are working together.generate.data
.X
.intervals
function.intervals
for 25 different correlations. Each time, save the output.cors
.library(ggplot2) # This function generates Y, X1, and X2 ## Inputs: n, the number of data points ## cor, the correlation between X1 and X2 ## Outputs: a data frame generate.data <- function(n, cor){ epsilon = rnorm(n) X = matrix(rnorm(2*n),nrow=n) %*% chol(matrix(c(1, cor, cor,1), 2, 2)) ## See the question below about X beta = c(3, 2, 1) Y = cbind(1, X) %*% beta + epsilon df = data.frame(Y, X) return(df) } ## This function finds confidence interval widths ## for the Linear model ## Inputs: n, the number of data points ## cor, the correlation between X1 and X2 ## Outputs: avg, the average width of the confidence ## intervals for the regression intervals <- function(n, cor){ df = generate.data(n, cor) mdl = lm(Y~X1+X2, data=df) itvals = confint(mdl) ## Get the confidences intervals for the bhats (95%) widths = itvals %*% c(-1,1) ## Find the width of each interval avg = mean(widths[-1]) ## get the average width, ignore intercept return(avg) } n = 250 ncors = 25 cors = 1 / (1 + exp(-seq(-5,5,length.out = ncors)))*2 - 1
generate.data
doing? Why did I multiply by some matrix?The goal here is to create correlated predictors (covariates) X1
and X2
.
Just as I can generate $y\sim \mathrm{N}(0,\sigma^2)$ with code like
y = rnorm(1,0,1) y = sigma*y
that is, multiplying a standard normal by the sd, I can get correlated variables by multiplying standard normals by the square root of the variance matrix that I want. In this case, I'm using the Cholesky decomposition to get the square root.
## calculate the average width for each value of cors ## repeat the experiment 50 times ## ## You could do this in a few loops, but it's much easier with sapply avg.widths = replicate(50, sapply(cors, intervals, n=n)) avg.widths = rowMeans(avg.widths) ## plot cors (x-axis) vs. average widths (y-axis) ## label your axes nicely ## make the y-axis on the log scale # plot(cors, avg.widths, las=1, log='y', xlab='correlation') dat = data.frame(correlation=cors, avg.widths=avg.widths) library(ggplot2) ggplot(dat, aes(correlation, avg.widths)) + geom_path(color="orange",size=2) + geom_point(color="purple", size=2) + scale_y_log10() + theme_minimal()
Larger magnitude correlation leads to wider confidence intervals.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.