dajmcdon
). Include your buddy in the author field if you are working together.
Generate 250 observations from a linear model as follows: $x_i$ should be uniform between -2 and 2, the $\epsilon_i$ should be normally distributed with mean zero and variance $x_i^2$, and $y_i = 3 + 2x_i + \epsilon_i$.
Plot $y$ against $x$.
library(tidyverse)
library(cowplot)

set.seed(12032020)
n = 250
# x is uniform on [-2, 2]; eps has standard deviation |x|, i.e. variance x^2
df = tibble(
  x = runif(n, -2, 2),
  eps = rnorm(n, 0, sqrt(x^2)),
  y = 3 + 2*x + eps
)

ggplot(df, aes(x, y)) +
  geom_point() +
  theme_cowplot()
Estimate the model using OLS and WLS (with appropriate weights). This is easy to do. Try ?lm if you're lost.
Produce a plot that shows the original data and both estimated regression lines.
Produce confidence intervals for both methods. What do you notice?
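# Fit by OLS, then by WLS with weights 1 / Var(eps_i) = 1 / x_i^2
ls = lm(y ~ x, data = df)
wls = lm(y ~ x, data = df, weights = 1/x^2)
df$ls = fitted(ls)
df$wls = fitted(wls)

# Original data (purple points) with both fitted regression lines
ggplot(pivot_longer(select(df, x, ls, wls), -x), aes(x, value, color = name)) +
  geom_line(size = 2) +
  scale_color_brewer(palette = 'Set1') +
  geom_point(data = df, aes(x, y), color = 'purple', size = .5) +
  theme_cowplot()

# Confidence intervals for the coefficients under each method
confint(ls)
confint(wls)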
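To make the role of the weights concrete, here is a minimal sketch showing that a weighted fit can be reproduced by ordinary least squares after scaling the response and every column of the design matrix by the square root of the weights. The data are regenerated with a hypothetical seed, and the names `fit_wls`, `fit_by_hand`, and `sw` are illustrative only; they do not appear in the solution above.

```r
# Minimal sketch: WLS with weights w equals OLS on data scaled by sqrt(w).
set.seed(1)                             # hypothetical seed, for reproducibility only
n <- 250
x <- runif(n, -2, 2)
y <- 3 + 2 * x + rnorm(n, 0, abs(x))    # sd = |x|, so the variance is x^2
w <- 1 / x^2                            # weights = 1 / variance

# Built-in weighted fit
fit_wls <- lm(y ~ x, weights = w)

# Same fit by hand: scale y, the intercept column, and x by sqrt(w),
# then run an unweighted regression with no additional intercept
sw <- sqrt(w)
fit_by_hand <- lm(I(y * sw) ~ 0 + sw + I(x * sw))

coef_wls <- unname(coef(fit_wls))
coef_by_hand <- unname(coef(fit_by_hand))
all.equal(coef_wls, coef_by_hand)       # compares intercept and slope
```

After rescaling, the errors have constant variance, which is why the weighted fit is the appropriate one for this data-generating process.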
Load the 301gradedist data set. This was downloaded from IU's grade distribution database.
Regress avg_grade on instructor and avg_student_gpa using OLS without intercept. Perform the same regression using n_students as the weights. Why is this an appropriate weighting? Again, I suggest you consult the documentation ?lm.
s301 = read_csv("301gradedist.csv") %>%
  mutate(instructor = as.factor(instructor))

# OLS and WLS without an intercept: each instructor gets its own intercept
ls301 = lm(avg_grade ~ instructor + avg_student_gpa - 1, data = s301)
wls301 = lm(avg_grade ~ instructor + avg_student_gpa - 1, data = s301,
            weights = n_students)
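One way to see why n_students is a reasonable weight, sketched under the assumption (not stated in the data description) that individual grades within a section are independent with a common variance $\sigma^2$: avg_grade for section $j$ is then a mean of $n_j$ grades, so

$$\mathrm{Var}(\overline{y}_j) = \frac{\sigma^2}{n_j}, \qquad w_j \propto \frac{1}{\mathrm{Var}(\overline{y}_j)} = \frac{n_j}{\sigma^2} \propto n_j.$$

The WLS coefficient estimates do not change when all weights are multiplied by a constant, so the unknown $\sigma^2$ can be dropped and the section sizes used directly as weights.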
How do you interpret the coefficients on the instructor? Which instructor seems to be the best (in the sense of their students getting the highest grades)?
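As a hint for the interpretation, the following sketch uses a small noise-free toy data set (hypothetical, not the 301 grade data) to show what removing the intercept does to factor coefficients. The names `toy`, `g`, and `z` are made up for illustration.

```r
# Toy data: two groups with different intercepts and a shared slope
toy <- data.frame(
  g = factor(rep(c("A", "B"), each = 4)),
  z = rep(c(0, 1, 2, 3), times = 2)
)
toy$y <- ifelse(toy$g == "A", 1, 2.5) + 0.5 * toy$z   # exact lines, no noise

# Without an intercept, every factor level gets its own intercept
fit_no_int <- lm(y ~ g + z - 1, data = toy)
coefs_no_int <- coef(fit_no_int)   # gA = 1, gB = 2.5, z = 0.5

# With an intercept, level B is reported as a difference from baseline A
fit_int <- lm(y ~ g + z, data = toy)
coefs_int <- coef(fit_int)         # (Intercept) = 1, gB = 1.5, z = 0.5
```

By analogy, each instructor coefficient in the no-intercept fit plays the role of that instructor's own intercept, and the difference between two instructor coefficients compares them at the same avg_student_gpa.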
Make one plot that shows all the data and the regression line (from WLS) for each instructor.
# Fitted values from the WLS model, drawn as one line per instructor
s301$preds = predict(wls301)
ggplot(s301, aes(avg_student_gpa, avg_grade, color = instructor)) +
  geom_point() +
  geom_line(aes(y = preds)) +
  scale_color_viridis_d() +
  theme_cowplot()
cis = confint(wls301)
cis

# Keep the instructor rows only (the last row is avg_student_gpa)
conf301 = tibble(
  lower = cis[-nrow(cis), 1],
  upper = cis[-nrow(cis), 2],
  ests = coef(wls301)[-nrow(cis)],
  instructor = str_replace(names(ests), "instructor", "")
) %>%
  mutate(instructor = fct_reorder(instructor, ests))

# Point estimates with confidence intervals, ordered by estimate
ggplot(conf301, aes(instructor, ests, color = instructor)) +
  geom_segment(aes(xend = instructor, y = lower, yend = upper)) +
  geom_point(color = 'black') +
  coord_flip() +
  scale_color_viridis_d() +
  theme_cowplot() +
  theme(legend.position = 'none')