## Do not delete this! ## It loads the s20x library for you. If you delete it ## your document may not compile it. require(s20x) knitr::opts_chunk$set( dev = "png", fig.ext = "png", dpi = 96 )
We wish to determine if regular attendance in class is associated with exam mark. In particular, we want to estimate the expected exam marks of attendees and non-attendees, and predict the actual exam scores of individual attendees and non-attendees.
The variables of interest are:
Exam: Exam mark out of 100.Attend: A two-level categorical variable which has the levels Yes and No.We want to quantify the relationship between exam marks and attendance. In particular, we want to estimate the expected exam marks of attendees and non-attendees, and predict the actual exam scores of individual attendees and non-attendees.
load(system.file("extdata", "Stats20x.df.rda", package = "s20x"))
Stats20x.df = read.table("STATS20x.txt", header = T) # Could of also used plot(Exam ~ Attend, data = Stats20x.df) boxplot(Exam ~ Attend, data = Stats20x.df)
boxplot(Exam ~ Attend, data = Stats20x.df)
The exam marks for the students attended seem to be centred than the students who did not attend lectures regularly. The boxplots look reasonably symmetric in both groups.
examAttend.fit = lm(Exam ~ Attend, data = Stats20x.df) eovcheck(examAttend.fit) normcheck(examAttend.fit) cooks20x(examAttend.fit) summary(examAttend.fit) confint(examAttend.fit)
predAttend.df = data.frame(Attend = c("No", "Yes")) # Using `interval = "confidence"' to get CI for expected exam score # for each level of Attend predict(examAttend.fit, predAttend.df, interval = "confidence") # Using `interval = "predition"' to get PI for individual student's exam score # for each level of Attend predict(examAttend.fit, predAttend.df, interval = "prediction")
conf = as.data.frame(predict(examAttend.fit, predAttend.df, interval = "confidence")) resultStr1 = paste0(sprintf("%.1f", conf$lwr), " to ", sprintf("%.1f", conf$upr)) pre = as.data.frame(predict(examAttend.fit, predAttend.df, interval = "prediction")) resultStr2 = paste0(sprintf("%.1f", pre$lwr), " to ", sprintf("%.1f", pre$upr))
We wish to explain exam marks with attendance, a two-level factor. So, we have fitted a linear model with a single explanatory dummy variable. (Note, this is equivalent to conducting a two-sample t-test).
Four non-attending students did unusually well (i.e., large positive residuals), but since the sample size was large, this will be of little consequence. Hence, all model assumptions were satisfied.
Our final model is $$Exam_i=\beta_0 +\beta_1 \times Attend.Yes_i+\epsilon_i,$$ where $\epsilon_i \sim iid ~ N(0,\sigma^2)$. Here $Attend.Yes_i=1$ if the student regularly attended (i.e., answered "Yes"), otherwise it is zero (i.e. "No").
Our model explained a small 15% of the variability in the students' final exam marks.
We wanted to quantify the relationship between exam marks and attendance.
There was strong evidence that exam marks were higher for students who attend class versus students who didn't attend class (P-value $\approx 10^{-6}$). We estimate that regular attendance could increase their expected exam mark between 9.5 to 21.6 exam marks (out of 100).
The expected exam marks of non-attendees and attendees are between r resultStr1[1], and r resultStr1[2], respectively.
The predicted exam marks for individual non-attendees and attendees are between r resultStr2[1], and r resultStr2[2], respectively.
Our model only explains 15% of the variability in the students' final exam marks, this would not be very good for prediction. We can see this in how wide our prediction intervals are.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.