library(learnr) library(umdpsyc) library(gradethis) library(ggplot2) knitr::opts_chunk$set(exercise.checker = gradethis::grade_learnr) knitr::opts_chunk$set(echo = FALSE) learnr::tutorial_options( exercise.timelimit = 60)
One of the most common uses for R is investigating relationships between numerical variables and running regression models. The way that R handles running models can be a bit different from other statistical software might do it. In R, we create a model object, which contains the executed model, then use the summary
function to access the relevant information from the model we just ran. This means it typically takes a line of code to run a model and then a separate line of code to see the results, so make sure to pay close attention to how we get the information we need.
In this tutorial, we will look at the relationship between student motivation (on a 1 to 10 scale) and exam scores, as well as the relationship between the amount of sleep that a student got in the week before the exam and the exam score. You can find papers on these topics here for motivation and exam scores and here for sleep and exam scores.
As a reminder, here is what the data looks like.
data(exam_scores) head(exam_scores)
Let's start by looking at some scatterplots to look at relationships visually.
qplot(motivation, exam1, data = exam_scores)
qplot(sleep_week, exam1, data = exam_scores)
You can use the same function to find both Spearman's Rho rank correlation and Pearson's correlation. The cor
function is used for both, and you simply specify the method as an argument in the function. The first two arguments are the variables you want to find the correlation for, while the third is the type of correlation that you want to find.
Spearman's Rho
cor(exam_scores$motivation,exam_scores$exam1,method = 'spearman')
Pearson Correlation
cor(exam_scores$sleep_week,exam_scores$exam1,method = 'pearson')
We'll start by looking at the relationship between exam scores and the amount of sleep a student got in the week before the exam.
Let's fit a least squares regression line to this using the lm
function. To do this, you'll have to use formula notation for the first argument. The formula notation is simply the form:
$$[\text{outcome}] \sim [\text{predictor}]$$
so you specify the outcome variable, then put in a ~
, then put in your predictor variable. The last argument is the data
argument, specifying the data frame so that we can use the variable names by themselves within the formula.
The lm
function outputs an lm
object. In order to get the results of running the model that we want, such as the coefficients, the associated p-values, R-squared values, and so on, we need to use summary
on the model object.
exammod <- lm(exam1 ~ sleep_week, data = exam_scores) summary(exammod)
To get regression diagnostic plots, we can simply use plot
on the model object. The first two plots show the residuals. The first one is the residual plot, which you should check to make sure there is no pattern and that there is constant variance. The second is a QQ-plot checking the normality of the residuals. The next two are used to check for outliers and influential points.
plot(exammod)
In order to do multiple regression, we can simply add additional variables to the righthand side of the tilde (~
). We can do this using +
and *
. If you use +
, this simply adds another variable, whereas using *
include all interaction terms for variables included in the *
operation.
Let's take a look at model to predict exam scores using motivation and sleep, including an interaction term.
exammod2 <- lm(exam1 ~ motivation * sleep_week, data = exam_scores) summary(exammod2)
How would you find a linear regression model using scores on the first exam as the outcome and student motivation as the predictor? Output the summary of the model (using summary
).
grade_result( pass_if(~ identical(.result, summary(lm(exam1 ~ motivation, data = exam_scores))), "Good Job!"), fail_if(~ TRUE) )
learnr::question("Based on the output of the model from Exercise 1, what is the coefficient for motivation (the slope of the regression line)?", answer("1.7005", correct = TRUE, message = "Good job!"), answer("71.6990", correct = FALSE), answer("0.000797", correct = FALSE), answer("3.474", correct = FALSE), allow_retry = TRUE, random_answer_order = TRUE )
How would you find a linear regression model using scores on the first exam as the outcome and two predictor variables: minutes of sleep the week before and student motivation, including an interaction term? Output the summary of the model (using summary
).
grade_result( pass_if(~ identical(.result, summary(lm(exam1 ~ motivation*sleep_week, data = exam_scores))), "Good Job!"), fail_if(~ TRUE) )
submission_code <- function(UID){ httr::sha1_hash('regression121',as.character(UID)) }
Generate your submission code by putting in your UID in the function below. For example, if your UID is 2
, then your code should look like submission_code(UID = 2)
# Replace the number below with your UID submission_code(2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.