library(learnr)
library(testwhat)

knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
tutorial_options(exercise.checker = testwhat_learnr)

df <- POL306::state_cig

A Note on Data

We are still using the same data as we did in the plots lesson. Below is a reminder of what is in it.

Throughout this tutorial we will use a dataset that looks at healthcare spending, cigarette taxes, and a variety of political variables. The unit of analysis for it is the state-year, and the data comes from 2001 to 2014. The variables are:

Bivariate Linear Regression

The most important function you will use for estimating linear regression models is lm(). This will expect a formula and will output something called a model object that you will want to save so you can then manipulate it.

For example, if you have a dataframe called data.frame and three variables named: var1 and var2. If you want to estimate a linear regression with var1 as the dependent variable and var2 as the independent variable you would write:

mod <- lm(var1 ~ var2, data=data.frame)

This won't output anything but you can look at the results using summary() like this:

summary(mod)

Instructions:

We will start by just estimating a linear regression of the cigarette tax (the DV) as a function of the percent of liberals in a state. The variables are all in the df dataset and are: cig_tax, and liberal.

You should save the linear model as mod (like in the example above) and then output the results using the summary() function.

names(df) 
df <- POL306::state_cig ### Ignore this

names(df)

mod <- lm(cig_tax~liberal, data=df)
summary(mod)
ex() %>% {
  check_function(., "lm" ) %>%
    check_arg("formula") %>% check_equal()
  check_function(., "summary")
}

Linear Regression Extended

The reason that we save the lm() object is that there are a lot of things we can do with it after we are done. Here are a few things:

Instructions:

Using the mod we made in the last activity (I included the code here), do the following:

  1. Get the number of the observations included in the model.
  2. Plot the residuals.
names(df)

mod <- lm(cig_tax~liberal, data=df)
summary(mod)
df <- POL306::state_cig ### Ignore this

names(df)

mod <- lm(cig_tax~liberal, data=df)
summary(mod)

nobs(mod)
plot(mod$residuals)
ex() %>% {
  check_function(., "plot")
  check_function(., "nobs" )
}

Plotting Residuals vs Predicted

We previously saw how to plot residuals using plot(mod$residuals). This plots them on the y-axis with the observation number on the x-axis. We also are interested in plotting our residuals as a function of the the predicted value. This can be accomplished by using mod$fitted.values as well as mod$residuals and the plot() function. Remember you want the y-axis to have the residuals (you can specify which variable goes on which axis using y= and x=) and the x-axis should have the predicted values.

Instructions:

Still using the same model, now look at the residuals compared to the predicted value. You will have to include both variables in the call to plot().

names(df)
mod <- lm(cig_tax~liberal, data=df)
df <- POL306::state_cig ### Ignore this

names(df)

mod <- lm(cig_tax~liberal, data=df)
plot(y=mod$residuals, x=mod$fitted.values)
ex() %>% {
  check_function(., "plot") %>% {
    check_arg(., "x") %>% check_equal
    check_arg(., "y") %>% check_equal
  }

}

Plotting a Regression Line

You can add the regression line to a scatter plot by using the abline() function after you make a scatter plot. You just have to put the model object (mod) into the abline() function. I've recreated the regression you looked at before with a scatter plot. Now you have to add a regression line.

names(df)

mod <- lm(cig_tax~liberal, data=df)
plot(y=df$cig_tax, x=df$liberal)
df <- POL306::state_cig ### Ignore this

names(df)

mod <- lm(cig_tax~liberal, data=df)
plot(y=df$cig_tax, x=df$liberal)
abline(mod)
ex() %>% 
  check_function("abline")


reuning/POL306 documentation built on Sept. 19, 2023, 10:47 a.m.