library(learnr) library(testwhat) knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) tutorial_options(exercise.checker = testwhat_learnr) df <- POL306::state_cig
We are still using the same data as we did in the plots lesson. Below is a reminder of what is in it.
Throughout this tutorial we will use a dataset that looks at healthcare spending, cigarette taxes, and a variety of political variables. The unit of analysis for it is the state-year, and the data comes from 2001 to 2014. The variables are:
cig_tax
is the tax (in dollars) on a packet of cigarettes. female_leg
is the percentage of women in the state legislature.liberal
is the percentage of voters that identify as liberal.dem_gov
is an indicator that is 1 if the governor is a Democrat and 0 if they are not.healthspendpc
is the amount spent perperson on healthcare. state
is the name of the stateyear
is the year. The most important function you will use for estimating linear regression models is lm()
. This will expect a formula and will output something called a model object that you will want to save so you can then manipulate it.
For example, if you have a dataframe called data.frame
and three variables named: var1
and var2
. If you want to estimate a linear regression with var1
as the dependent variable and var2
as the independent variable you would write:
mod <- lm(var1 ~ var2, data=data.frame)
This won't output anything but you can look at the results using summary()
like this:
summary(mod)
Instructions:
We will start by just estimating a linear regression of the cigarette tax (the DV) as a function of the percent of liberals in a state. The variables are all in the df
dataset and are: cig_tax
, and liberal
.
You should save the linear model as mod
(like in the example above) and then output the results using the summary()
function.
names(df)
df <- POL306::state_cig ### Ignore this names(df) mod <- lm(cig_tax~liberal, data=df) summary(mod)
ex() %>% { check_function(., "lm" ) %>% check_arg("formula") %>% check_equal() check_function(., "summary") }
The reason that we save the lm()
object is that there are a lot of things we can do with it after we are done. Here are a few things:
coef(mod)
will create a vector of just the coefficients, which can be useful if we want to do something to them. mod$residuals
is a vector of all the residuals for the regression. We could easily plot them with plot(mod$residuals)
(we'll see why this is important later)nobs(mod)
calculates the number of observations in the final dataset.mod$fitted.values
is a vector of the predictions from the regression.Instructions:
Using the mod
we made in the last activity (I included the code here), do the following:
names(df) mod <- lm(cig_tax~liberal, data=df) summary(mod)
df <- POL306::state_cig ### Ignore this names(df) mod <- lm(cig_tax~liberal, data=df) summary(mod) nobs(mod) plot(mod$residuals)
ex() %>% { check_function(., "plot") check_function(., "nobs" ) }
We previously saw how to plot residuals using plot(mod$residuals)
. This plots them on the y-axis with the observation number on the x-axis. We also are interested in plotting our residuals as a function of the the predicted value. This can be accomplished by using mod$fitted.values
as well as mod$residuals
and the plot()
function. Remember you want the y-axis to have the residuals (you can specify which variable goes on which axis using y=
and x=
) and the x-axis should have the predicted values.
Instructions:
Still using the same model, now look at the residuals compared to the predicted value. You will have to include both variables in the call to plot()
.
names(df) mod <- lm(cig_tax~liberal, data=df)
df <- POL306::state_cig ### Ignore this names(df) mod <- lm(cig_tax~liberal, data=df) plot(y=mod$residuals, x=mod$fitted.values)
ex() %>% { check_function(., "plot") %>% { check_arg(., "x") %>% check_equal check_arg(., "y") %>% check_equal } }
You can add the regression line to a scatter plot by using the abline()
function after you make a scatter plot. You just have to put the model object (mod
) into the abline()
function. I've recreated the regression you looked at before with a scatter plot. Now you have to add a regression line.
names(df) mod <- lm(cig_tax~liberal, data=df) plot(y=df$cig_tax, x=df$liberal)
df <- POL306::state_cig ### Ignore this names(df) mod <- lm(cig_tax~liberal, data=df) plot(y=df$cig_tax, x=df$liberal) abline(mod)
ex() %>% check_function("abline")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.