Home

/

GitHub

/

In reuning/POL306: POL 306 Applied Research Methods at Miami

library(learnr)
library(testwhat)

knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
tutorial_options(exercise.checker = testwhat_learnr)

df <- POL306::state_cig

A Note on Data

We are still using the same data as we did in the plots lesson. Below is a reminder of what is in it.

Throughout this tutorial we will use a dataset that looks at healthcare spending, cigarette taxes, and a variety of political variables. The unit of analysis for it is the state-year, and the data comes from 2001 to 2014. The variables are:

cig_tax is the tax (in dollars) on a packet of cigarettes.
female_leg is the percentage of women in the state legislature.
liberal is the percentage of voters that identify as liberal.
dem_gov is an indicator that is 1 if the governor is a Democrat and 0 if they are not.
healthspendpc is the amount spent perperson on healthcare.
state is the name of the state
year is the year.

Bivariate Linear Regression

The most important function you will use for estimating linear regression models is lm(). This will expect a formula and will output something called a model object that you will want to save so you can then manipulate it.

For example, if you have a dataframe called data.frame and three variables named: var1 and var2. If you want to estimate a linear regression with var1 as the dependent variable and var2 as the independent variable you would write:

mod <- lm(var1 ~ var2, data=data.frame)

This won't output anything but you can look at the results using summary() like this:

summary(mod)

Instructions:

We will start by just estimating a linear regression of the cigarette tax (the DV) as a function of the percent of liberals in a state. The variables are all in the df dataset and are: cig_tax, and liberal.

You should save the linear model as mod (like in the example above) and then output the results using the summary() function.

names(df)

df <- POL306::state_cig ### Ignore this

names(df)

mod <- lm(cig_tax~liberal, data=df)
summary(mod)

ex() %>% {
  check_function(., "lm" ) %>%
    check_arg("formula") %>% check_equal()
  check_function(., "summary")
}

Linear Regression Extended

The reason that we save the lm() object is that there are a lot of things we can do with it after we are done. Here are a few things:

coef(mod) will create a vector of just the coefficients, which can be useful if we want to do something to them.
mod$residuals is a vector of all the residuals for the regression. We could easily plot them with plot(mod$residuals) (we'll see why this is important later)
nobs(mod) calculates the number of observations in the final dataset.
mod$fitted.values is a vector of the predictions from the regression.

Instructions:

Using the mod we made in the last activity (I included the code here), do the following:

Get the number of the observations included in the model.
Plot the residuals.

names(df)

mod <- lm(cig_tax~liberal, data=df)
summary(mod)

df <- POL306::state_cig ### Ignore this

names(df)

mod <- lm(cig_tax~liberal, data=df)
summary(mod)

nobs(mod)
plot(mod$residuals)

ex() %>% {
  check_function(., "plot")
  check_function(., "nobs" )
}

Plotting Residuals vs Predicted

We previously saw how to plot residuals using plot(mod$residuals). This plots them on the y-axis with the observation number on the x-axis. We also are interested in plotting our residuals as a function of the the predicted value. This can be accomplished by using mod$fitted.values as well as mod$residuals and the plot() function. Remember you want the y-axis to have the residuals (you can specify which variable goes on which axis using y= and x=) and the x-axis should have the predicted values.

Instructions:

Still using the same model, now look at the residuals compared to the predicted value. You will have to include both variables in the call to plot().

names(df)
mod <- lm(cig_tax~liberal, data=df)

df <- POL306::state_cig ### Ignore this

names(df)

mod <- lm(cig_tax~liberal, data=df)
plot(y=mod$residuals, x=mod$fitted.values)

ex() %>% {
  check_function(., "plot") %>% {
    check_arg(., "x") %>% check_equal
    check_arg(., "y") %>% check_equal
  }

}

Plotting a Regression Line

You can add the regression line to a scatter plot by using the abline() function after you make a scatter plot. You just have to put the model object (mod) into the abline() function. I've recreated the regression you looked at before with a scatter plot. Now you have to add a regression line.

names(df)

mod <- lm(cig_tax~liberal, data=df)
plot(y=df$cig_tax, x=df$liberal)

df <- POL306::state_cig ### Ignore this

names(df)

mod <- lm(cig_tax~liberal, data=df)
plot(y=df$cig_tax, x=df$liberal)
abline(mod)

ex() %>% 
  check_function("abline")

reuning/POL306 documentation built on Sept. 19, 2023, 10:47 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

reuning/POL306
POL 306 Applied Research Methods at Miami

In reuning/POL306: POL 306 Applied Research Methods at Miami

A Note on Data

Bivariate Linear Regression

Linear Regression Extended

Plotting Residuals vs Predicted

Plotting a Regression Line

R Package Documentation

Browse R Packages

We want your feedback!

reuning/POL306 POL 306 Applied Research Methods at Miami

In reuning/POL306: POL 306 Applied Research Methods at Miami

A Note on Data

Bivariate Linear Regression

Linear Regression Extended

Plotting Residuals vs Predicted

Plotting a Regression Line

R Package Documentation

Browse R Packages

We want your feedback!

reuning/POL306
POL 306 Applied Research Methods at Miami