ivreg: Two-Stage Least-Squares Regression with Diagnostics
In ivreg: Instrumental-Variables Regression by '2SLS', '2SM', or '2SMM', with Diagnostics

knitr::opts_chunk$set(echo = TRUE, fig.height=4, fig.width=4)

Overview

The ivreg package provides a comprehensive implementation of instrumental variables regression using two-stage least-squares (2SLS) estimation. The standard regression functionality (parameter estimation, inference, robust covariances, predictions, etc.) is derived from and supersedes the ivreg() function in the AER package. Additionally, various regression diagnostics are supported, including hat values, deletion diagnostics such as studentized residuals and Cook's distances; graphical diagnostics such as component-plus-residual plots and added-variable plots; and effect plots with partial residuals.

In order to provide all of this functionality the ivreg package integrates seamlessly with other packages by providing suitable S3 methods, specifically for generic functions in the base-R stats package, and in the car, effects, lmtest, and sandwich packages, among others.

The package is accompanied by two online vignettes, namely this introduction and an article introducing the regression diagnostics and graphics:

Installation

The stable release version of ivreg is hosted on the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=ivreg and can be installed along with all dependencies via

install.packages("ivreg", dependencies = TRUE)

The development version of ivreg is hosted on GitHub at https://github.com/zeileis/ivreg/. It can be conveniently installed via the install_github() function in the remotes package:

remotes::install_github("https://github.com/zeileis/ivreg/")

Instrumental variables regression

The main function in the ivreg package is ivreg(), which is a high-level formula interface to the work-horse ivreg.fit() function; both functions return a list of quantities similar to that returned by lm() (including coefficients, coefficient variance-covariance matrix, residuals, etc.). In the case of ivreg(), the returned list is of class "ivreg", for which a wide range of standard methods is available, including print(), summary(), coef(), vcov(), anova(), predict(), residuals(), terms(), model.matrix(), formula(), update(), hatvalues(), dfbeta(), and rstudent(). Moreover, methods for functionality from other packages is provided, and is described in more detail in a companion vignette.

Regressors and instruments for ivreg() are most easily specified in a formula with two parts on the right-hand side, for example, y ~ x1 + x2 | x1 + z1 + z2, where x1 and x2 are, respectively, exogenous and endogenous explanatory variables, and x1, z1, and z2 are instrumental variables. Both components on the right-hand side of the model formula include an implied intercept, unless, as in a linear model estimated by lm(), the intercept is explicitly excluded via -1. Exogenous explanatory variables, such as x1 in the example, must be included among the instruments. A worked example is described immediately below. As listing exogenous variables in both parts on the right-hand side of the formula may become tedious if there are many of them, an additional convenience option is to use a three-part right side like y ~ x1 | x2 | z1 + z2, listing the exogenous, endogenous, and instrumental variables (for the endogenous variables only), respectively.

Illustration: Returns to schooling

As an initial demonstration of the ivreg package, we investigate the effect of schooling on earnings in a classical model for wage determination. The data are from the United States, and are provided in the package as SchoolingReturns. This data set was originally studied by David Card, and was subsequently employed, as here, to illustrate 2SLS estimation in introductory econometrics textbooks. The relevant variables for this illustration are:

data("SchoolingReturns", package = "ivreg")
summary(SchoolingReturns[, 1:8])

A standard wage equation uses a semi-logarithmic linear regression for wage, estimated by ordinary least squares (OLS), with years of education as the primary explanatory variable, adjusting for a quadratic term in labor-market experience, as well as for factors coding ethnicity, residence in a city (smsa), and residence in the U.S. south:

m_ols <- lm(log(wage) ~ education + poly(experience, 2) + ethnicity + smsa + south,
  data = SchoolingReturns)
summary(m_ols)

Thus, OLS estimation yields an estimate of r round(100 * coef(m_ols)["education"], digits = 1)% per year for returns to schooling. This estimate is problematic, however, because it can be argued that education is endogenous (and hence also experience, which is taken to be age minus education minus 6). We therefore use geographical proximity to a college when growing up as an exogenous instrument for education. Additionally, age is the natural exogenous instrument for experience, while the remaining explanatory variables can be considered exogenous and are thus used as instruments for themselves. Although it's a useful strategy to select an effective instrument or instruments for each endogenous explanatory variable, in 2SLS regression all of the instrumental variables are used to estimate all of the regression coefficients in the model.

To fit this model with ivreg() we can simply extend the formula from lm() above, adding a second part after the | separator to specify the instrumental variables:

library("ivreg")
m_iv <- ivreg(log(wage) ~ education + poly(experience, 2) + ethnicity + smsa + south |
  nearcollege + poly(age, 2) + ethnicity + smsa + south,
  data = SchoolingReturns)

Equivalently, the same model can also be specified slightly more concisely using three parts on the right-hand side indicating the exogenous variables, the endogenous variables, and the additional instrumental variables only (in addition to the exogenous variables).

m_iv <- ivreg(log(wage) ~ ethnicity + smsa + south | education + poly(experience, 2) |
  nearcollege + poly(age, 2), data = SchoolingReturns)

Both models yield the following results:

summary(m_iv)

Thus, using two-stage least squares to estimate the regression yields a much larger coefficient for the returns to schooling, namely r round(100 * coef(m_iv)["education"], digits = 1)% per year. Notice as well that the standard errors of the coefficients are larger for 2SLS estimation than for OLS, and that, partly as a consequence, evidence for the effects of ethnicity and the quadratic component of experience is now weak. These differences are brought out more clearly when showing coefficients and standard errors side by side, e.g., using the compareCoefs() function from the car package or the msummary() function from the modelsummary package:

library("modelsummary")
m_list <- list(OLS = m_ols, IV = m_iv)
msummary(m_list)

The change in coefficients and associated standard errors can also be brought out graphically using the modelplot() function from modelsummary which shows the coefficient estimates along with their 95% confidence intervals. Below we omit the intercept and experience terms as these are on a different scale than the other coefficients.

modelplot(m_list, coef_omit = "Intercept|experience")

Any scripts or data that you put into this service are public.

ivreg documentation built on April 3, 2025, 9:34 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ivreg
Instrumental-Variables Regression by '2SLS', '2SM', or '2SMM', with Diagnostics

ivreg: Two-Stage Least-Squares Regression with Diagnostics
In ivreg: Instrumental-Variables Regression by '2SLS', '2SM', or '2SMM', with Diagnostics

Overview

Installation

Instrumental variables regression

Illustration: Returns to schooling

Try the ivreg package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

ivreg Instrumental-Variables Regression by '2SLS', '2SM', or '2SMM', with Diagnostics

ivreg: Two-Stage Least-Squares Regression with Diagnostics In ivreg: Instrumental-Variables Regression by '2SLS', '2SM', or '2SMM', with Diagnostics

Overview

Installation

Instrumental variables regression

Illustration: Returns to schooling

Try the ivreg package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

ivreg
Instrumental-Variables Regression by '2SLS', '2SM', or '2SMM', with Diagnostics

ivreg: Two-Stage Least-Squares Regression with Diagnostics
In ivreg: Instrumental-Variables Regression by '2SLS', '2SM', or '2SMM', with Diagnostics