In DanOvando/RobustRegression: Run Robust and Clustered Regressions in R

This package prodives a function RobustRegression that allows users to run linear regression with heteroskedastic robust standard errors, as well as clustered standard errors. The function returns a list object with a variety of useful outputs.

Let's start by running a simple linear regression using the iris dataset included with R.

library(RobustRegression)
library(stargazer)

data("iris")

iris <- as_data_frame(iris)

head(iris)

Iris contains information on flower traits for a number of different species of iris. Suppose we want to run a regression. We want to run a regression predicting the sepal length (Sepal.Length) as a function of sepal width (Sepal.Width), petal length (Petal.Length), and the species of iris (Species). Note that Species variable is categorical.

We are worried both that errors might be heterskedastic and clustered. Perhaps for example the variability in sepal length in creases with petal length (heteroskedasticity), or that each species of iris have similar and different types of error structures (clustering). Ordinarily, we might run a simple linear regression of the form lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris). However, we want to account for this heteroskedasticity and clustering.

In STATA, we could accomplish this through

regress Sepal.Length Sepal.Width Petal.Length i.Species, robust cluster(Species)

To do this in R, you'll need two libraries

library(lmtest)
library(sandwich)

We can now run our regression

reg <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)

summary(reg)

These are our standard std. errors.

To get our robust standard errors, we do

lmtest::coeftest(reg,  sandwich::vcovHC(reg, "HC1"))

This is saying extract the coefficients, using a variance-covariance matrix with "HC1" corrected standard errors. And voila, we have robust standard errors.

I've gone through and added this and some other functionality into a package called RobustRegression

regression <-  RobustRegression(model = lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris),
                 dat = iris, cluster_var = 'Species')

We simply specify a model in the same manner as we normally would. RobustRegression automatically calcualtes robust standard errors and p values. The default value for cluster_var is 'None'. In this case, we want to cluster errors by "Species".

regression is a list that contains a number of useful data frames and objects.

regression$
- model is the standard model object produced by lm, except that an object called model$RobustSEs has been added that include the robust standard errors. Note that functions such as summary calculate things like p values and standard errors on the fly from the data, and so will not show robust standard errors if used
- TidyModel is a tidyed data frame with the model, containing the coefficients, standard and robust standard errors, and various test statistics
- AugModel is the original data frame used in the regression, with augmented variables such as fitted values
- GlanceModel contains a tidyed data frame of model summary statistics, e.g. R^2^
- RegPlot a plot of the regression coefficients

We can use the stargazer package to make nice table summaries of our results, with and without heteroskedastic robust and clustered standard errors.

stargazer(regression$model, type = 'html', se = list(regression$TidyModel$RobustSE), p = list(regression$TidyModel$RobustPvalue), t.auto = T, p.auto = T, title = 'Robust and Clustered Standard Errors')

stargazer(regression$model, type = 'html', 
          title = 'Basic Standard Errors')

As we would expect, using RobustRegression doesn't change our estimated model coefficients at all, but accounting for clusters and heteroskedasticity does substantially change the standard errors and p values

DanOvando/RobustRegression documentation built on May 6, 2019, 1:22 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com