This package prodives a function RobustRegression that allows users to run linear regression with heteroskedastic robust standard errors, as well as clustered standard errors. The function returns a list object with a variety of useful outputs.

Let's start by running a simple linear regression using the iris dataset included with R.

library(RobustRegression)
library(stargazer)

data("iris")

iris <- as_data_frame(iris)

head(iris)

Iris contains information on flower traits for a number of different species of iris. Suppose we want to run a regression. We want to run a regression predicting the sepal length (Sepal.Length) as a function of sepal width (Sepal.Width), petal length (Petal.Length), and the species of iris (Species). Note that Species variable is categorical.

We are worried both that errors might be heterskedastic and clustered. Perhaps for example the variability in sepal length in creases with petal length (heteroskedasticity), or that each species of iris have similar and different types of error structures (clustering). Ordinarily, we might run a simple linear regression of the form lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris). However, we want to account for this heteroskedasticity and clustering.

In STATA, we could accomplish this through

regress Sepal.Length Sepal.Width Petal.Length i.Species, robust cluster(Species)

To do this in R, you'll need two libraries

library(lmtest)
library(sandwich)

We can now run our regression

reg <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)

summary(reg)

These are our standard std. errors.

To get our robust standard errors, we do

lmtest::coeftest(reg,  sandwich::vcovHC(reg, "HC1"))

This is saying extract the coefficients, using a variance-covariance matrix with "HC1" corrected standard errors. And voila, we have robust standard errors.

I've gone through and added this and some other functionality into a package called RobustRegression

regression <-  RobustRegression(model = lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris),
                 dat = iris, cluster_var = 'Species')

We simply specify a model in the same manner as we normally would. RobustRegression automatically calcualtes robust standard errors and p values. The default value for cluster_var is 'None'. In this case, we want to cluster errors by "Species".

regression is a list that contains a number of useful data frames and objects.

We can use the stargazer package to make nice table summaries of our results, with and without heteroskedastic robust and clustered standard errors.

stargazer(regression$model, type = 'html', se = list(regression$TidyModel$RobustSE), p = list(regression$TidyModel$RobustPvalue), t.auto = T, p.auto = T, title = 'Robust and Clustered Standard Errors')

stargazer(regression$model, type = 'html', 
          title = 'Basic Standard Errors')

As we would expect, using RobustRegression doesn't change our estimated model coefficients at all, but accounting for clusters and heteroskedasticity does substantially change the standard errors and p values



DanOvando/RobustRegression documentation built on May 6, 2019, 1:22 p.m.