This package prodives a function RobustRegression
that allows users to run linear regression with heteroskedastic robust standard errors, as well as clustered standard errors. The function returns a list object with a variety of useful outputs.
Let's start by running a simple linear regression using the iris
dataset included with R.
library(RobustRegression) library(stargazer) data("iris") iris <- as_data_frame(iris) head(iris)
Iris contains information on flower traits for a number of different species of iris. Suppose we want to run a regression. We want to run a regression predicting the sepal length (Sepal.Length
) as a function of sepal width (Sepal.Width
), petal length (Petal.Length
), and the species of iris (Species
). Note that Species
variable is categorical.
We are worried both that errors might be heterskedastic and clustered. Perhaps for example the variability in sepal length in creases with petal length (heteroskedasticity), or that each species of iris have similar and different types of error structures (clustering). Ordinarily, we might run a simple linear regression of the form lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)
. However, we want to account for this heteroskedasticity and clustering.
In STATA, we could accomplish this through
regress Sepal.Length Sepal.Width Petal.Length i.Species, robust cluster(Species)
To do this in R, you'll need two libraries
library(lmtest) library(sandwich)
We can now run our regression
reg <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris) summary(reg)
These are our standard std. errors.
To get our robust standard errors, we do
lmtest::coeftest(reg, sandwich::vcovHC(reg, "HC1"))
This is saying extract the coefficients, using a variance-covariance matrix with "HC1" corrected standard errors. And voila, we have robust standard errors.
I've gone through and added this and some other functionality into a package called RobustRegression
regression <- RobustRegression(model = lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris), dat = iris, cluster_var = 'Species')
We simply specify a model
in the same manner as we normally would. RobustRegression
automatically calcualtes robust standard errors and p values. The default value for cluster_var
is 'None'. In this case, we want to cluster errors by "Species"
.
regression
is a list that contains a number of useful data frames and objects.
regression$
model
is the standard model object produced by lm
, except that an object called model$RobustSEs
has been added that include the robust standard errors. Note that functions such as summary
calculate things like p values and standard errors on the fly from the data, and so will not show robust standard errors if usedTidyModel
is a tidyed data frame with the model, containing the coefficients, standard and robust standard errors, and various test statisticsAugModel
is the original data frame used in the regression, with augmented variables such as fitted valuesGlanceModel
contains a tidyed data frame of model summary statistics, e.g. R^2^RegPlot
a plot of the regression coefficientsWe can use the stargazer
package to make nice table summaries of our results, with and without heteroskedastic robust and clustered standard errors.
stargazer(regression$model, type = 'html', se = list(regression$TidyModel$RobustSE), p = list(regression$TidyModel$RobustPvalue), t.auto = T, p.auto = T, title = 'Robust and Clustered Standard Errors') stargazer(regression$model, type = 'html', title = 'Basic Standard Errors')
As we would expect, using RobustRegression
doesn't change our estimated model coefficients at all, but accounting for clusters and heteroskedasticity does substantially change the standard errors and p values
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.