In baruuum/jars: Just Another Robust Standard errors package

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

Just Another R Package to Calculate Robust Standard Errors for Linear Models

As the title suggests, this is Just Another Rrobust Standard Errors R package for linear models. The main goal of writing this package was for convenience. As robust standard errors calculations have become the norm in the social sciences, it seemed to be good to have a simply way to implement them.

The package will overwrite the summary.lm S3 method to add two optional arguments, robust and cluster. Hence, when the package is loaded, a warning message should appear that the method summary.lm method was "masked." All robust covariance matrix calculations are done in C++ using the Armadillo linear algebra library and sourced via the RcppArmadillo package. The code should be therefore faster than alternative packages.

Right now, the package implements only a small number of robust covariance matrix estimators. Also, the functions provided will work only on linear regression models (i.e., R objects of class lm). The package might be extended in the future to other models. But for now, this is all I've got.

Installation

The general usage of the package is quite simple. First, the package needs to be installed and attached to the search path:

devtools::install_github("baruuum/jars")
library("jars")

library("jars")

Thereafter, heteroskedasticity and cluster-robust standard errors can be calculated by specifying the options robust and cluster, respectively.

Example: Heteroskedasticity-robust Covariance Matrix

To show how the package might be put to work, let us first simulate some data.

set.seed(15232)

# obs
n = 5000

# predictors
x1 = runif(n)
x2 = rnorm(n)

# clusters
z = sample(1:6, n, replace = T)

# outcome
y = 2 + runif(1, -1, 1) * x1 + runif(1, -1, 1) * x2 + rnorm(n)

# fit linear model
m = lm(y ~ x1 + x2)

After this, we can calculate heteroskedasticity robust standard errors by using the summary function:

summary(m, robust = TRUE)

To see the stars, we might change the default star = FALSE option to TRUE in the print function:

mysum = summary(m, robust = TRUE)
print(mysum, stars = T)

By default, the robust = TRUE option will calculate the HC1 type of robust standard errors. Right now, the package supports HC0 to HC4. The different estimators differ only in their degrees of freedom correction to correct for finite-sample bias. To estimate a different type of robust standard errors, the type option might be used. For example, to obtain HC3-type robust standard errors, we just add the type = "HC3" option:

summary(m, robust = T, type = "HC3")

Example: Clustering-robust Covariance Matrix

The code to compute cluster-robust standard errors looks quite similar. The only difference is that we have to pass into the summary function an integer vector that contains information regarding the clustered units. We have simulated such a vector above (z), which we might use here:

summary(m, cluster = z, type = "HC1")

Again, by default, the HC1 type degree of freedom correction is used. The HC1 type is also the default in STATA, when the robust or vce(cluster) option is used. For the clustered standard errors, only HC0 and HC1 types are currently available.

A note of caution: It is desirable that the clustering variable has no missing values. In the current implementation, the cluster-robust standard errors will be calculated after the observations for which the clustering variable is missing are dropped. This leads to the problem that the sample from which the coefficients of the linear model are estimated and that from which the standard errors are estimated will differ. This behavior contrasts with that of the sandwich package, where the observations for which the clustering variable are NA are treated as their own cluster. Both the sandwich package as well as this package will throw a warning if any missing values on the clustering variable are found in the data.

Comparison with `sandwich` package

To use the sandwich package, we need to load the lmtest package as well. The functions included in the sandwich package will calculate robust covariance matrices, which are then used to adjust the estimates in regression models via the lmtest::coeftest function.

We might first compare whether the sandwich functions and the jars functions return the same estimates of different kinds of robust covariance matrices. It should be noted that the sandwich package implements a much larger variety of robust covariance matrix estimators.

library("lmtest")
library("sandwich")
library("magrittr")

# types of heteroskedasticity-robust estimators
het_type = paste0("HC", 0:4)

# check whether results are equal to sandwich package
for (h in het_type) {

    all.equal(
        vcovHC(m, type = h),
        vcov(summary(m, robust = T, type = h)),
        check.attributes = FALSE
    ) %>% print

}

# types of cluster-robust estimators
clust_type = paste0("HC", c(0, 1))
for (h in clust_type) {

    all.equal(
        vcovCL(m, cluster = z, type = h),
        vcov(summary(m, cluster = z, type = h)),
        check.attributes = FALSE
    ) %>% print

}

We can also look into the printed results from these packages:

# using lmtest::coeftest
coeftest(m, vcov. = vcovCL(m, cluster = z, type = "HC1"))

# using jars
summary(m, cluster = z, type = "HC1")

Benchmarking results

We might also compare the two packages in terms of computational speed. The main functions of the sandwich and jars package to calculate robust standard errors are, respectively, vcovHC and .robustSE.

# Heteroskedasticity-robust covariance matrix estimators

for (h in het_type) {

    cat(paste0("\nBenchmarking het-robust covariance matrix of type ", h,"\n"))
    microbenchmark::microbenchmark(
        sand = vcovHC(m, type = h),
        jars = .robustSE(model.matrix(m), resid(m), type = h),
        times = 100,
        unit = "relative"
    ) %>% print(digits = 3)

}


# Cluster-robust covariance matrix estimators
for (h in clust_type) {

    cat(paste0("\nBenchmarking cluster-robust covariance matrix of type ", h,"\n"))
    microbenchmark::microbenchmark(
        sand = vcovHC(m, cluster = z, type = h),
        jars = .clusterSE(model.matrix(m), resid(m), z, type = h),
        times = 100,
        unit = "relative"
    ) %>% print(digits = 3)
}