
Travis build status | AppVeyor build status | Coverage status | License: GPL v3

Just Another R Package to Calculate Robust Standard Errors for Linear Models

As the title suggests, this is Just Another Robust Standard Errors R package for linear models. The main goal of writing this package was convenience: as robust standard error calculations have become the norm in the social sciences, it seemed good to have a simple way to implement them.

The package overwrites the summary.lm S3 method to add two optional arguments, robust and cluster. Hence, when the package is loaded, a warning message should appear that the summary.lm method was “masked.” All robust covariance matrix calculations are done in C++ using the Armadillo linear algebra library and sourced via the RcppArmadillo package. The code should therefore be faster than that of alternative packages.

Right now, the package implements only a small number of robust covariance matrix estimators. Also, the functions provided will work only on linear regression models (i.e., R objects of class lm). The package might be extended in the future to other models. But for now, this is all I’ve got.

Installation

The general usage of the package is quite simple. First, the package needs to be installed and attached to the search path:

devtools::install_github("baruuum/jars")
library("jars")
#> Registered S3 method overwritten by 'jars':
#>   method     from 
#>   summary.lm stats
#> 
#> jars version 0.0.1

Thereafter, heteroskedasticity and cluster-robust standard errors can be calculated by specifying the options robust and cluster, respectively.

Example: Heteroskedasticity-robust Covariance Matrix

To show how the package might be put to work, let us first simulate some data.

set.seed(15232)

# obs
n = 5000

# predictors
x1 = runif(n)
x2 = rnorm(n)

# clusters
z = sample(1:6, n, replace = T)

# outcome
y = 2 + runif(1, -1, 1) * x1 + runif(1, -1, 1) * x2 + rnorm(n)

# fit linear model
m = lm(y ~ x1 + x2)

After this, we can calculate heteroskedasticity-robust standard errors using the summary function:

summary(m, robust = TRUE)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2)
#> 
#> Robust covariance matrix of type HC1 is used
#> 
#> Coefficients:
#>             Estimate Std. Error
#> (Intercept)    1.967      0.028
#> x1             0.753      0.050
#> x2            -0.407      0.015
#> 
#> N = 5000,  K = 3,  RMSE = 1.01,  R-squared = 0.173

To see the significance stars, we can change the default stars = FALSE option to TRUE in the print function:

mysum = summary(m, robust = TRUE)
print(mysum, stars = T)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2)
#> 
#> Robust covariance matrix of type HC1 is used
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   1.9666     0.0280    70.2   <2e-16 ***
#> x1            0.7530     0.0495    15.2   <2e-16 ***
#> x2           -0.4067     0.0146   -27.8   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> N = 5000,  K = 3,  RMSE = 1.01,  R-squared = 0.173

By default, the robust = TRUE option calculates HC1-type robust standard errors. Right now, the package supports HC0 through HC4. These estimators differ only in the finite-sample adjustments they apply to correct for small-sample bias. To estimate a different type of robust standard errors, use the type option. For example, to obtain HC3-type robust standard errors, we just add the type = "HC3" option:
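
For reference, here is a minimal base-R sketch of how the HC0, HC1, and HC3 covariance matrices can be computed by hand from the standard sandwich formulas (the object names below are illustrative and not part of the package):

# pieces of the sandwich formula for the fitted model m
X = model.matrix(m)            # design matrix
e = resid(m)                   # residuals
n = nrow(X); k = ncol(X)
h = hatvalues(m)               # leverages
XXinv = solve(crossprod(X))    # (X'X)^{-1}

# HC0: no finite-sample adjustment
HC0 = XXinv %*% crossprod(X * e^2, X) %*% XXinv
# HC1: degrees-of-freedom correction n / (n - k)
HC1 = HC0 * n / (n - k)
# HC3: squared residuals scaled by (1 - h_i)^2
HC3 = XXinv %*% crossprod(X * (e / (1 - h))^2, X) %*% XXinv

# HC1 should match the default robust covariance matrix used above
all.equal(HC1, vcov(summary(m, robust = TRUE)), check.attributes = FALSE)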

summary(m, robust = T, type = "HC3")
#> 
#> Call:
#> lm(formula = y ~ x1 + x2)
#> 
#> Robust covariance matrix of type HC3 is used
#> 
#> Coefficients:
#>             Estimate Std. Error
#> (Intercept)    1.967      0.028
#> x1             0.753      0.050
#> x2            -0.407      0.015
#> 
#> N = 5000,  K = 3,  RMSE = 1.01,  R-squared = 0.173

Example: Clustering-robust Covariance Matrix

The code to compute cluster-robust standard errors looks quite similar. The only difference is that we have to pass to the summary function an integer vector that identifies the cluster to which each observation belongs. We simulated such a vector above (z), which we can use here:

summary(m, cluster = z, type = "HC1")
#> 
#> Call:
#> lm(formula = y ~ x1 + x2)
#> 
#> Cluster-robust covariance matrix of type HC1 is used
#> (Number of clusters: 6)
#> 
#> Coefficients:
#>             Estimate Std. Error
#> (Intercept)    1.967      0.053
#> x1             0.753      0.075
#> x2            -0.407      0.021
#> 
#> N = 5000,  K = 3,  RMSE = 31.8,  R-squared = 0.173

Again, by default, the HC1-type degrees-of-freedom correction is used. HC1 is also the default in Stata when the robust or vce(cluster) option is specified. For clustered standard errors, only the HC0 and HC1 types are currently available.
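
For the curious, here is a minimal base-R sketch of the cluster-robust calculation, assuming the Stata-style HC1 correction G/(G - 1) * (n - 1)/(n - k), where G is the number of clusters (again, the object names are illustrative):

X = model.matrix(m)
e = resid(m)
n = nrow(X); k = ncol(X)
G = length(unique(z))                # number of clusters
XXinv = solve(crossprod(X))

# meat: sum over clusters of (X_g' e_g)(X_g' e_g)'
scores = rowsum(X * e, group = z)    # G x k matrix of within-cluster score sums
CR0 = XXinv %*% crossprod(scores) %*% XXinv

# HC1-type finite-sample correction
CR1 = CR0 * (G / (G - 1)) * ((n - 1) / (n - k))

# should match the cluster-robust covariance matrix above
all.equal(CR1, vcov(summary(m, cluster = z, type = "HC1")), check.attributes = FALSE)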

A note of caution: the clustering variable should not contain missing values. In the current implementation, cluster-robust standard errors are calculated after dropping the observations for which the clustering variable is missing. As a result, the sample used to estimate the coefficients of the linear model and the sample used to estimate the standard errors will differ. This behavior contrasts with that of the sandwich package, where observations with a missing clustering variable are treated as their own cluster. Both the sandwich package and this package will throw a warning if any missing values are found in the clustering variable.
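
If the clustering variable does contain missing values, one conservative workaround is to drop those observations before fitting the model, so that the coefficients and the standard errors are estimated on the same sample. A sketch, using the simulated objects from above:

# keep only observations with a non-missing cluster id
keep = !is.na(z)

# refit and summarize on the complete cases only
m_cc = lm(y ~ x1 + x2, subset = keep)
summary(m_cc, cluster = z[keep])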

Comparison with sandwich package

To use the sandwich package, we need to load the lmtest package as well. The functions in the sandwich package calculate robust covariance matrices, which are then passed to lmtest::coeftest to adjust the standard errors and test statistics of a fitted regression model.

We might first compare whether the sandwich functions and the jars functions return the same estimates of different kinds of robust covariance matrices. It should be noted that the sandwich package implements a much larger variety of robust covariance matrix estimators.

library("lmtest")
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
library("sandwich")
library("magrittr")

# types of heteroskedasticity-robust estimators
het_type = paste0("HC", 0:4)

# check whether results are equal to sandwich package
for (h in het_type) {

    all.equal(
        vcovHC(m, type = h),
        vcov(summary(m, robust = T, type = h)),
        check.attributes = FALSE
    ) %>% print

}
#> [1] TRUE
#> [1] TRUE
#> [1] TRUE
#> [1] TRUE
#> [1] TRUE

# types of cluster-robust estimators
clust_type = paste0("HC", c(0, 1))
for (h in clust_type) {

    all.equal(
        vcovCL(m, cluster = z, type = h),
        vcov(summary(m, cluster = z, type = h)),
        check.attributes = FALSE
    ) %>% print

}
#> [1] TRUE
#> [1] TRUE

We can also look into the printed results from these packages:

# using lmtest::coeftest
coeftest(m, vcov. = vcovCL(m, cluster = z, type = "HC1"))
#> 
#> t test of coefficients:
#> 
#>              Estimate Std. Error t value  Pr(>|t|)    
#> (Intercept)  1.966629   0.053023  37.090 < 2.2e-16 ***
#> x1           0.753044   0.075303  10.000 < 2.2e-16 ***
#> x2          -0.406680   0.021142 -19.235 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# using jars
summary(m, cluster = z, type = "HC1")
#> 
#> Call:
#> lm(formula = y ~ x1 + x2)
#> 
#> Cluster-robust covariance matrix of type HC1 is used
#> (Number of clusters: 6)
#> 
#> Coefficients:
#>             Estimate Std. Error
#> (Intercept)    1.967      0.053
#> x1             0.753      0.075
#> x2            -0.407      0.021
#> 
#> N = 5000,  K = 3,  RMSE = 31.8,  R-squared = 0.173

Benchmarking results

We might also compare the two packages in terms of computational speed. The main functions of the sandwich and jars packages for calculating robust standard errors are, respectively, vcovHC and .robustSE.

# Heteroskedasticity-robust covariance matrix estimators

for (h in het_type) {

    cat(paste0("\nBenchmarking het-robust covariance matrix of type ", h,"\n"))
    microbenchmark::microbenchmark(
        sand = vcovHC(m, type = h),
        jars = .robustSE(model.matrix(m), resid(m), type = h),
        times = 100,
        unit = "relative"
    ) %>% print(digits = 3)

}
#> 
#> Benchmarking het-robust covariance matrix of type HC0
#> Unit: relative
#>  expr  min lq mean median   uq  max neval cld
#>  sand 30.3 28 29.4   29.6 33.9 11.9   100   b
#>  jars  1.0  1  1.0    1.0  1.0  1.0   100  a 
#> 
#> Benchmarking het-robust covariance matrix of type HC1
#> Unit: relative
#>  expr  min   lq mean median uq  max neval cld
#>  sand 31.5 29.2 21.4   28.8 26 5.81   100   b
#>  jars  1.0  1.0  1.0    1.0  1 1.00   100  a 
#> 
#> Benchmarking het-robust covariance matrix of type HC2
#> Unit: relative
#>  expr  min   lq mean median   uq max neval cld
#>  sand 8.33 8.41 8.69   9.09 8.84 7.7   100   b
#>  jars 1.00 1.00 1.00   1.00 1.00 1.0   100  a 
#> 
#> Benchmarking het-robust covariance matrix of type HC3
#> Unit: relative
#>  expr  min   lq mean median   uq  max neval cld
#>  sand 15.9 16.7 17.5   18.2 20.6 11.4   100   b
#>  jars  1.0  1.0  1.0    1.0  1.0  1.0   100  a 
#> 
#> Benchmarking het-robust covariance matrix of type HC4
#> Unit: relative
#>  expr  min   lq mean median   uq max neval cld
#>  sand 8.42 9.33 10.2   10.4 10.7  12   100   b
#>  jars 1.00 1.00  1.0    1.0  1.0   1   100  a


# Cluster-robust covariance matrix estimators
for (h in clust_type) {

    cat(paste0("\nBenchmarking cluster-robust covariance matrix of type ", h,"\n"))
    microbenchmark::microbenchmark(
        sand = vcovCL(m, cluster = z, type = h),
        jars = .clusterSE(model.matrix(m), resid(m), z, type = h),
        times = 100,
        unit = "relative"
    ) %>% print(digits = 3)
}
#> 
#> Benchmarking cluster-robust covariance matrix of type HC0
#> Unit: relative
#>  expr  min   lq mean median   uq  max neval cld
#>  sand 19.2 17.9   14   18.7 14.3 4.79   100   b
#>  jars  1.0  1.0    1    1.0  1.0 1.00   100  a 
#> 
#> Benchmarking cluster-robust covariance matrix of type HC1
#> Unit: relative
#>  expr min   lq mean median   uq  max neval cld
#>  sand  16 17.9   18   17.6 16.1 22.8   100   b
#>  jars   1  1.0    1    1.0  1.0  1.0   100  a

