As the title suggests, this is Just Another Robust Standard Errors R package for linear models. The main goal of writing this package was convenience. As robust standard error calculations have become the norm in the social sciences, it seemed good to have a simple way to implement them.
The package overwrites the summary.lm S3 method to add two optional arguments, robust and cluster. Hence, when the package is loaded, a warning message should appear that the summary.lm method was "masked." All robust covariance matrix calculations are done in C++ using the Armadillo linear algebra library, sourced via the RcppArmadillo package. The code should therefore be faster than that of alternative packages.
Right now, the package implements only a small number of robust covariance matrix estimators. Also, the functions provided will work only on linear regression models (i.e., R objects of class lm). The package might be extended to other models in the future. But for now, this is all I've got.
The general usage of the package is quite simple. First, the package needs to be installed and attached to the search path:
devtools::install_github("baruuum/jars")
library("jars")
#> Registered S3 method overwritten by 'jars':
#> method from
#> summary.lm stats
#>
#> jars version 0.0.1
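As a quick sanity check, one can inspect which package's summary.lm is dispatched after loading. This uses only base R utilities; the result shown in the comment is what one would expect, not verified package output:

```r
# Where does the dispatched summary.lm come from?  After library("jars")
# this should report "jars"; in a fresh R session it reports "stats".
environmentName(environment(getS3method("summary", "lm")))
```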
Thereafter, heteroskedasticity-robust and cluster-robust standard errors can be calculated by specifying the options robust and cluster, respectively.
To show how the package might be put to work, let us first simulate some data.
set.seed(15232)
# obs
n = 5000
# predictors
x1 = runif(n)
x2 = rnorm(n)
# clusters
z = sample(1:6, n, replace = T)
# outcome
y = 2 + runif(1, -1, 1) * x1 + runif(1, -1, 1) * x2 + rnorm(n)
# fit linear model
m = lm(y ~ x1 + x2)
After this, we can calculate heteroskedasticity-robust standard errors by using the summary function:
summary(m, robust = TRUE)
#>
#> Call:
#> lm(formula = y ~ x1 + x2)
#>
#> Robust covariance matrix of type HC1 is used
#>
#> Coefficients:
#> Estimate Std. Error
#> (Intercept) 1.967 0.028
#> x1 0.753 0.050
#> x2 -0.407 0.015
#>
#> N = 5000, K = 3, RMSE = 1.01, R-squared = 0.173
To see the stars, we might change the default stars = FALSE option to TRUE in the print function:
mysum = summary(m, robust = TRUE)
print(mysum, stars = T)
#>
#> Call:
#> lm(formula = y ~ x1 + x2)
#>
#> Robust covariance matrix of type HC1 is used
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.9666 0.0280 70.2 <2e-16 ***
#> x1 0.7530 0.0495 15.2 <2e-16 ***
#> x2 -0.4067 0.0146 -27.8 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> N = 5000, K = 3, RMSE = 1.01, R-squared = 0.173
By default, the robust = TRUE option will calculate HC1-type robust standard errors. Right now, the package supports HC0 to HC4. These estimators differ only in the degrees of freedom correction they apply to reduce finite-sample bias. To estimate a different type of robust standard errors, the type option might be used. For example, to obtain HC3-type robust standard errors, we just add the type = "HC3" option:
summary(m, robust = T, type = "HC3")
#>
#> Call:
#> lm(formula = y ~ x1 + x2)
#>
#> Robust covariance matrix of type HC3 is used
#>
#> Coefficients:
#> Estimate Std. Error
#> (Intercept) 1.967 0.028
#> x1 0.753 0.050
#> x2 -0.407 0.015
#>
#> N = 5000, K = 3, RMSE = 1.01, R-squared = 0.173
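Under the hood, all of these estimators are variants of the White "sandwich" formula. The following is a minimal sketch of HC0 and HC1 in plain R, not the package's actual C++ implementation; it re-simulates a small data set so that it stands alone:

```r
# Minimal sketch of the HC0/HC1 sandwich formula in plain R.
# Illustration only -- not the package's actual C++ code.
set.seed(123)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- resid(fit)
k <- ncol(X)

bread <- solve(crossprod(X))  # (X'X)^{-1}
meat  <- crossprod(X * e)     # X' diag(e^2) X: rows of X scaled by residuals
hc0   <- bread %*% meat %*% bread
hc1   <- n / (n - k) * hc0    # HC1 adds the n/(n - k) df correction

sqrt(diag(hc1))               # heteroskedasticity-robust standard errors
```

The result should agree with sandwich::vcovHC(fit, type = "HC1"); the higher-order types (HC2 to HC4) additionally rescale each squared residual by a function of its leverage.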
The code to compute cluster-robust standard errors looks quite similar. The only difference is that we have to pass to the summary function an integer vector that records the cluster membership of each observation. We have simulated such a vector above (z), which we might use here:
summary(m, cluster = z, type = "HC1")
#>
#> Call:
#> lm(formula = y ~ x1 + x2)
#>
#> Cluster-robust covariance matrix of type HC1 is used
#> (Number of clusters: 6)
#>
#> Coefficients:
#> Estimate Std. Error
#> (Intercept) 1.967 0.053
#> x1 0.753 0.075
#> x2 -0.407 0.021
#>
#> N = 5000, K = 3, RMSE = 31.8, R-squared = 0.173
Again, by default, the HC1-type degrees of freedom correction is used. The HC1 type is also the default in Stata when the robust or vce(cluster) option is used. For the cluster-robust standard errors, only the HC0 and HC1 types are currently available.
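The cluster-robust estimator replaces the per-observation "meat" with per-cluster score sums. A rough sketch in plain R follows (again re-simulating data so the snippet stands alone); the adjustment factor shown is the Stata-style HC1 correction, an assumption about the exact convention rather than the package's verified internals:

```r
# Minimal sketch of the cluster-robust sandwich with an HC1-style
# small-sample adjustment (G/(G-1)) * ((n-1)/(n-k)), as in Stata.
set.seed(456)
n  <- 300
x  <- rnorm(n)
cl <- sample(1:6, n, replace = TRUE)          # cluster ids
y  <- 1 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- resid(fit)
k <- ncol(X)

S     <- rowsum(X * e, group = cl)            # scores summed within clusters
G     <- nrow(S)                              # number of clusters
bread <- solve(crossprod(X))
meat  <- crossprod(S)                         # sum_g (X_g' e_g)(X_g' e_g)'
adj   <- (G / (G - 1)) * ((n - 1) / (n - k))  # HC1-type adjustment
cr1   <- adj * bread %*% meat %*% bread

sqrt(diag(cr1))                               # cluster-robust standard errors
```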
A note of caution: the clustering variable should not contain missing values. In the current implementation, the cluster-robust standard errors are calculated after dropping the observations for which the clustering variable is missing. As a result, the sample from which the coefficients of the linear model are estimated and the sample from which the standard errors are estimated will differ. This behavior contrasts with that of the sandwich package, where observations with a missing clustering variable are treated as their own cluster. Both the sandwich package and this package will throw a warning if any missing values on the clustering variable are found in the data.
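Given this caveat, a simple defensive pattern is to drop rows with a missing cluster id before fitting, so that the coefficients and the standard errors come from the same sample. The toy data below is simulated for illustration:

```r
# Defensive pattern for missing cluster ids: subset complete cases
# *before* fitting, so the estimation sample and the clustering
# vector line up exactly.  Toy data with some NA cluster ids.
set.seed(789)
df <- data.frame(
    x1 = runif(50),
    x2 = rnorm(50),
    cluster_id = sample(c(1:4, NA), 50, replace = TRUE)
)
df$y <- 1 + df$x1 - df$x2 + rnorm(50)

df_cc <- df[!is.na(df$cluster_id), ]          # keep rows with observed ids
m_cc  <- lm(y ~ x1 + x2, data = df_cc)        # fit on the reduced sample
# with jars loaded: summary(m_cc, cluster = df_cc$cluster_id)
```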
The sandwich package
To use the sandwich package, we need to load the lmtest package as well. The functions included in the sandwich package calculate robust covariance matrices, which are then used for inference on the regression coefficients via the lmtest::coeftest function.
We might first compare whether the sandwich functions and the jars functions return the same estimates of the different kinds of robust covariance matrices. It should be noted that the sandwich package implements a much larger variety of robust covariance matrix estimators.
library("lmtest")
#> Loading required package: zoo
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
library("sandwich")
library("magrittr")
# types of heteroskedasticity-robust estimators
het_type = paste0("HC", 0:4)
# check whether results are equal to sandwich package
for (h in het_type) {
    all.equal(
        vcovHC(m, type = h),
        vcov(summary(m, robust = T, type = h)),
        check.attributes = FALSE
    ) %>% print
}
#> [1] TRUE
#> [1] TRUE
#> [1] TRUE
#> [1] TRUE
#> [1] TRUE
# types of cluster-robust estimators
clust_type = paste0("HC", c(0, 1))
for (h in clust_type) {
    all.equal(
        vcovCL(m, cluster = z, type = h),
        vcov(summary(m, cluster = z, type = h)),
        check.attributes = FALSE
    ) %>% print
}
#> [1] TRUE
#> [1] TRUE
We can also look into the printed results from these packages:
# using lmtest::coeftest
coeftest(m, vcov. = vcovCL(m, cluster = z, type = "HC1"))
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.966629 0.053023 37.090 < 2.2e-16 ***
#> x1 0.753044 0.075303 10.000 < 2.2e-16 ***
#> x2 -0.406680 0.021142 -19.235 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# using jars
summary(m, cluster = z, type = "HC1")
#>
#> Call:
#> lm(formula = y ~ x1 + x2)
#>
#> Cluster-robust covariance matrix of type HC1 is used
#> (Number of clusters: 6)
#>
#> Coefficients:
#> Estimate Std. Error
#> (Intercept) 1.967 0.053
#> x1 0.753 0.075
#> x2 -0.407 0.021
#>
#> N = 5000, K = 3, RMSE = 31.8, R-squared = 0.173
We might also compare the two packages in terms of computational speed. The main functions of the sandwich and jars packages to calculate robust covariance matrices are, respectively, vcovHC and .robustSE.
# Heteroskedasticity-robust covariance matrix estimators
for (h in het_type) {
    cat(paste0("\nBenchmarking het-robust covariance matrix of type ", h, "\n"))
    microbenchmark::microbenchmark(
        sand = vcovHC(m, type = h),
        jars = .robustSE(model.matrix(m), resid(m), type = h),
        times = 100,
        unit = "relative"
    ) %>% print(digits = 3)
}
#>
#> Benchmarking het-robust covariance matrix of type HC0
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 30.3 28 29.4 29.6 33.9 11.9 100 b
#> jars 1.0 1 1.0 1.0 1.0 1.0 100 a
#>
#> Benchmarking het-robust covariance matrix of type HC1
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 31.5 29.2 21.4 28.8 26 5.81 100 b
#> jars 1.0 1.0 1.0 1.0 1 1.00 100 a
#>
#> Benchmarking het-robust covariance matrix of type HC2
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 8.33 8.41 8.69 9.09 8.84 7.7 100 b
#> jars 1.00 1.00 1.00 1.00 1.00 1.0 100 a
#>
#> Benchmarking het-robust covariance matrix of type HC3
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 15.9 16.7 17.5 18.2 20.6 11.4 100 b
#> jars 1.0 1.0 1.0 1.0 1.0 1.0 100 a
#>
#> Benchmarking het-robust covariance matrix of type HC4
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 8.42 9.33 10.2 10.4 10.7 12 100 b
#> jars 1.00 1.00 1.0 1.0 1.0 1 100 a
# Cluster-robust covariance matrix estimators
for (h in clust_type) {
    cat(paste0("\nBenchmarking cluster-robust covariance matrix of type ", h, "\n"))
    microbenchmark::microbenchmark(
        sand = vcovCL(m, cluster = z, type = h),
        jars = .clusterSE(model.matrix(m), resid(m), z, type = h),
        times = 100,
        unit = "relative"
    ) %>% print(digits = 3)
}
#>
#> Benchmarking cluster-robust covariance matrix of type HC0
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 19.2 17.9 14 18.7 14.3 4.79 100 b
#> jars 1.0 1.0 1 1.0 1.0 1.00 100 a
#>
#> Benchmarking cluster-robust covariance matrix of type HC1
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 16 17.9 18 17.6 16.1 22.8 100 b
#> jars 1 1.0 1 1.0 1.0 1.0 100 a