As the title suggests, this is **J**ust **A**nother **R**obust
**S**tandard Errors R package for linear models. The main motivation for
writing this package was convenience: as robust standard error
calculations have become the norm in the social sciences, it seemed
good to have a simple way to implement them.

The package overwrites the `summary.lm` S3 method to add two
optional arguments, `robust` and `cluster`. Hence, when the package is
loaded, a warning message will appear stating that the `summary.lm`
method was “masked.” All robust covariance matrix calculations are done
in `C++` using the Armadillo linear algebra library, sourced via the
`RcppArmadillo` package. The code should therefore be faster than
alternative packages.

Right now, the package implements only a small number of robust
covariance matrix estimators. Also, the functions provided work
only on linear regression models (i.e., R objects of class `lm`). The
package might be extended to other models in the future. But for now,
this is all I’ve got.

The general usage of the package is quite simple. First, the package needs to be installed and attached to the search path:

```
devtools::install_github("baruuum/jars")
library("jars")
```

```
#> Registered S3 method overwritten by 'jars':
#> method from
#> summary.lm stats
#>
#> jars version 0.0.1
```

Thereafter, heteroskedasticity-robust and cluster-robust standard errors can be
calculated by specifying the options `robust` and `cluster`,
respectively.

To show how the package might be put to work, let us first simulate some data.

```
set.seed(15232)
# number of observations
n = 5000
# predictors
x1 = runif(n)
x2 = rnorm(n)
# cluster indicator
z = sample(1:6, n, replace = TRUE)
# outcome
y = 2 + runif(1, -1, 1) * x1 + runif(1, -1, 1) * x2 + rnorm(n)
# fit linear model
m = lm(y ~ x1 + x2)
```

After this, we can calculate heteroskedasticity-robust standard errors
by using the `summary` function:

```
summary(m, robust = TRUE)
#>
#> Call:
#> lm(formula = y ~ x1 + x2)
#>
#> Robust covariance matrix of type HC1 is used
#>
#> Coefficients:
#> Estimate Std. Error
#> (Intercept) 1.967 0.028
#> x1 0.753 0.050
#> x2 -0.407 0.015
#>
#> N = 5000, K = 3, RMSE = 1.01, R-squared = 0.173
```

To see the significance stars, we can change the `stars` argument of the
`print` method from its default (`FALSE`) to `TRUE`:

```
mysum = summary(m, robust = TRUE)
print(mysum, stars = T)
#>
#> Call:
#> lm(formula = y ~ x1 + x2)
#>
#> Robust covariance matrix of type HC1 is used
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.9666 0.0280 70.2 <2e-16 ***
#> x1 0.7530 0.0495 15.2 <2e-16 ***
#> x2 -0.4067 0.0146 -27.8 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> N = 5000, K = 3, RMSE = 1.01, R-squared = 0.173
```

By default, the `robust = TRUE` option calculates HC1-type
robust standard errors. Right now, the package supports HC0 through HC4. The
estimators differ in how they correct for finite-sample bias: HC1 applies a
degrees-of-freedom correction to HC0, while HC2 through HC4 rescale the
squared residuals using the leverage values. To estimate a different type of
robust standard errors, the `type` option can be used. For example, to
obtain HC3-type robust standard errors, we just add the `type = "HC3"`
option:

```
summary(m, robust = T, type = "HC3")
#>
#> Call:
#> lm(formula = y ~ x1 + x2)
#>
#> Robust covariance matrix of type HC3 is used
#>
#> Coefficients:
#> Estimate Std. Error
#> (Intercept) 1.967 0.028
#> x1 0.753 0.050
#> x2 -0.407 0.015
#>
#> N = 5000, K = 3, RMSE = 1.01, R-squared = 0.173
```
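To make the mechanics concrete, the HC1 estimator can be computed by hand from the design matrix and the residuals. The following is a minimal sketch (an illustration, not the package's actual C++ code) of the sandwich formula $(X'X)^{-1} X' \mathrm{diag}(e^2) X (X'X)^{-1}$ scaled by $n/(n-k)$, using a small simulated fit:

```r
# Minimal sketch of the HC1 sandwich estimator (illustration only, not jars code)
set.seed(1)
n <- 200
x <- runif(n)
y <- 1 + x + rnorm(n)
fit <- lm(y ~ x)

X <- model.matrix(fit)          # n x k design matrix
e <- resid(fit)                 # OLS residuals
k <- ncol(X)

bread  <- solve(crossprod(X))   # (X'X)^{-1}
meat   <- crossprod(X * e^2, X) # X' diag(e^2) X  (rows of X scaled by e_i^2)
vc_hc1 <- n / (n - k) * bread %*% meat %*% bread

sqrt(diag(vc_hc1))              # HC1 robust standard errors
```

The result can be checked against `sandwich::vcovHC(fit, type = "HC1")`, since HC1 is exactly the HC0 sandwich multiplied by $n/(n-k)$.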

The code to compute cluster-robust standard errors looks quite similar.
The only difference is that we have to pass to the `summary` function
an **integer vector** that identifies the cluster to which each
observation belongs. We have simulated such a vector above (`z`), which
we can use here:

```
summary(m, cluster = z, type = "HC1")
#>
#> Call:
#> lm(formula = y ~ x1 + x2)
#>
#> Cluster-robust covariance matrix of type HC1 is used
#> (Number of clusters: 6)
#>
#> Coefficients:
#> Estimate Std. Error
#> (Intercept) 1.967 0.053
#> x1 0.753 0.075
#> x2 -0.407 0.021
#>
#> N = 5000, K = 3, RMSE = 31.8, R-squared = 0.173
```
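Under the hood, cluster-robust estimators replace the observation-level scores in the "meat" of the sandwich with cluster-level sums. A minimal sketch of this computation (an illustration with simulated data, not the package's actual code), including the Stata-style HC1 finite-sample correction:

```r
# Minimal sketch of cluster-robust covariance matrices (illustration only)
set.seed(2)
n <- 300
g <- sample(1:6, n, replace = TRUE)   # cluster indicator
x <- runif(n)
y <- 1 + x + rnorm(n)
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- resid(fit)
k <- ncol(X)
G <- length(unique(g))

bread  <- solve(crossprod(X))
scores <- rowsum(X * e, group = g)    # cluster-wise sums of x_i * e_i
meat   <- crossprod(scores)           # sum over clusters of s_g s_g'
vc_cr0 <- bread %*% meat %*% bread    # HC0-type (no correction)

# HC1-type (Stata-style) finite-sample correction
vc_cr1 <- G / (G - 1) * (n - 1) / (n - k) * vc_cr0
sqrt(diag(vc_cr1))                    # cluster-robust standard errors
```

Since the correction factor exceeds one, the HC1-type standard errors are always slightly larger than the HC0-type ones.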

Again, by default, the HC1-type degrees-of-freedom correction is used.
HC1 is also the default in Stata when the `robust` or
`vce(cluster)` option is used. For clustered standard errors, only the
HC0 and HC1 types are currently available.

A note of caution: the clustering variable should have no
missing values. In the current implementation, cluster-robust
standard errors are calculated after dropping the observations for
which the clustering variable is missing. This leads to the problem
that the sample from which the coefficients of the linear model are
estimated and the sample from which the standard errors are estimated
will differ. This behavior contrasts with that of the `sandwich`
package, where the observations for which the clustering variable is
`NA` are treated as their own cluster. Both the `sandwich` package and
this package throw a warning if any missing values on the clustering
variable are found in the data.
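If the clustering variable does contain missing values, one way to keep the estimation and standard-error samples aligned is to drop the incomplete cases before fitting the model. A hypothetical workflow (the commented `summary` call assumes jars is attached):

```r
# Hypothetical workflow: align estimation and SE samples when z has NAs
set.seed(3)
n <- 100
x <- runif(n)
y <- 1 + x + rnorm(n)
z <- sample(c(1:4, NA), n, replace = TRUE)  # clustering variable with NAs

keep <- !is.na(z)                # complete cases on the clustering variable
fit  <- lm(y[keep] ~ x[keep])    # fit on the same sample used for the SEs
# summary(fit, cluster = z[keep])  # jars call (requires the package)
```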

To use the `sandwich` package, we need to load the `lmtest` package as
well. The functions included in the `sandwich` package calculate
robust covariance matrices, which are then used to adjust the estimates
in regression models via the `lmtest::coeftest` function.

We might first check whether the `sandwich` functions and the `jars`
functions return the same estimates of different kinds of robust
covariance matrices. It should be noted that the `sandwich` package
implements a much larger variety of robust covariance matrix estimators.

```
library("lmtest")
#> Loading required package: zoo
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
library("sandwich")
library("magrittr")
# types of heteroskedasticity-robust estimators
het_type = paste0("HC", 0:4)
# check whether results are equal to sandwich package
for (h in het_type) {
    all.equal(
        vcovHC(m, type = h),
        vcov(summary(m, robust = TRUE, type = h)),
        check.attributes = FALSE
    ) %>% print
}
#> [1] TRUE
#> [1] TRUE
#> [1] TRUE
#> [1] TRUE
#> [1] TRUE
# types of cluster-robust estimators
clust_type = paste0("HC", c(0, 1))
for (h in clust_type) {
    all.equal(
        vcovCL(m, cluster = z, type = h),
        vcov(summary(m, cluster = z, type = h)),
        check.attributes = FALSE
    ) %>% print
}
#> [1] TRUE
#> [1] TRUE
```

We can also look into the printed results from these packages:

```
# using lmtest::coeftest
coeftest(m, vcov. = vcovCL(m, cluster = z, type = "HC1"))
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.966629 0.053023 37.090 < 2.2e-16 ***
#> x1 0.753044 0.075303 10.000 < 2.2e-16 ***
#> x2 -0.406680 0.021142 -19.235 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# using jars
summary(m, cluster = z, type = "HC1")
#>
#> Call:
#> lm(formula = y ~ x1 + x2)
#>
#> Cluster-robust covariance matrix of type HC1 is used
#> (Number of clusters: 6)
#>
#> Coefficients:
#> Estimate Std. Error
#> (Intercept) 1.967 0.053
#> x1 0.753 0.075
#> x2 -0.407 0.021
#>
#> N = 5000, K = 3, RMSE = 31.8, R-squared = 0.173
```

We might also compare the two packages in terms of computational speed.
The main functions of the `sandwich` and `jars` packages for calculating
robust covariance matrices are, respectively, `vcovHC` and `.robustSE`.

```
# Heteroskedasticity-robust covariance matrix estimators
for (h in het_type) {
    cat(paste0("\nBenchmarking het-robust covariance matrix of type ", h, "\n"))
    microbenchmark::microbenchmark(
        sand = vcovHC(m, type = h),
        jars = .robustSE(model.matrix(m), resid(m), type = h),
        times = 100,
        unit = "relative"
    ) %>% print(digits = 3)
}
#>
#> Benchmarking het-robust covariance matrix of type HC0
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 30.3 28 29.4 29.6 33.9 11.9 100 b
#> jars 1.0 1 1.0 1.0 1.0 1.0 100 a
#>
#> Benchmarking het-robust covariance matrix of type HC1
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 31.5 29.2 21.4 28.8 26 5.81 100 b
#> jars 1.0 1.0 1.0 1.0 1 1.00 100 a
#>
#> Benchmarking het-robust covariance matrix of type HC2
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 8.33 8.41 8.69 9.09 8.84 7.7 100 b
#> jars 1.00 1.00 1.00 1.00 1.00 1.0 100 a
#>
#> Benchmarking het-robust covariance matrix of type HC3
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 15.9 16.7 17.5 18.2 20.6 11.4 100 b
#> jars 1.0 1.0 1.0 1.0 1.0 1.0 100 a
#>
#> Benchmarking het-robust covariance matrix of type HC4
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 8.42 9.33 10.2 10.4 10.7 12 100 b
#> jars 1.00 1.00 1.0 1.0 1.0 1 100 a
# Cluster-robust covariance matrix estimators
for (h in clust_type) {
    cat(paste0("\nBenchmarking cluster-robust covariance matrix of type ", h, "\n"))
    microbenchmark::microbenchmark(
        sand = vcovCL(m, cluster = z, type = h),
        jars = .clusterSE(model.matrix(m), resid(m), z, type = h),
        times = 100,
        unit = "relative"
    ) %>% print(digits = 3)
}
#>
#> Benchmarking cluster-robust covariance matrix of type HC0
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 19.2 17.9 14 18.7 14.3 4.79 100 b
#> jars 1.0 1.0 1 1.0 1.0 1.00 100 a
#>
#> Benchmarking cluster-robust covariance matrix of type HC1
#> Unit: relative
#> expr min lq mean median uq max neval cld
#> sand 16 17.9 18 17.6 16.1 22.8 100 b
#> jars 1 1.0 1 1.0 1.0 1.0 100 a
```

baruuum/jars documentation built on Nov. 3, 2019, 2:06 p.m.
