coeff.robust | R Documentation |
This function computes (1) heteroscedasticity-consistent or cluster-robust
standard errors and significance values for (generalized) linear models
estimated by using the lm() or the glm() function and (2) cluster-robust
standard errors for multilevel and linear mixed-effects models estimated by
using the lmer() function from the lme4 package. These standard errors are
robust to violations of the homoscedasticity assumption. For linear models,
the heteroscedasticity-robust F-test is computed as well. By default, the
function uses the HC4 estimator for (generalized) linear models and the
heteroscedasticity-robust CR2 estimator for multilevel and linear mixed-effects
models. Note that cluster-robust standard errors are available only for
two-level models.
coeff.robust(model, cluster = NULL,
type = c("HC0", "HC1", "HC2", "HC3", "HC4", "HC4m", "HC5",
"CR0", "CR1", "CR1p", "CR1S", "CR2", "CR3"),
digits = 2, p.digits = 3, write = NULL, append = TRUE,
check = TRUE, output = TRUE)
model | a fitted model of class |
cluster | a vector representing the nested grouping structure (i.e., group or cluster variable). This argument is used only when requesting cluster-robust standard errors for (generalized) linear models estimated by using the |
type | a character string specifying the estimation type for (generalized) linear models estimated by using the |
digits | an integer value indicating the number of decimal places to be used for displaying results. Note that information criteria and chi-square test statistic are printed with |
p.digits | an integer value indicating the number of decimal places |
write | a character string naming a file for writing the output into either a text file with file extension |
append | logical: if |
check | logical: if |
output | logical: if |
The family of heteroscedasticity-consistent (HC) standard error estimators for
the parameters of a regression model is based on an HC covariance matrix of the
parameter estimates and does not require the assumption of homoscedasticity.
HC estimators approach the correct value with increasing sample size, even in
the presence of heteroscedasticity. In contrast, the OLS standard error
estimator is biased and does not converge to the proper value when the
assumption of homoscedasticity is violated (Darlington & Hayes, 2017). White
(1980) introduced the idea of the HC covariance matrix to econometricians and
derived the asymptotically justified form of the HC covariance matrix known as
HC0 (Long & Ervin, 2000). Simulation studies have shown that the HC0 estimator
tends to underestimate the true variance in small to moderately large samples
(N ≤ 250) and in the presence of high-leverage observations, which leads to an
inflated type I error risk (e.g., Cribari-Neto & Lima, 2014). The alternative
estimators HC1 to HC5 are asymptotically equivalent to HC0 but include
finite-sample corrections, which give them better small-sample properties than
the HC0 estimator. Long and Ervin (2000) recommended routinely using the HC3
estimator regardless of the outcome of a heteroscedasticity test. However, the
HC3 estimator can be unreliable when the data contain high-leverage
observations. The HC4 estimator, on the other hand, performs well with small
samples, in the presence of high-leverage observations, and when errors are not
normally distributed (Cribari-Neto, 2004). In summary, the HC4 estimator
appears to perform best in terms of controlling the type I and type II error
risk (Rosopa et al., 2013). Contrary to the findings of Cribari-Neto et al.
(2007), the HC5 estimator did not show any substantial advantages over HC4 in
the simulation study by Ng and Wilcox (2009); both estimators performed
similarly across all conditions considered.
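To make the HC family concrete, here is a minimal base-R sketch, assuming an lm() fit. It computes the sandwich covariance matrix (X'X)^{-1} X' diag(omega) X (X'X)^{-1}, where omega_i is the squared residual e_i^2 with each estimator's finite-sample correction based on the leverages h_i; HC4 uses the exponent min(4, n h_i / k) described by Cribari-Neto (2004). The helper hc_vcov() is hypothetical and written for illustration only. It is not part of misty, and coeff.robust() itself relies on sandwich::vcovHC().

```r
# Illustrative sketch only: hc_vcov() is a hypothetical helper, not a misty function.
# Computes HC0-HC4 sandwich covariance matrices for an lm() fit in base R.
hc_vcov <- function(model, type = c("HC0", "HC1", "HC2", "HC3", "HC4")) {
  type <- match.arg(type)
  X <- model.matrix(model)   # n x k design matrix
  e <- residuals(model)      # OLS residuals e_i
  h <- hatvalues(model)      # leverages h_i (diagonal of the hat matrix)
  n <- nrow(X)
  k <- ncol(X)
  # Observation-specific weights omega_i replacing the constant error variance
  omega <- switch(type,
    HC0 = e^2,
    HC1 = e^2 * n / (n - k),
    HC2 = e^2 / (1 - h),
    HC3 = e^2 / (1 - h)^2,
    HC4 = e^2 / (1 - h)^pmin(4, n * h / k)
  )
  bread <- solve(crossprod(X))        # (X'X)^{-1}
  meat <- crossprod(X * sqrt(omega))  # X' diag(omega) X
  bread %*% meat %*% bread
}

mod.lm <- lm(mpg ~ cyl + disp, data = mtcars)
# Standard errors under each estimator (one column per type)
se.hc <- sapply(c("HC0", "HC1", "HC2", "HC3", "HC4"),
                function(tp) sqrt(diag(hc_vcov(mod.lm, tp))))
round(cbind(OLS = sqrt(diag(vcov(mod.lm))), se.hc), 4)
```

As far as we are aware, these columns should agree with sandwich::vcovHC(mod.lm, type = ...) up to rounding. HC2 to HC4 inflate the squared residuals of high-leverage observations, which is how they counteract the downward bias of HC0.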
Note that the F-test of significance on the multiple correlation coefficient
R also assumes homoscedasticity of the errors. Violations of this assumption
can result in a hypothesis test that is either liberal or conservative, depending
on the form and severity of the heteroscedasticity.
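A heteroscedasticity-robust F-test can be sketched as a Wald test of the hypothesis that all slope coefficients are zero, F = (Rb)'(R V R')^{-1}(Rb) / q with q = k - 1 and n - k degrees of freedom, where V is an HC covariance matrix instead of the classical s^2 (X'X)^{-1}. The base-R sketch below works under that assumption; wald_f() is a hypothetical helper, it assumes the intercept is the first coefficient, and HC3 is used only as an example. With the classical V, the statistic reduces to the usual overall F-test reported by summary().

```r
# Illustrative sketch only: wald_f() is a hypothetical helper, not a misty function.
# Wald F-test that all slope coefficients are zero, given a coefficient covariance V.
# Assumes the first coefficient is the intercept.
wald_f <- function(model, V) {
  b <- coef(model)
  k <- length(b)
  R <- cbind(0, diag(k - 1))   # restriction matrix: drop intercept, keep all slopes
  Rb <- R %*% b
  q <- nrow(R)
  f.stat <- drop(t(Rb) %*% solve(R %*% V %*% t(R)) %*% Rb) / q
  df2 <- df.residual(model)
  c(F = f.stat, df1 = q, df2 = df2,
    p = pf(f.stat, q, df2, lower.tail = FALSE))
}

mod.lm <- lm(mpg ~ cyl + disp, data = mtcars)

# HC3 covariance matrix: (X'X)^{-1} X' diag(e_i^2 / (1 - h_i)^2) X (X'X)^{-1}
X <- model.matrix(mod.lm)
bread <- solve(crossprod(X))
V.hc3 <- bread %*% crossprod(X * (residuals(mod.lm) / (1 - hatvalues(mod.lm)))) %*% bread

wald_f(mod.lm, vcov(mod.lm))  # classical F-test (assumes homoscedasticity)
wald_f(mod.lm, V.hc3)         # heteroscedasticity-robust F-test
```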
Hayes and Cai (2007) argued that using an HC estimator instead of assuming
homoscedasticity provides researchers with more confidence in the validity and
statistical power of inferential tests in regression analysis. Hence, the HC3
or HC4 estimator should be used routinely when estimating regression models. If
an HC estimator is not used as the default method of standard error estimation,
researchers are advised to at least double-check the results by using an HC
estimator to ensure that conclusions are not compromised by heteroscedasticity.
However, the presence of heteroscedasticity suggests that the data are not
adequately explained by the statistical model of estimated conditional means.
Unless heteroscedasticity is believed to be caused solely by measurement error
associated with the predictor variable(s), it should serve as a warning to the
researcher regarding the adequacy of the estimated model.
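The double-check recommended above can be sketched in base R by placing classical and HC3-based t-tests side by side and flagging coefficients whose significance decision changes. HC3 and the .05 threshold are illustrative choices, not settings taken from coeff.robust() (which uses HC4 by default).

```r
# Illustrative sketch: compare classical and HC3-based t-tests for an lm() fit.
mod.lm <- lm(mpg ~ cyl + disp, data = mtcars)

X <- model.matrix(mod.lm)
h <- hatvalues(mod.lm)
bread <- solve(crossprod(X))
# HC3 covariance matrix of the coefficients
V.hc3 <- bread %*% crossprod(X * (residuals(mod.lm) / (1 - h))) %*% bread

est <- coef(mod.lm)
se.ols <- sqrt(diag(vcov(mod.lm)))
se.hc3 <- sqrt(diag(V.hc3))
df.res <- df.residual(mod.lm)

comparison <- data.frame(
  estimate = est,
  se.ols = se.ols,
  se.hc3 = se.hc3,
  p.ols = 2 * pt(abs(est / se.ols), df.res, lower.tail = FALSE),
  p.hc3 = 2 * pt(abs(est / se.hc3), df.res, lower.tail = FALSE)
)
# TRUE where the decision at alpha = .05 differs between the two approaches
comparison$changed <- (comparison$p.ols < 0.05) != (comparison$p.hc3 < 0.05)
comparison
```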
The family of cluster-robust (CR) standard error estimators for the parameters of multilevel and linear mixed-effects models is based on heteroscedasticity-consistent (HC) standard error estimators that have been generalized to clustered data (Zhang & Lai, 2024). The CR0 estimator (Liang & Zeger, 1986) relies on large samples, i.e., it may underestimate standard errors when the number of clusters is small (Cameron & Miller, 2015; Imbens & Kolesar, 2016). However, there is no consensus about the minimum number of clusters: suggestions include at least 100 clusters (Maas & Hox, 2004, p. 439), around 40 (Angrist & Pischke, 2008), or 30 clusters (Huang, 2016).
The CR2 estimator, also referred to as the bias-reduced linearization method of Bell and McCaffrey (2002), has been shown to be effective with a small number of clusters (Huang & Li, 2022). For example, the CR2 estimator performed well in all conditions of a simulation study involving 20, 50, or 100 clusters, regardless of whether homoscedasticity was violated (Huang et al., 2023). The CR3 estimator tends to over-correct the bias of the CR0 estimator, while the CR1 estimator tends to under-correct it (Pustejovsky & Tipton, 2018). Note that cluster-robust standard errors are robust only to violations of the homoscedasticity assumption; departures from normality or the presence of outliers can still affect their performance (MacKinnon, 2012). Statistical significance testing of the regression coefficients is based on Satterthwaite-approximated degrees of freedom (Bell & McCaffrey, 2002).
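For the special case of an OLS fit with clustered observations, CR0 sums the outer products of the cluster-level scores X_g' e_g, CR1 multiplies CR0 by G / (G - 1) for G clusters, and CR2 first adjusts each cluster's residuals by (I - H_gg)^(-1/2), where H_gg is the cluster block of the hat matrix (Bell & McCaffrey, 2002). The base-R sketch below assumes an lm() fit with an identity working covariance matrix. It is not how clubSandwich handles lmer() models, and cr_vcov() is a hypothetical helper. The factor G / (G - 1) is one common CR1 correction (we believe it is the one clubSandwich uses for type = "CR1"); variants such as CR1S use different factors. Clustering mtcars by gear (three clusters) is purely illustrative and far below any recommended number of clusters.

```r
# Illustrative sketch only: cr_vcov() is a hypothetical helper, not a misty or
# clubSandwich function. Assumes an lm() fit (OLS, identity working covariance).
cr_vcov <- function(model, cluster, type = c("CR0", "CR1", "CR2")) {
  type <- match.arg(type)
  X <- model.matrix(model)
  e <- residuals(model)
  k <- ncol(X)
  bread <- solve(crossprod(X))                  # (X'X)^{-1}
  groups <- split(seq_len(nrow(X)), cluster)    # row indices per cluster
  G <- length(groups)
  meat <- matrix(0, k, k)
  for (idx in groups) {
    Xg <- X[idx, , drop = FALSE]
    eg <- e[idx]
    if (type == "CR2") {
      # Bell-McCaffrey adjustment A_g = (I - H_gg)^(-1/2) via eigendecomposition
      Hgg <- Xg %*% bread %*% t(Xg)
      ev <- eigen(diag(length(idx)) - Hgg, symmetric = TRUE)
      Ag <- ev$vectors %*% diag(1 / sqrt(ev$values), length(idx)) %*% t(ev$vectors)
      eg <- Ag %*% eg
    }
    ug <- crossprod(Xg, eg)                     # cluster score X_g' e_g (k x 1)
    meat <- meat + tcrossprod(ug)               # add u_g u_g'
  }
  V <- bread %*% meat %*% bread
  if (type == "CR1") V <- V * G / (G - 1)       # small-sample factor G / (G - 1)
  V
}

mod.lm <- lm(mpg ~ cyl + disp, data = mtcars)
# Cluster-robust standard errors with cars clustered by number of gears
sapply(c("CR0", "CR1", "CR2"),
       function(tp) sqrt(diag(cr_vcov(mod.lm, mtcars$gear, tp))))
```

With one observation per cluster, CR0 reduces to HC0 and CR2 reduces to HC2, which illustrates the sense in which CR estimators generalize the HC family to clustered data.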
Returns an object of class misty.object, which is a list with the following
entries:
call | function call |
type | type of analysis |
model | model specified in |
args | specification of function arguments |
result | list with results, i.e., |
The computation of heteroscedasticity-consistent standard errors is based on
the vcovHC function from the sandwich package (Zeileis, Köll, & Graham, 2020)
and the functions coeftest and waldtest from the lmtest package (Zeileis &
Hothorn, 2002), while the computation of cluster-robust standard errors uses
the vcovCR and coef_test functions from the clubSandwich package.
Takuya Yanagida takuya.yanagida@univie.ac.at
Angrist, J. D., & Pischke, J.-S. (2008). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.
Bell, R. M., & McCaffrey, D. F. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28(2), 169-181.
Cameron, A. C., & Miller, D. L. (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources, 50(2), 317-372. https://doi.org/10.3368/jhr.50.2.317
Cribari-Neto, F. (2004). Asymptotic inference under heteroskedasticity of unknown form. Computational Statistics & Data Analysis, 45, 215-233. https://doi.org/10.1016/S0167-9473(02)00366-3
Cribari-Neto, F., & Lima, M. G. (2014). New heteroskedasticity-robust standard errors for the linear regression model. Brazilian Journal of Probability and Statistics, 28, 83-95.
Cribari-Neto, F., Souza, T., & Vasconcellos, K. L. P. (2007). Inference under heteroskedasticity and leveraged data. Communications in Statistics - Theory and Methods, 36, 1877-1888. https://doi.org/10.1080/03610920601126589
Darlington, R. B., & Hayes, A. F. (2017). Regression analysis and linear models: Concepts, applications, and implementation. The Guilford Press.
Hayes, A. F., & Cai, L. (2007). Using heteroscedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation. Behavior Research Methods, 39, 709-722. https://doi.org/10.3758/BF03192961
Huang, F. L., & Li, X. (2022). Using cluster-robust standard errors when analyzing group-randomized trials with few clusters. Behavior Research Methods, 54(3), 1181-1199. https://doi.org/10.3758/s13428-021-01627-0
Imbens, G. W., & Kolesar, M. (2016). Robust standard errors in small samples: Some practical advice. Review of Economics and Statistics, 98(4), 701-712. https://doi.org/10.1162/REST_a_00552
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1-26. https://doi.org/10.18637/jss.v082.i13
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13-22. https://doi.org/10.1093/biomet/73.1.13
Long, J. S., & Ervin, L. H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. The American Statistician, 54, 217-224. https://doi.org/10.1080/00031305.2000.10474549
Maas, C., & Hox, J. J. (2004). The influence of violations of assumptions on multilevel parameter estimates and their standard errors. Computational Statistics & Data Analysis, 46(3), 427-440. https://doi.org/10.1016/j.csda.2003.08.006
MacKinnon, J. G. (2012). Thirty years of heteroskedasticity-robust inference. In X. Chen & N. R. Swanson (Eds.), Recent advances and future directions in causality, prediction, and specification analysis: Essays in honor of Halbert L. White Jr (pp. 437-461). Springer. https://doi.org/10.1007/978-1-4614-1653-1_17
Ng, M., & Wilcox, R. R. (2009). Level robust methods based on the least squares regression estimator. Journal of Modern Applied Statistical Methods, 8, 384-395. https://doi.org/10.22237/jmasm/1257033840
Pustejovsky, J. E., & Tipton, E. (2018). Small sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business and Economic Statistics, 36(4), 672-683. https://doi.org/10.1080/07350015.2016.1247004
Rosopa, P. J., Schaffer, M. M., & Schroeder, A. N. (2013). Managing heteroscedasticity in general linear models. Psychological Methods, 18(3), 335-351. https://doi.org/10.1037/a0032553
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817-838. https://doi.org/10.2307/1912934
Zeileis, A., & Hothorn, T. (2002). Diagnostic checking in regression relationships. R News, 2(3), 7-10. http://CRAN.R-project.org/doc/Rnews/
Zeileis, A., Köll, S., & Graham, N. (2020). Various versatile variances: An object-oriented implementation of clustered covariances in R. Journal of Statistical Software, 95(1), 1-36. https://doi.org/10.18637/jss.v095.i01
Zhang, Y., & Lai, M. H. C. (2024). Evaluating two small-sample corrections for fixed-effects standard errors and inferences in multilevel models with heteroscedastic, unbalanced, clustered data. Behavior Research Methods, 56(6), 5930-5946. https://doi.org/10.3758/s13428-023-02325-9
coeff.std, write.result
#----------------------------------------------------------------------------
# Example 1: Linear model
mod.lm <- lm(mpg ~ cyl + disp, data = mtcars)
coeff.robust(mod.lm)
#----------------------------------------------------------------------------
# Example 2: Generalized linear model
mod.glm <- glm(carb ~ cyl + disp, data = mtcars, family = poisson())
coeff.robust(mod.glm)
## Not run:
#----------------------------------------------------------------------------
# Example 3: Multilevel and Linear Mixed-Effects Model
# Load lme4 and misty package
misty::libraries(lme4, misty)
# Load data set "Demo.twolevel" in the lavaan package
data("Demo.twolevel", package = "lavaan")
# Cluster-mean centering, center() from the misty package
Demo.twolevel <- center(Demo.twolevel, x2, type = "CWC", cluster = "cluster")
# Grand-mean centering, center() from the misty package
Demo.twolevel <- center(Demo.twolevel, w1, type = "CGM", cluster = "cluster")
# Estimate two-level mixed-effects model
mod.lmer <- lmer(y1 ~ x2.c + w1.c + x2.c:w1.c + (1 + x2.c | cluster), data = Demo.twolevel)
# Statistical significance testing based on cluster-robust standard errors
coeff.robust(mod.lmer)
#----------------------------------------------------------------------------
# Write Results
# Example 3a: Write results into a text file
coeff.robust(mod.lm, write = "Robust_Coef.txt", output = FALSE)
# Example 3b: Write results into an Excel file
coeff.robust(mod.lm, write = "Robust_Coef.xlsx", output = FALSE)
## End(Not run)