multilevelIV | R Documentation |
Estimates multilevel models (max. 3 levels) employing the GMM approach presented in Kim and Frees (2007). One of its important features is that, by using the hierarchical structure of the data, no external instrumental variables are needed, unlike in traditional instrumental variable techniques. Specifically, the approach controls for endogeneity at higher levels in the data hierarchy. For example, in a three-level model, endogeneity can be handled if it is present at level two, at level three, or at both levels. Level one endogeneity, where the regressors are correlated with the structural errors (errors at level one), is not addressed. Moreover, random slopes, if included, cannot be endogenous. Also, the dependent variable has to have a continuous distribution.
The function returns the coefficient estimates obtained with fixed effects, random effects and the GMM estimator proposed by Kim and Frees (2007), so that the models can be compared. Asymptotically, the multilevel GMM estimators share the same properties as the corresponding fixed effects estimators, but they allow the estimation of all the variables in the model, unlike their fixed effects counterparts.
To facilitate the choice of the estimator to be used for the given data, the function also conducts an omitted variable test based on the Hausman test for panel data (Hausman, 1978). It allows comparing a robust estimator with an estimator that is efficient under the null hypothesis of no omitted variables, and comparing two robust estimators at different levels. The results of these tests are returned when calling summary() on a fitted model.
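To make the idea behind the test concrete, the following is a minimal sketch of the classic Hausman (1978) statistic in base R. The helper hausman_stat and all numbers are hypothetical and for illustration only. The omitted variable test reported by summary() follows Kim and Frees (2007) and may differ in detail, for example in which coefficients are compared.

```r
# Hypothetical helper (not part of the package): classic Hausman statistic
# comparing a robust estimator with one that is efficient under H0.
hausman_stat <- function(b_robust, b_efficient, V_robust, V_efficient) {
  d  <- b_robust - b_efficient        # difference of the coefficient estimates
  Vd <- V_robust - V_efficient        # variance of the difference under H0
  stat <- as.numeric(t(d) %*% solve(Vd) %*% d)
  df   <- length(d)                   # one degree of freedom per compared coefficient
  list(statistic = stat,
       df        = df,
       p.value   = pchisq(stat, df = df, lower.tail = FALSE))
}

# Made-up estimates and variances, for illustration only
b_fe <- c(x1 = 1.10, x2 = -0.48)      # robust (e.g. fixed effects)
b_re <- c(x1 = 1.00, x2 = -0.50)      # efficient under H0 (e.g. random effects)
V_fe <- diag(c(0.020, 0.010))
V_re <- diag(c(0.010, 0.005))

res <- hausman_stat(b_fe, b_re, V_fe, V_re)
res$p.value   # a large p-value gives no evidence against the efficient estimator
```

A small p-value would instead point towards omitted variables, in which case the robust estimator is the safer choice.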
multilevelIV(
  formula,
  data,
  lmer.control = lmerControl(optimizer = "Nelder_Mead", optCtrl = list(maxfun = 1e+05)),
  verbose = TRUE
)
formula |
A symbolic description of the model to be fitted. See the "Details" section for the exact notation. |
data |
A data.frame containing the data of all parts specified in the formula parameter. |
lmer.control |
An output from lmerControl, used when fitting the lmer model internally. |
verbose |
Show details about the running of the function. |
Multilevel modeling is a generalization of regression methods that recognizes the existence of data hierarchies by allowing for residual components at each level in the hierarchy. For example, a three-level multilevel model which allows for grouping of students within classrooms, over time, would include time, student and classroom residuals (see equation below). Thus, the residual variance is partitioned into four components: between-classroom (the variance of the classroom-level residuals), within-classroom (the variance of the student-level residuals), between-student (the variance of the student-level residuals) and within-student (the variance of the time-level residuals). The classroom residuals represent the unobserved classroom characteristics that affect students' outcomes. These unobserved variables lead to correlation between outcomes for students from the same classroom. Similarly, the unobserved time residuals lead to correlation between a student's outcomes over time. A three-level model can be described as follows:
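In the symbols used in the remainder of this section, and following Kim and Frees (2007), the three levels can be sketched as the system below. Here c indexes classrooms, s students and t time. The placement of the level indices as superscripts is an assumption about the notation.

```latex
\begin{aligned}
y_{cst}        &= Z^{1}_{cst}\,\beta^{1}_{cs} + X^{1}_{cst}\,\beta_{1} + \epsilon^{1}_{cst} \\
\beta^{1}_{cs} &= Z^{2}_{cs}\,\beta^{2}_{c}   + X^{2}_{cs}\,\beta_{2}  + \epsilon^{2}_{cs}  \\
\beta^{2}_{c}  &= X^{3}_{c}\,\beta_{3} + \epsilon^{3}_{c}
\end{aligned}
```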
As in single-level regression, endogeneity is also a concern in multilevel models. The additional problem is that multilevel models involve multiple independence assumptions concerning the random components at the different levels. Even a moderate correlation between some predictors and a random component or error term can result in a significant bias of the coefficients and of the variance components. The multilevel GMM approach for addressing endogeneity uses both the between and within variations of the exogenous variables, but only the within variation of the variables assumed endogenous.
The assumptions of the multilevel generalized method of moments model are that the errors at each level are normally distributed and independent of each other. Moreover, the slope variables are assumed exogenous. Since the model does not handle "level 1 dependencies", an additional assumption is that the level 1 structural error is uncorrelated with any of the regressors. If this assumption is not met, additional, external instruments are necessary.
The coefficients of the explanatory variables appear in the vectors β1, β2 and β3. The term β1cs captures latent, unobserved characteristics that are classroom and student specific, while β2c captures latent, unobserved characteristics that are classroom specific. For identification, the disturbance term εcst is assumed independent of the other variables, Z1cst and X1cst.
When all model variables are assumed exogenous, the GMM estimator is the usual GLS estimator, denoted as REF. When all variables (except the variables used as slopes) are assumed endogenous, the fixed-effects estimator, FE, is used. While REF assumes that all explanatory variables are uncorrelated with the random intercepts and slopes in the model, FE allows for endogeneity of all effects but sweeps out the random components as well as the explanatory variables at the same levels.
The more general GMM estimator proposed by Kim and Frees (2007) allows for some of the explanatory variables to be endogenous and uses this information to build instrumental variables. The multilevel GMM estimator uses both the between and within variations of the exogenous variables, but only the within variation of the variables assumed endogenous. When all variables are assumed exogenous, the GMM estimator equals REF. When all covariates are assumed endogenous, GMM equals FE.
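The distinction between "between" and "within" variation can be illustrated with a small base R simulation. This is only a sketch of the within transformation that motivates the approach, not the package's GMM estimator. All variable names and parameter values are made up for the illustration.

```r
set.seed(1)

# Simulated two-level data: 50 groups with 10 observations each (assumed sizes)
n_groups <- 50
n_per    <- 10
g <- rep(seq_len(n_groups), each = n_per)

u <- rnorm(n_groups)[g]              # unobserved group-level effect
x <- 0.8 * u + rnorm(length(g))      # regressor correlated with u (level-2 endogeneity)
y <- 1 + 2 * x + u + rnorm(length(g))

# Split x into its between part (group means) and within part (deviations)
x_between <- ave(x, g)
x_within  <- x - x_between

# The within part sums to zero in every group, so it is uncorrelated with
# anything that is constant within groups, including u
cor_within <- cor(x_within, u)

# Pooled OLS uses both parts and absorbs the correlation with u,
# whereas the within part alone recovers a slope close to the true value of 2
b_pooled <- coef(lm(y ~ x))[["x"]]
b_within <- coef(lm(y ~ x_within))[["x_within"]]
```

For a regressor assumed exogenous, the between part carries valid information as well, which is why discarding it for all variables, as FE does, costs efficiency.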
The formula argument follows a two-part notation: in the first part, the model is specified, while in the second part, the endogenous regressors are indicated. These two parts are separated by a single vertical bar (|). The first RHS follows the exact same model specification as required by the lmer function of package lme4 and will internally be used to fit a lmer model. In the second part, one or multiple endogenous regressors are indicated by passing them to the special function endo (e.g. endo(X1, X2)). Note that arguments to endo() are not to be supplied as character but as symbols without quotation marks.
See the example section for illustrations on how to specify the formula parameter.
multilevelIV returns an object of class "rendo.multilevel".
The generic accessor functions coef, fitted, residuals, vcov, confint, and nobs are available. Note that an additional argument model with possible values "REF", "FE_L2", "FE_L3", "GMM_L2", or "GMM_L3" is available for summary, fitted, residuals, confint, and vcov to extract the features for the specified model.
Note that the obtained coefficients are rounded with round(x, digits=getOption("digits")).
An object of class rendo.multilevel is returned that is a list and contains the following components:
formula |
the formula given to specify the model to be fitted. |
num.levels |
the number of levels detected from the model. |
dt.model.data |
a data.table of model data including data for slopes and level group ids |
coefficients |
a matrix of rounded coefficients, one column per model. |
coefficients.se |
a matrix of coefficients' SE, one column per model. |
l.fitted |
a named list which contains the fitted values per model sorted as the input data |
l.residuals |
a named list which contains the residuals per model sorted as the input data |
l.vcov |
a list of variance-covariance matrices, named per model. |
V |
the variance–covariance matrix V of the disturbance term. |
W |
the weight matrix W, such that W=V^(-1/2) per highest level group. |
l.ovt |
a list of results of the Hausman OVT, named per model. |
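As an illustration of the relation W=V^(-1/2) between the components W and V, the following base R sketch computes the inverse square root of a symmetric positive definite matrix through its eigendecomposition. The helper inv_sqrt and the toy matrix are hypothetical. How the function computes W internally is not stated here, so this is an assumption about one possible way.

```r
# Hypothetical helper: inverse square root of a symmetric positive definite matrix
inv_sqrt <- function(V) {
  e <- eigen(V, symmetric = TRUE)
  # V = Q diag(lambda) Q'  implies  V^(-1/2) = Q diag(lambda^(-1/2)) Q'
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# Toy V for one group with 3 observations (assumed values):
# random intercept variance 0.5 plus residual variance 1
V <- 0.5 * matrix(1, 3, 3) + diag(1, 3)
W <- inv_sqrt(V)

# W whitens V: W V W is the identity up to rounding error
check <- W %*% V %*% W
```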
Hausman, J. (1978). "Specification Tests in Econometrics". Econometrica, 46(6), 1251-1271.
Kim, Jee-Seon and Frees, Edward W. (2007). "Multilevel Modeling with Correlated Effects". Psychometrika, 72(4), 505-533.
lmer for more details on how to specify the formula parameter
lmerControl for more details on how to provide the lmer.control parameter
summary for how fitted models are summarized
library(REndo)  # provides multilevelIV() and the dataMultilevelIV example data
data("dataMultilevelIV")

# Two levels
res.ml.L2 <- multilevelIV(y ~ X11 + X12 + X13 + X14 + X15 + X21 + X22 + X23 + X24 + X31 +
                            X32 + X33 + (1|SID) | endo(X15),
                          data = dataMultilevelIV, verbose = FALSE)

# Three levels
res.ml.L3 <- multilevelIV(y ~ X11 + X12 + X13 + X14 + X15 + X21 + X22 + X23 + X24 + X31 +
                            X32 + X33 + (1| CID) + (1|SID) | endo(X15),
                          data = dataMultilevelIV, verbose = FALSE)

# L2 with multiple endogenous regressors
res.ml.L2 <- multilevelIV(y ~ X11 + X12 + X13 + X14 + X15 + X21 + X22 + X23 + X24 + X31 +
                            X32 + X33 + (1|SID) | endo(X15, X21, X22),
                          data = dataMultilevelIV, verbose = FALSE)

# same as above
res.ml.L2 <- multilevelIV(y ~ X11 + X12 + X13 + X14 + X15 + X21 + X22 + X23 + X24 + X31 +
                            X32 + X33 + (1|SID) | endo(X15, X21) + endo(X22),
                          data = dataMultilevelIV, verbose = FALSE)

# Fit above model with different settings for lmer()
lmer.control <- lme4::lmerControl(optimizer="nloptwrap",
                                  optCtrl=list(algorithm="NLOPT_LN_COBYLA",
                                               xtol_rel=1e-6))
res.ml.L2.cob <- multilevelIV(y ~ X11 + X12 + X13 + X14 + X15 + X21 + X22 + X23 + X24 +
                                X31 + X32 + X33 + (1|SID) | endo(X15, X21) + endo(X22),
                              data = dataMultilevelIV, verbose = FALSE,
                              lmer.control = lmer.control) # use different controls for lmer

# specify argument "model" in the S3 methods to obtain results for the respective model
# default is "REF" for all methods
summary(res.ml.L3)
# same as above
summary(res.ml.L3, model = "REF")

# complete pval table for L3 fixed effects
L3.FE.p <- coef(summary(res.ml.L3, model = "FE_L3"))

# variance covariance matrix
L2.FE.var  <- vcov(res.ml.L2, model = "FE_L2")
L2.GMM.var <- vcov(res.ml.L2, model = "GMM_L2")

# residuals
L3.REF.resid <- resid(res.ml.L3, model = "REF")