This R package implements the dynamic panel data modeling framework described by Allison, Williams, and Moral-Benito (2017). This approach allows fitting models with fixed effects that do not assume strict exogeneity of predictors. That means you can simultaneously get the robustness to confounding offered by fixed effects models and account for reciprocal causation between the predictors and the outcome variable. The estimating approach from Allison et al. provides better finite sample performance in terms of both bias and efficiency than other popular methods (e.g., the Arellano-Bond estimator).
These models are fit as structural equation models (SEM) by maximum
likelihood, which gives you the missing data handling and flexibility
afforded by SEM. This package will reshape your data, specify the model
properly, and fit it with lavaan.
If a result doesn't seem right, it is a good idea to cross-reference it
with xtdpdml for Stata. Go to https://www3.nd.edu/~rwilliam/dynamic/
to learn about xtdpdml and the underlying method. You may also be interested
in the article by Paul Allison, Richard Williams, and Enrique Moral-Benito in
Socius, accessible at https://doi.org/10.1177/2378023117710578.
dpm is now on CRAN and can be installed like other R packages.
install.packages("dpm")
This package assumes your data are in long format, with each row representing
a single observation of a single participant. Contrast this with wide format,
in which each row contains all observations of a single participant.
For help on converting data from wide to long format, check out
the tutorial that accompanies the panelr package.
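To make the long/wide distinction concrete, here is a minimal sketch using a made-up three-wave panel. The data frame and its column names (id, wave, y) are hypothetical, and base R's reshape() is used only for illustration; it is not what dpm does internally.

```r
# Hypothetical toy panel in long format: one row per participant per wave
long <- data.frame(
  id   = rep(1:2, each = 3),
  wave = rep(1:3, times = 2),
  y    = c(10, 11, 12, 20, 21, 22)
)

# The same data in wide format: one row per participant,
# with one column of y for each wave (y.1, y.2, y.3)
wide <- reshape(long, idvar = "id", timevar = "wave", direction = "wide")

print(long)
print(wide)
```

dpm expects data shaped like `long`; it produces the wide layout itself when it builds the lavaan model.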
First we load the package and the WageData from panelr.
library(dpm)
data("WageData", package = "panelr")
This next line of code converts the data to class panel_data, which is a
class specific to the panelr package that helps to simplify the treatment of
long-form panel data. You don't have to do this, but it saves you from
providing id and wave arguments to the model fitting function each time you
use it.
wages <- panel_data(WageData, id = id, wave = t)
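If you would rather skip the conversion, a sketch of the alternative is to supply id and wave on each call. Passing the column names as strings is an assumption here, so check ?dpm for the accepted form. The data are cut to 4 waves and print.only = TRUE is used so that nothing is actually fit.

```r
library(dpm)
data("WageData", package = "panelr")

# Keep only the first 4 waves to limit the amount of printed output
wage4 <- WageData[WageData$t < 5, ]

# Plain data frame, so id and wave are given explicitly
# (string column names are an assumption, not confirmed by this README)
dpm(wks ~ pre(lag(union)) + lag(lwage) | ed, data = wage4,
    id = "id", wave = "t", print.only = TRUE)
```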
The formula syntax used in this package is meant to be as similar to a typical regression model as possible.
The most basic model can be specified like any other: y ~ x, where y is
the dependent variable and x is a time-varying predictor. If you would like
to include time-invariant predictors, you will make the formula consist of two
parts, separated with a bar (|) like so: y ~ x | z, where z is a
time-invariant predictor, like ethnicity.
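As a rough illustration with the wage data, the two-part formula might look like the sketch below, with lwage as the time-varying predictor and ed as the time-invariant one. The choice of variables and the restriction to 4 waves are only for demonstration, and print.only = TRUE shows the generated lavaan specification instead of fitting the model.

```r
library(dpm)
data("WageData", package = "panelr")

# Restrict to 4 waves before converting, to keep the printed syntax short
wages4 <- panelr::panel_data(WageData[WageData$t < 5, ], id = id, wave = t)

# Time-varying predictor only
dpm(wks ~ lwage, data = wages4, print.only = TRUE)

# Time-varying predictor plus a time-invariant predictor after the bar
dpm(wks ~ lwage | ed, data = wages4, print.only = TRUE)
```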
One of the innovations of the method, however, is the notion of pre-determined,
or sequentially exogenous, predictors. To specify a model with a pre-determined
variable, put the variable within a pre function: y ~ pre(x1) + x2 | z.
This tells the function that x1 is pre-determined while x2 is strictly
exogenous by assumption. You could have multiple pre-determined predictors as
well (e.g., y ~ pre(x1) + pre(x2) | z).
You may also fit models with lagged predictors. Simply apply the lag function
to the lagged predictors in the formula: y ~ pre(lag(x1)) + lag(x2) | z.
To lag a predictor by more than 1 wave, provide the number of lags as an
argument. For instance, y ~ pre(lag(x1, 2)) + lag(x2) | z will use the second
lag of the x1 variable.
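For intuition about what a lagged predictor is, the sketch below computes first and second lags by hand on a hypothetical toy panel. It assumes that lag(x, 2) refers to the value of x from two waves earlier, and that rows are sorted by wave within id. The helper lag_within() is made up for this example and is not part of dpm.

```r
# Hypothetical toy panel, sorted by wave within each id
toy <- data.frame(
  id   = rep(1:2, each = 4),
  wave = rep(1:4, times = 2),
  x    = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Shift a vector down by k positions, padding the start with NA
lag_within <- function(v, k) c(rep(NA, k), head(v, -k))

# Apply the shift separately within each participant
toy$x_lag1 <- ave(toy$x, toy$id, FUN = function(v) lag_within(v, 1))
toy$x_lag2 <- ave(toy$x, toy$id, FUN = function(v) lag_within(v, 2))

print(toy)
```

The first waves of each participant have no earlier value to borrow, which is why lagging costs you observations at the start of the panel.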
This will replicate the analysis of the wages data in the Socius article that describes these models.
Note that to get matching standard errors, set information = "observed" to
override lavaan's default, information = "expected".
fit <- dpm(wks ~ pre(lag(union)) + lag(lwage) | ed, data = wages,
           error.inv = TRUE, information = "observed")
summary(fit)
Any arguments supplied other than those that are documented within the
dpm function are passed on to sem from the lavaan package.
The following arguments allow you to make changes to the default model specification:
y.lag: By default the lag 1 value of the DV is included as a predictor (this is why they are dynamic models). You may choose a different value or multiple values instead, including 0 (no lagged DV at all).
fixed.effects: By default, the model is specified as a fixed effects model. If you set this to FALSE, you get a random effects specification instead.
error.inv: This constrains error variances to be equal in each wave. It is FALSE by default.
const.inv: This constrains the constants to be equal in each wave. It is FALSE by default, but if TRUE it eliminates cross-sectional dependence.
y.free: This allows the regression coefficient of the lagged DV to vary across time. It is FALSE by default and you can either set it to TRUE or to the specific lag number(s).
x.free: This allows the regression coefficients for the predictors to vary across time. It is FALSE by default and you can either set it to TRUE to set all predictors' coefficients free over time or else pass a vector of strings of the predictors whose coefficients should be set free over time.
alpha.free: If TRUE, relaxes the constraint that the fixed effects are equal across time. Default is FALSE to be consistent with how fixed effects models normally work.
partial.pre: If TRUE (FALSE by default), predetermined lagged predictors will also be allowed to correlate with the contemporaneous error term, as suggested by Paul Allison (2022) for scenarios when it's not clear whether you have chosen the right lag structure.

You have most of the options available to you via lavaan's summary method.
You can choose to omit any of: the z statistics (zstat = FALSE),
the standard errors (se = FALSE), or the p values (pvalue = FALSE). You
may also add confidence intervals (ci = TRUE) at any specified level
(ci.level = .95). If you used bootstrapping for uncertainty intervals,
you can also specify the method (boot.ci.type = "perc").
The number of digits to print can be set via digits or with the option
dpm-digits. You may also standardize coefficients via lavaan's method
using standardize = TRUE.
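Putting a few of these summary options together might look like the sketch below. It refits the wage model from earlier in this README so that the block is self-contained; the particular combination of options is just one illustrative choice.

```r
library(dpm)
data("WageData", package = "panelr")
wages <- panelr::panel_data(WageData, id = id, wave = t)

# Same model as the Socius replication above
fit <- dpm(wks ~ pre(lag(union)) + lag(lwage) | ed, data = wages,
           error.inv = TRUE, information = "observed")

# Drop the z statistics, add 90% confidence intervals, print 2 digits
summary(fit, zstat = FALSE, ci = TRUE, ci.level = .90, digits = 2)

# Standardized coefficients via lavaan's method
summary(fit, standardize = TRUE)
```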
If you just want the lavaan model specification and don't want this package
to fit the model for you, you can set print.only = TRUE. To reduce the
amount of output, I'm condensing wages to 4 waves here.
dpm(wks ~ pre(lag(union)) + lag(lwage) | ed, data = wages[wages$t < 5,], print.only = TRUE)
Alternatively, you can extract the lavaan model syntax and wide-formatted
data from the fitted model object to do your own fitting and tweaking.
get_wide_data(fit)
get_syntax(fit)
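One way to use these two pieces is to hand them straight to lavaan, as in the sketch below. It assumes get_syntax() returns model syntax that lavaan::sem() accepts as its model argument. The refit may not reproduce dpm's results exactly if dpm sets other sem() options internally.

```r
library(dpm)
data("WageData", package = "panelr")
wages <- panelr::panel_data(WageData, id = id, wave = t)

fit <- dpm(wks ~ pre(lag(union)) + lag(lwage) | ed, data = wages,
           error.inv = TRUE, information = "observed")

# Pull out the generated model syntax and the reshaped (wide) data
syntax <- get_syntax(fit)
wide   <- get_wide_data(fit)

# Refit by hand; edit `syntax` first if you want to tweak the model
refit <- lavaan::sem(model = syntax, data = wide, information = "observed")
lavaan::summary(refit)
```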
The model is a special type of lavaan object. This means most methods
implemented for lavaan objects will work on these. You can also convert
the fitted model into a typical lavaan object:
as(fit, "lavaan")
lavaan summary

While you could convert the model to a lavaan model and apply any of
lavaan's functions to it (and you should!), as a convenience you can use
lav_summary() to get lavaan's summary of the model.
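A short sketch of both routes follows. The fit measures requested from lavaan::fitMeasures() are an arbitrary selection for illustration.

```r
library(dpm)
data("WageData", package = "panelr")
wages <- panelr::panel_data(WageData, id = id, wave = t)

fit <- dpm(wks ~ pre(lag(union)) + lag(lwage) | ed, data = wages,
           error.inv = TRUE, information = "observed")

# Convenience wrapper: lavaan's own summary of the model
lav_summary(fit)

# Or convert and use lavaan's functions directly
lav_fit <- as(fit, "lavaan")
lavaan::fitMeasures(lav_fit, c("chisq", "df", "cfi", "rmsea"))
head(lavaan::parameterEstimates(lav_fit))
```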
Take advantage of lavaan's missing data handling by using the
missing = "fiml" argument as well as any other arguments accepted by
lavaan::sem().
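In practice this is just one extra argument, as in the sketch below. WageData is used only to show the call.

```r
library(dpm)
data("WageData", package = "panelr")
wages <- panelr::panel_data(WageData, id = id, wave = t)

# missing = "fiml" is not a dpm argument, so it is passed on to lavaan::sem()
fit_fiml <- dpm(wks ~ pre(lag(union)) + lag(lwage) | ed, data = wages,
                error.inv = TRUE, information = "observed",
                missing = "fiml")
summary(fit_fiml)
```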
An earlier limitation affecting formulas like y ~ x + lag(x) was fixed in 1.0.0.
Transformations inside the formula, such as y ~ scale(x), used to cause an error; they work as of 1.1.0.

Feature parity with xtdpdml (Stata) is a goal. Here's how we are doing
in terms of matching relevant xtdpdml options:
alphafree (as alpha.free)
xfree (as x.free)
xfree(varlist) (as x.free)
yfree (added as y.free argument in 1.0.0)
yfree(numlist)
re (added via fixed.effects argument in 1.0.0)
errorinv (as error.inv)
nocsd/constinv (as const.inv)
ylag(numlist) (added as y.lag argument in 1.0.0; option to specify as 0, meaning no lagged DV, added in 1.1.0)
std (but standardize argument of summary may suffice)
dryrun (as print.only)

Many SEM fitting options, and perhaps more than xtdpdml offers, are
implemented by virtue of accepting any lavaan::sem() argument.
lavaan problem.
y ~ scale(x) (fixed in 1.1.0)
broom methods (tidy, glance) (added tidy in 1.1.0)
predict method and perhaps some ability to plot predictions
x.free option to allow the coefficients of all predictors to vary across periods. This will make the summary output a pain, so it will take some time to implement. (added in 1.1.1)

Allison, P. (2022, October 24). Getting the lags right – a new solution. Statistical Horizons. https://statisticalhorizons.com/getting-the-lags-right-a-new-solution/
Allison, P. D., Williams, R., & Moral-Benito, E. (2017). Maximum likelihood for cross-lagged panel models with fixed effects. Socius, 3, 1–17. https://doi.org/10.1177/2378023117710578
Leszczensky, L., & Wolbring, T. (2022). How to deal with reverse causality using panel data? Recommendations for researchers based on a simulation study. Sociological Methods & Research, 51(2), 837–865. https://doi.org/10.1177/0049124119882473
Moral-Benito, E., Allison, P., & Williams, R. (2019). Dynamic panel data modelling using maximum likelihood: An alternative to Arellano-Bond. Applied Economics, 51, 2221–2232. https://doi.org/10.1080/00036846.2018.1540854
Williams, R., Allison, P. D., & Moral-Benito, E. (2018). Linear dynamic panel-data estimation using maximum likelihood and structural equation modeling. The Stata Journal, 18, 293–326. https://doi.org/10.1177/1536867X1801800201