dsem | R Documentation |
Fits a dynamic structural equation model
dsem(
sem,
tsdata,
family = rep("fixed", ncol(tsdata)),
estimate_delta0 = FALSE,
prior_negloglike = NULL,
control = dsem_control(),
covs = colnames(tsdata)
)
sem |
Specification for time-series structural equation model structure
including lagged or simultaneous effects. See Details section in
|
tsdata |
time-series data, as outputted using |
family |
Character-vector listing the distribution used for each column of |
estimate_delta0 |
Boolean indicating whether to estimate deviations from equilibrium in initial year as fixed effects, or alternatively to assume that dynamics start at some stochastic draw away from the stationary distribution |
prior_negloglike |
A user-provided function that takes as input the vector of fixed effects out$obj$par
and returns the negative log-prior probability. For example
|
control |
Output from |
covs |
optional: a character vector of one or more elements, with each element giving a string of variable
names, separated by commas. Variances and covariances among all variables in each such string are
added to the model. Warning: covs="x1, x2" and covs=c("x1", "x2") are not equivalent:
covs="x1, x2" specifies the variance of x1, the variance of x2, and their covariance,
while covs=c("x1", "x2") specifies the variance of x1 and the variance of x2 but not their covariance.
These same covariances can be added manually via argument |
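The distinction between the two forms of covs can be illustrated with a small base-R sketch. The helper expand_covs below is hypothetical and is not the parser used by dsem; it only enumerates the variance and covariance terms that the description above implies for each form.

```r
# Hypothetical helper (not part of dsem): list the variance/covariance terms
# implied by a `covs` vector, following the description in the argument text.
expand_covs <- function(covs) {
  terms <- lapply(covs, function(group) {
    vars <- trimws(strsplit(group, ",")[[1]])
    n <- length(vars)
    # All unordered pairs within one comma-separated string,
    # including each variable paired with itself (its variance)
    grid <- expand.grid(i = seq_len(n), j = seq_len(n))
    grid <- grid[grid$i <= grid$j, ]
    paste(vars[grid$i], "<->", vars[grid$j])
  })
  unique(unlist(terms))
}

joint    <- expand_covs("x1, x2")       # one string: variances plus covariance
separate <- expand_covs(c("x1", "x2"))  # two strings: variances only
print(joint)
print(separate)
```

Only the single-string form produces the cross term between x1 and x2.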
A DSEM involves (at a minimum):
a matrix \mathbf X where column \mathbf x_c for variable c is a time-series;
a user-supplied specification for the path coefficients, which define the precision (inverse covariance) \mathbf Q for a matrix of state-variables and see make_dsem_ram for more details on the math involved.
The model also estimates the time-series mean \mathbf{\mu}_c for each variable.
The mean and precision matrix therefore define a Gaussian Markov random field for \mathbf X:
\mathrm{vec}(\mathbf X) \sim \mathrm{MVN}( \mathrm{vec}(\mathbf{I_T} \otimes \mathbf{\mu}), \mathbf{Q}^{-1})
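As a minimal sketch of how path coefficients can define a precision matrix, consider a single variable with one lag-1 path. The sketch assumes the convention x = P x + e with e ~ N(0, V), which gives Q = (I - P)' V^{-1} (I - P); the names P, V, rho and sigma are illustrative, and the construction used internally is described under make_dsem_ram. For simplicity the first time step is given the same innovation variance as the others.

```r
# Sketch under the assumed convention x = P x + e, e ~ N(0, V)
n_t   <- 5     # number of time steps
rho   <- 0.5   # lag-1 path coefficient (illustrative value)
sigma <- 1     # innovation standard deviation (illustrative value)

# Path matrix: x_t depends on x_{t-1} with coefficient rho
P <- matrix(0, n_t, n_t)
P[cbind(2:n_t, 1:(n_t - 1))] <- rho

V <- diag(sigma^2, n_t)     # innovation covariance
I_minus_P <- diag(n_t) - P

# Precision matrix implied by the paths
Q <- t(I_minus_P) %*% solve(V) %*% I_minus_P

# Covariance obtained directly from x = (I - P)^{-1} e, for comparison
Sigma <- solve(I_minus_P) %*% V %*% t(solve(I_minus_P))

print(round(Q, 2))
```

The resulting Q is sparse (tridiagonal here) even though its inverse is dense, which is the property that makes a Gaussian Markov random field convenient to work with.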
Users can then specify a distribution for measurement errors (or assume that variables are measured without error) using argument family. This defines the link-function g_c(.) and distribution f_c(.) for each time-series c:
y_{t,c} \sim f_c( g_c^{-1}( x_{t,c} ), \theta_c )
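The measurement equation can be illustrated by simulating observations from a known state series. This is a generic sketch in base R, not a call into dsem: the normal distribution with identity link and the Poisson distribution with log link are chosen only as familiar examples, and the variable names are illustrative.

```r
set.seed(1)
n_t <- 50

# A latent state series x_t (an AR(1) process, simulated for illustration)
x <- as.numeric(arima.sim(list(ar = 0.7), n = n_t, sd = 0.3))

# Normal measurement error with identity link: g^{-1}(x) = x, theta = error SD
theta_sd <- 0.2
y_normal <- rnorm(n_t, mean = x, sd = theta_sd)

# Poisson observations with log link: g^{-1}(x) = exp(x), no extra parameter
y_pois <- rpois(n_t, lambda = exp(x))

head(cbind(x, y_normal, y_pois))
```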
dsem then estimates all specified coefficients, time-series means \mu_c, and measurement-error distribution parameters \theta_c by maximizing a log-marginal likelihood, while also estimating state-variables x_{t,c}.
summary.dsem then assembles estimates and standard errors in an easy-to-read format.
Standard errors for fixed effects (path coefficients, exogenous variance parameters, and measurement error parameters)
are estimated from the matrix of second derivatives of the log-marginal likelihood,
and standard errors for random effects (i.e., missing or state-space variables) are estimated
from a generalization of this method (see sdreport for details).
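The Hessian-based standard errors for fixed effects can be sketched on a toy problem using only base R. The example below fits a normal mean and log-SD by minimizing a negative log-likelihood and inverts a numerical Hessian; it is not the dsem or sdreport implementation, and it does not cover the random-effects generalization.

```r
set.seed(42)
y <- rnorm(200, mean = 2, sd = 1.5)

# Negative log-likelihood; par = c(mean, log-SD)
nll <- function(par) {
  -sum(dnorm(y, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# Minimize the negative log-likelihood
opt <- nlminb(start = c(0, 0), objective = nll)

# Matrix of second derivatives at the optimum (finite differences)
H <- optimHess(opt$par, nll)

# Standard errors: square root of the diagonal of the inverse Hessian
se <- sqrt(diag(solve(H)))
print(cbind(estimate = opt$par, std_error = se))
```

For this toy model the standard error of the mean should be close to the estimated SD divided by sqrt(200), which gives a quick sanity check.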
Latent variables
Any column \mathbf x_c of tsdata that includes only NA values represents a latent variable, and all others are called manifest variables.
The identifiability criteria for latent variables
can be complicated. To explain, we ignore lagged effects (only simultaneous paths)
and classify three types of latent variables:
Any latent variable \mathbf F that includes paths out from it to manifest variables, but has no paths from manifest variables into \mathbf F, is a factor variable. These are identifiable by fixing their SD (i.e., at one), and using a trimmed Cholesky parameterization (i.e., each successive factor includes fewer paths to manifest variables). See the DFA vignette for an example. Factor latent variables can be used to represent residual covariance while also estimating the source of that covariance explicitly.
Any latent variable \mathbf Y that includes paths in from some manifest variables \mathbf X and some paths out to manifest variables \mathbf Z is an intermediate latent variable. In general, at least one path in or out must be fixed a priori (e.g., at one) to identify the scale of the intermediate LV. These intermediate latent variables can represent ecological concepts that serve as an intermediate link between different manifest variables.
Any latent variable \mathbf C that includes paths in from some manifest variables \mathbf X and no paths out to manifest variables is a composite latent variable. In general, you must fix all paths to composite variables a priori, and must also fix the SD a priori (e.g., at zero). These composite variables allow DSEM to estimate a response with standard errors that integrates across multiple manifest variables.
As stated, these criteria do not involve paths from one to another latent variable. These are also possible, but involve more complicated identifiability criteria.
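The definition of a latent variable as an all-NA column can be checked directly on the data. In the sketch below, the data and variable names (y1, y2, y3, F) are made up for illustration. The specification string is an assumption about the syntax: it supposes that a parameter name of NA followed by a fourth entry fixes that parameter at the given value, so consult make_dsem_ram before relying on it.

```r
set.seed(3)
n_t <- 20

# Three manifest series and one column that is entirely NA
tsdata <- ts(data.frame(
  y1 = rnorm(n_t),
  y2 = rnorm(n_t),
  y3 = rnorm(n_t),
  F  = NA_real_
))

# Columns containing only NA are treated as latent variables
is_latent <- apply(tsdata, 2, function(column) all(is.na(column)))
print(is_latent)

# Assumed syntax: a factor with paths out to the manifest variables and its
# SD fixed at one (name NA, value 1); not validated against the package here
sem_factor <- "
  F -> y1, 0, loading1
  F -> y2, 0, loading2
  F -> y3, 0, loading3
  F <-> F, 0, NA, 1
"
```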
When to do (or not do) model selection
In general, DSEM can be used for predictive modelling and/or structural causal modelling.
For predictive modelling, DSEM provides an expressive
interface to specify any number of fixed effects and use these to represent the
covariance among variables and over time. The predictive error is expected to
decrease when using a parsimonious model, and model selection might be appropriate
using either stepwise_selection or some manual rule for dropping
coefficients that are not statistically significant using a likelihood ratio or
Wald test.
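A manual rule of this kind can be sketched with base R. All numbers below are hypothetical placeholders for an estimate, its standard error, and the maximized log-likelihoods of two nested fits; the sketch does not call dsem or stepwise_selection.

```r
# Hypothetical coefficient estimate and standard error
estimate  <- 0.30
std_error <- 0.10

# Wald test: two-sided p-value for the null hypothesis that the coefficient is zero
z_stat <- estimate / std_error
p_wald <- 2 * pnorm(-abs(z_stat))

# Likelihood ratio test: hypothetical log-likelihoods with and without the coefficient
loglik_full    <- -120.4
loglik_reduced <- -124.9
lr_stat <- 2 * (loglik_full - loglik_reduced)
p_lrt   <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(c(wald = p_wald, lrt = p_lrt))
```

A coefficient would be dropped under such a rule only when the chosen test fails to reject at the analyst's threshold.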
However, structural causal modelling (SCM) is necessary for models to be transferable
to new environments (patterns of collinearity), or for counterfactual analysis.
In general, SCM does not involve using parsimony as a basis for model selection.
Instead, SCM structure should be defined based on ecological knowledge, and
models can be further elaborated using tests of directional separation
(see test_dsep).
An object (list) of class dsem. Elements include:
TMB object from MakeADFun
RAM parsed by make_dsem_ram
SEM structure parsed by make_dsem_ram as intermediate description of model linkages
The list of inputs passed to MakeADFun
The output from nlminb
The output from sdreport
Objects useful for package function, i.e., all arguments passed during the call
Total time to run model
Introducing the package, its features, and comparison with other software (to cite when using dsem):
Thorson, J. T., Andrews, A., Essington, T., Large, S. (2024). Dynamic structural equation models synthesize ecosystem dynamics constrained by ecological mechanisms. Methods in Ecology and Evolution. doi:10.1111/2041-210X.14289
# Define model
sem = "
# Link, lag, param_name
cprofits -> consumption, 0, a1
cprofits -> consumption, 1, a2
pwage -> consumption, 0, a3
gwage -> consumption, 0, a3
cprofits -> invest, 0, b1
cprofits -> invest, 1, b2
capital -> invest, 0, b3
gnp -> pwage, 0, c2
gnp -> pwage, 1, c3
time -> pwage, 0, c1
"
# Load data
data(KleinI, package="AER")
TS = ts(data.frame(KleinI, "time"=time(KleinI) - 1931))
tsdata = TS[,c("time","gnp","pwage","cprofits","consumption",
               "gwage","invest","capital")]
# Fit model
fit = dsem( sem=sem,
            tsdata = tsdata,
            estimate_delta0 = TRUE,
            control = dsem_control(quiet=TRUE) )
summary( fit )
plot( fit )
plot( fit, edge_label="value" )