mlVAR: Multilevel VAR Estimation for Multiple Time Series

View source: R/mlVAR.R

mlVARR Documentation

Multilevel VAR Estimation for Multiple Time Series

Description

The function mlVAR computes estimates of the multivariate vector autoregression model. This model returns three stuctures: temporal effects (e.g., lag-1 regression weights), contemporaneous relationships (correlations or partial correlations) and between-subject effects (correlations and partial correlations). See details.

The predict method computes fitted values and residuals from the temporal model (step 1). It returns predictions aligned to the original input data, with NA for rows that cannot be predicted (e.g., the first observation per person-day lost to lagging, or rows with missing data). When newdata is supplied, predictions are computed for the new data using the fitted model(s). For estimator = "lmer", new subjects are predicted using fixed effects only (allow.new.levels = TRUE); for estimator = "lm", prediction is only possible for subjects seen during training.

The residuals method is a convenience wrapper around predict that returns only the residuals.

The resimulate method generates simulated data from a fitted mlVAR model using person-specific estimated parameters (fixed + random effects). For each person, a VAR time series is simulated using their estimated temporal effects (Beta), contemporaneous precision matrix (Theta), and person-specific means (mu). The output has the same dimensions and missingness pattern as the original input data, making it suitable for posterior predictive checks and model diagnostics.

Usage

mlVAR(data, vars, idvar, lags = 1, dayvar, beepvar,
                 estimator = c("default", "lmer", "lm","Mplus"),
                 contemporaneous = c("default", "correlated",
                 "orthogonal", "fixed", "unique"), temporal =
                 c("default", "correlated", "orthogonal", "fixed",
                 "unique"), nCores = 1, verbose = TRUE, compareToLags,
                 scale = TRUE, scaleWithin = FALSE, AR = FALSE,
                 MplusSave = TRUE, MplusName = "mlVAR", iterations = "(2000)",
                 chains = nCores, signs, orthogonal,
                 trueMeans, na.rm = TRUE,
                 full_detrend = FALSE
  )

## S3 method for class 'mlVAR'
predict(object, newdata, scale_back = TRUE,
        include_ids = TRUE, ...)

## S3 method for class 'mlVAR'
residuals(object, scale_back = TRUE,
        include_ids = TRUE, ...)

resimulate(object, ...)

## S3 method for class 'mlVAR'
resimulate(object, scale_back = TRUE,
        include_ids = TRUE, nTime = NULL,
        keep_missing = TRUE,
        variance = c("model", "empirical"), ...)

Arguments

data

Data frame

vars

Vectors of variables to include in the analysis

idvar

String indicating the subject ID

lags

Vector indicating the lags to include

dayvar

String indicating assessment day. Adding this argument makes sure that the first measurement of a day is not regressed on the last measurement of the previous day. IMPORTANT: only add this if the data has multiple observations per day.

beepvar

Optional string indicating assessment beep per day. Adding this argument will cause non-consecutive beeps to be treated as missing!

estimator

The estimator to be used. "lmer" for sequential univariate multi-level estimation, "Mplus" for multivariate Bayesian estimation (requires Mplus), and "lm" for fixed effects estimation.

contemporaneous

How should the contemporaneous networks be estimated? These networks are always estimated post-hoc by investigating the residuals of the temporal models. "correlated" and "orthogonal" run second multi-level models in which the networks are estimated using node-wise estimation. "fixed" and "unique" simply correlate the residuals, either by computing one network for all subjects (fixed) or a single network per per subject.

temporal

How should the temporal effects be estimated? "correlated" estimates correlated random effects, "orthogonal" estimates non-correlated random effects and "fixed" estimates a model in which only the intercept is random. Defaults to "correlated" when the number of variables is less than 6 and "orthogonal" otherwise. "unique" uses lm to estimate an unique model for each subject.

nCores

Number of cores to use in computation

verbose

Logical indicating if console messages and the progress bar should be shown.

scale

Logical, should variables be standardized before estimation?

scaleWithin

Logial, should variables be scaled within-person (set to FALSE to only center within-person)

compareToLags

A vector indicating which lags to base the data on. If the model is to be compared with a model with multiple lags using mlVARcompare, this argument must be used to make sure the number of observations is the same in both models (e.g., a lag 1 model can model the second observation of a day and a lag-2 model can't, causing different number of observations and incomparable models). It is suggested to not use this argument unless you want to compare models, and always run mlVAR without using this argument afterwards in the selected model.

AR

Logical, should an auto-regression only model be fitted?

MplusSave

Logical, should the Mplus model file and output be saved?

MplusName

Name of the Mplus model file and output (without extensions)

iterations

The string used to define the number of iterations in Mplus

chains

Number of Mplus chains

signs

Optional matrix fixing the signs of contemporaneous correlations. Is estimated by running mlVAR with estimator = "lmer" if missing.

orthogonal

Deprecated argument only added for backward competability. Ignore.

trueMeans

Optional data frame with true person-wise means to use instead of estimated person means. Must contain the idvar column and all vars columns.

na.rm

Logical, should missing values be removed when checking for rank deficiency and when removing incomplete cases? Defaults to TRUE.

full_detrend

Logical. If TRUE, standardize each variable per observation position across all subjects before estimation. For example, all first measurements are standardized together, all second measurements together, etc. This removes systematic occasion effects (e.g., time-of-day trends in ESM data). Requires a balanced design (equal number of observations per subject). Applied before grand-mean standardization (scale). Defaults to FALSE.

object

An mlVAR object (for predict, residuals, and resimulate methods).

newdata

Optional data frame with the same structure as the training data. Must contain the idvar, dayvar, beepvar, and all vars columns. If supplied, predictions are computed for this new data using the fitted model. For estimator = "lmer", new subjects (not seen during training) receive fixed-effects-only predictions. For estimator = "lm", prediction is only available for subjects in the training data.

scale_back

Logical. If TRUE (default) and the model was fitted with scale = TRUE, values are back-transformed to the original scale of the data. If FALSE, values are returned on the standardized (model) scale. Applies to predict, residuals, and resimulate.

include_ids

Logical. If TRUE (default), the idvar, dayvar, and beepvar columns are prepended as columns in the returned data frame. Applies to predict, residuals, and resimulate. When nTime is specified in resimulate, only the idvar column is included.

nTime

Optional integer specifying the number of time points to simulate per person. When NULL (default), the simulated data has the same number of rows as the original input data. When specified, each person gets exactly nTime time points, and the output has nIDs * nTime rows with the idvar and variable columns only (no dayvar or beepvar).

keep_missing

Logical. If TRUE (default) and nTime = NULL, the original variable-level missingness pattern is applied to the simulated data: any cell that was NA in the original data is set to NA in the output. If FALSE, all valid (non-structural-NA) positions receive simulated values. Ignored when nTime is specified.

variance

Character string specifying how the innovation covariance matrix is obtained. "model" (default) uses the model-implied covariance derived from the estimated contemporaneous precision matrix (person-specific GGM structure, population-level residual scale). "empirical" uses the model's partial correlation structure (from the person-specific precision matrix) but scales it by the empirical standard deviations of each person's residuals, i.e., SD %*% cov2cor(solve(kappa_i)) %*% SD. Falls back to "model" for persons with too few residuals.

...

Additional arguments (currently unused).

Details

This function estimates the multi-level VAR model to obtain temporal, contemporaneous and between-subject effects using nodewise estimation. Temporal and between-subject effects are obtained directly from the models and contemporaneous effects are estimated post-hoc by correlating the residuals. See arxiv.org/abs/1609.04156 for details.

Setting estimator = "Mplus" will generate a Mplus model, run the analysis and read the results into R. Mplus 8 is required for this estimation. It is recommended to set contemporaneous = "fixed", though not required. For the estimation of contemporaneous random effects, the signs of contemporaneous *correlations * (not partial correlations) need be set (or estimated) via the signs argument.

Value

mlVAR returns an mlVAR object.

predict.mlVAR returns a data frame of predicted (fitted) values with the same number of rows as the original input data (or newdata). Rows that could not be predicted (e.g., first observation per person-day lost to lagging, or rows with missing data) contain NA. If include_ids = TRUE, the idvar, dayvar, and beepvar columns are prepended.

residuals.mlVAR returns a data frame of residuals with the same structure as predict.mlVAR.

resimulate.mlVAR returns a data frame of simulated values. By default (when nTime = NULL), it has the same number of rows as the original input data, with NAs in the same positions as the original data (unless keep_missing = FALSE). If include_ids = TRUE, the idvar, dayvar, and beepvar columns are prepended. When nTime is specified, the output has nIDs * nTime rows with the idvar column and variable columns only. Person-specific stationary means are computed from the model parameters, and innovation covariances are derived from the estimated contemporaneous precision matrices.

Author(s)

Sacha Epskamp (mail@sachaepskamp.com)

References

Bringmann, L. F., Vissers, N., Wichers, M., Geschwind, N., Kuppens, P., Peeters, F., ... & Tuerlinckx, F. (2013). A network approach to psychopathology: New insights into clinical longitudinal data. PloS one, 8(4), e60188.

Hamaker, E. L., & Grasman, R. P. (2014). To center or not to center? Investigating inertia with a multilevel autoregressive model. Frontiers in psychology, 5.

Epskamp, S., Waldorp, L. J., Mottus, R., & Borsboom, D. (2017). Discovering Psychological Dynamics: The Gaussian Graphical Model in Cross-sectional and Time-series Data. arxiv.org/abs/1609.04156.

See Also

mlVARcompare, summary.mlVAR, plot.mlVAR, mlGGM

Examples

## Not run: 
### Small example ###
# Simulate data:
Model <- mlVARsim(nPerson = 50, nNode = 3, nTime = 50, lag=1)

# Estimate using correlated random effects:
fit1 <- mlVAR(Model$Data, vars = Model$vars, idvar = Model$idvar, lags = 1, temporal = "correlated")

# Print some pointers:
print(fit1)

# Summary of all parameter estimates:
summary(fit1)

# Compare temporal relationships:
layout(t(1:2))
plot(Model, "temporal", title = "True temporal relationships", layout = "circle")
plot(fit1, "temporal", title = "Estimated temporal relationships", layout = "circle")

# Compare contemporaneous partial correlations:
layout(t(1:2))
plot(Model, "contemporaneous", title = "True contemporaneous relationships", 
    layout = "circle")
plot(fit1, "contemporaneous", title = "Estimated contemporaneous relationships", 
    layout = "circle")

# Compare between-subjects partial correlations:
layout(t(1:2))
plot(Model, "between", title = "True between-subjects relationships", layout = "circle")
plot(fit1, "between", title = "Estimated between-subjects relationships",
    layout = "circle")

# Predictions and residuals (same number of rows as input data):
pred <- predict(fit1)
res <- residuals(fit1)
dim(pred)
dim(res)

# Resimulate data from the fitted model (posterior predictive check):
sim <- resimulate(fit1)
dim(sim)  # same dimensions as original data

# Resimulate with custom time series length (e.g., longer series):
sim_long <- resimulate(fit1, nTime = 500)
dim(sim_long)  # nIDs * 500 rows

# Resimulate without applying original missingness pattern:
sim_complete <- resimulate(fit1, keep_missing = FALSE)
sum(is.na(sim_complete))  # only structural NAs (day/beep boundaries)

# Resimulate with empirical innovation variances (model partial correlations,
# scaled by person-specific empirical residual SDs):
sim_emp <- resimulate(fit1, variance = "empirical")
dim(sim_emp)

# Run same model with non-correlated temporal relationships and fixed-effect model:
fit2 <- mlVAR(Model$Data, vars = Model$vars, idvar = Model$idvar, lags = 1, 
    temporal = "orthogonal")
fit3 <- mlVAR(Model$Data, vars = Model$vars, idvar = Model$idvar, lags = 1, 
    temporal = "fixed")

# Compare models:
mlVARcompare(fit1,fit2,fit3)

# Inspect true parameter correlation matrix:
Model$model$Omega$cor$mean
# Even though correlations are high, orthogonal model works well often!


### Large example ###
Model <- mlVARsim(nPerson = 100, nNode = 10, nTime = 100,lag=1)

# Correlated random effects no longer practical. Use orthogonal or fixed:
fit4 <- mlVAR(Model$Data, vars = Model$vars, idvar = Model$idvar, lags = 1, 
    temporal = "orthogonal")
fit5 <- mlVAR(Model$Data, vars = Model$vars, idvar = Model$idvar, lags = 1, 
    temporal = "fixed")

# Compare models:
mlVARcompare(fit4, fit5)

# Compare temporal relationships:
layout(t(1:2))
plot(Model, "temporal", title = "True temporal relationships", layout = "circle")
plot(fit4, "temporal", title = "Estimated temporal relationships", layout = "circle")

# Compare contemporaneous partial correlations:
layout(t(1:2))
plot(Model, "contemporaneous", title = "True contemporaneous relationships", 
    layout = "circle")
plot(fit4, "contemporaneous", title = "Estimated contemporaneous relationships", 
    layout = "circle")

# Compare between-subjects partial correlations:
layout(t(1:2))
plot(Model, "between", title = "True between-subjects relationships", layout = "circle")
plot(fit4, "between", title = "Estimated between-subjects relationships", 
    layout = "circle")

## End(Not run)

mlVAR documentation built on March 17, 2026, 5:06 p.m.

Related to mlVAR in mlVAR...