gimmeSEM: Group iterative multiple model estimation.

gimmeSEM    R Documentation

Group iterative multiple model estimation.

Description

This function identifies structural equation models for each individual that consist of both group-level and individual-level paths.

Usage

gimmeSEM(data               = NULL,
         out                = NULL,
         sep                = NULL,
         header             = NULL,
         ar                 = TRUE,
         plot               = TRUE,
         subgroup           = FALSE,
         sub_feature        = "lag & contemp",
         sub_method         = "Walktrap",
         sub_sim_thresh     = "lowest",
         confirm_subgroup   = NULL,
         paths              = NULL,
         exogenous          = NULL,
         outcome            = NULL,
         conv_vars          = NULL,
         conv_length        = 16,
         conv_interval      = 1,
         mult_vars          = NULL,
         mean_center_mult   = FALSE,
         standardize        = FALSE,
         groupcutoff        = .75,
         subcutoff          = .75,
         diagnos            = FALSE,
         ms_allow           = FALSE,
         ms_tol             = 1e-5,
         lv_model           = NULL,
         lv_estimator       = "miiv",
         lv_scores          = "regression",
         lv_miiv_scaling    = "first.indicator",
         lv_final_estimator = "miiv",
         lasso_model_crit   = NULL,
         hybrid             = FALSE,
         VAR                = FALSE,
         dir_prop_cutoff    = 0,
         ordered            = NULL)

Arguments

data

The path to the directory where the data files are located, or the name of the list containing each individual's time series. Each file or list element must contain one individual's data as a T (time) by p (number of variables) matrix, where the columns represent variables and the rows represent time. Individuals must have the same variables (p) but can have different numbers of observations (T).
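As an illustration of the list form, the sketch below builds a named list of T by p matrices with a shared set of variables and differing numbers of time points. The helper make_series, the element names, and the simulated values are all hypothetical; naming the list elements is an assumption made so individuals can be told apart.

```r
set.seed(1)  # reproducible illustrative data

# Hypothetical helper: simulate one individual's T x p matrix.
make_series <- function(n_time, var_names) {
  m <- matrix(rnorm(n_time * length(var_names)), nrow = n_time)
  colnames(m) <- var_names
  m
}

vars <- c("V1", "V2", "V3")

# Same variables (p = 3), different numbers of time points (T).
ts_list <- list(
  id_01 = make_series(100, vars),
  id_02 = make_series(80,  vars),
  id_03 = make_series(120, vars)
)

sapply(ts_list, dim)  # first row: T, second row: p, one column per individual
```

A list shaped like ts_list is the kind of object the data argument describes; reading from a directory would instead use one file per individual.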

out

The path to the directory where the results will be stored (optional). If specified, a copy of the output files will be placed in the directory. If the directory at the specified path does not exist, it will be created.

sep

The delimiter used in the data files. Follows R convention: "" indicates space-delimited, "\t" indicates tab-delimited, and "," indicates comma-delimited. Only necessary to specify if reading data in from a physical directory.

header

Logical. Indicate TRUE for data files with a header. Only necessary to specify if reading data in from a physical directory.

ar

Logical. If TRUE, begins search for group model with autoregressive (AR) paths freed for estimation. If ms_allow=TRUE, it is recommended to set ar=FALSE. Multiple solutions are unlikely to be found when ar=TRUE. Defaults to TRUE.

plot

Logical. If TRUE, graphs depicting relations among variables of interest will automatically be created. Solid lines represent contemporaneous relations (lag 0) and dashed lines reflect lagged relations (lag 1). For individual-level plots, red paths represent positive weights and blue paths represent negative weights. Width of paths corresponds to estimated path weight. For the group-level plot, black represents group-level paths, grey represents individual-level paths, and (if subgroup = TRUE) green represents subgroup-level paths. For the group-level plot, the width of the edge corresponds to the count. Defaults to TRUE.

subgroup

Logical. If TRUE, subgroups are generated based on similarities in model features using the walktrap.community function from the igraph package. When ms_allow=TRUE, subgroup should be set to FALSE. Defaults to FALSE.

sub_feature

Option to indicate feature(s) used to subgroup individuals. Defaults to "lag & contemp" for lagged and contemporaneous, which is the original method. Can use "lagged" or "contemp" to subgroup solely on features related to lagged and contemporaneous relations, respectively.

sub_method

Community detection method used to cluster individuals into subgroups. Options align with those available in the igraph package: "Walktrap" (default), "Infomap", "Louvain", "Edge Betweenness", "Label Prop", "Fast Greedy", "Leading Eigen", and "Spinglass".

sub_sim_thresh

Threshold for inducing sparsity in the similarity matrix. Options are: a number giving the proportion of edges in the similarity matrix to set to zero (e.g., .25 would set the lower quartile to zero), "lowest" (default), which subtracts the minimum value from all values, and "search", which searches across thresholds to arrive at the one providing the highest modularity.
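The two non-search options can be sketched by hand on a small similarity matrix. This is only an illustration of the description above, not the package's internal code: the matrix values are made up, and the choices to take the minimum over off-diagonal entries and to zero edges at or below the quantile are assumptions.

```r
# Hypothetical 4 x 4 similarity matrix (e.g., counts of shared path features).
sim <- matrix(c(0, 5, 2, 3,
                5, 0, 4, 2,
                2, 4, 0, 6,
                3, 2, 6, 0), nrow = 4, byrow = TRUE)

off_diag <- sim[upper.tri(sim)]  # the six unique pairwise similarities

# "lowest": subtract the minimum similarity from every value.
sim_lowest <- sim - min(off_diag)
diag(sim_lowest) <- 0            # keep the diagonal at zero

# Numeric threshold (here .25): zero out the lowest quarter of edges.
cutoff <- quantile(off_diag, probs = 0.25)
sim_quart <- sim
sim_quart[sim_quart <= cutoff] <- 0

sim_lowest
sim_quart
```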

confirm_subgroup

Dataframe. Option only available when subgroup = TRUE. Dataframe should contain two columns. The first column should specify file labels (the name of the data files without file extension), and the second should contain integer values (beginning at 1) specifying the subgroup membership for each individual. Defaults to NULL.
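A minimal sketch of such a dataframe follows. The file labels and column names are hypothetical; only the two-column layout (labels first, integer membership starting at 1 second) comes from the description above.

```r
# First column: file labels without extension; second: subgroup membership.
confirm_df <- data.frame(
  file     = c("id_01", "id_02", "id_03", "id_04"),
  subgroup = c(1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

confirm_df
```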

paths

lavaan-style syntax containing paths with which to begin model estimation (optional). That is, Y~X indicates that Y is regressed on X, or X predicts Y. Paths can also be set to a specific value for estimation using lavaan-style syntax (e.g., 'V4 ~ 0.5*V3'), or set to 0 so that they will not be estimated (e.g., 'V4 ~ 0*V3'). If no header is used, then variables should be referred to with V followed (with no separation) by the column number. If a header is used, variables should be referred to using variable names. To reference lag variables, "lag" should be added to the end of the variable name with no separation. Defaults to NULL.
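The forms described above can be combined in one syntax string. The sketch below assumes data without a header (so variables are V1, V2, ...) and assembles the string from separate lines so each path can be annotated; the specific paths are illustrative only.

```r
path_lines <- c(
  "V2 ~ V1",      # V2 regressed on V1 (contemporaneous)
  "V3 ~ V4lag",   # V3 predicted by V4 at lag 1
  "V4 ~ 0.5*V3",  # path fixed to the value 0.5
  "V1 ~ 0*V2"     # path fixed to 0, so it is not estimated
)

# One path per line, as in a multi-line lavaan-style string.
paths <- paste(path_lines, collapse = "\n")
cat(paths, "\n")
```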

exogenous

Vector of variable names to be treated as exogenous (optional). That is, exogenous variable X can predict Y but cannot be predicted by Y. If no header is used, then variables should be referred to with V followed (with no separation) by the column number. If a header is used, variables should be referred to using variable names. The default for exogenous variables is that lagged effects of the exogenous variables are not included in the model search. If lagged paths are wanted, "&lag" should be added to the end of the variable name with no separation. Defaults to NULL.

outcome

Vector of variable names to be treated as outcomes (optional). An outcome variable can be predicted by other variables but cannot predict them. If no header is used, then variables should be referred to with V followed (with no separation) by the column number. If a header is used, variables should be referred to using variable names. Defaults to NULL.

conv_vars

Vector of variable names to be convolved via smoothed Finite Impulse Response (sFIR). Note that conv_vars are not automatically considered exogenous variables. To treat conv_vars as exogenous, use the exogenous argument. Variables listed in conv_vars must be binary variables. Lagged variables cannot be convolved. If there are missing data in the endogenous variables, their values will be imputed for the convolution operation only. Defaults to NULL.

conv_length

Expected response length in seconds. For functional MRI BOLD, 16 seconds (default) is typical for the hemodynamic response function.

conv_interval

Interval between data acquisition. Currently conv_length/conv_interval must be an integer. For fMRI studies, this is the repetition time. Defaults to 1.
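The integer requirement on conv_length/conv_interval can be checked before fitting. The sketch below uses a hypothetical repetition time of 2 seconds; the check itself is plain arithmetic and not a gimme function.

```r
conv_length   <- 16  # default expected response length, in seconds
conv_interval <- 2   # hypothetical repetition time (TR), in seconds

# Number of acquisition intervals spanned by the response.
n_points <- conv_length / conv_interval

# TRUE when the ratio is a whole number (tolerant of floating-point error).
is_whole <- isTRUE(all.equal(n_points, round(n_points)))

# A TR of 3 seconds would not divide 16 evenly.
is_whole_bad <- isTRUE(all.equal(16 / 3, round(16 / 3)))
```

Here n_points is 8 and is_whole is TRUE, while is_whole_bad is FALSE, so a 3-second interval would need a different conv_length.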

mult_vars

Vector of variable names to be multiplied to explore bilinear/modulatory effects (optional). All multiplied variables will be treated as exogenous (X can predict Y but cannot be predicted by Y). Within the vector, multiplication of two variables should be indicated with an asterisk (e.g. V1*V2). If no header is used, variables should be referred to with V followed by the column number (with no separation). If a header is used, each variable should be referred to using variable names. If multiplication with the lag 1 of a variable is desired, the variable name should be followed by "lag" with no separation (e.g. V1*V2lag). Defaults to NULL.

mean_center_mult

Logical. If TRUE, the variables indicated in mult_vars will be mean-centered before being multiplied together. Defaults to FALSE.
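To make the effect of mean-centering concrete, the sketch below forms a product term by hand, with and without centering. The data are simulated and the variable names hypothetical; this shows the arithmetic described above, not the package's internal implementation.

```r
set.seed(2)
dat <- data.frame(V1 = rnorm(50, mean = 3), V2 = rnorm(50, mean = -1))

mult_vars <- c("V1*V2")  # the form the mult_vars argument describes

# Raw product of the two variables.
prod_raw <- dat$V1 * dat$V2

# Product after mean-centering each component first.
v1_c <- dat$V1 - mean(dat$V1)
v2_c <- dat$V2 - mean(dat$V2)
prod_centered <- v1_c * v2_c

# Compare how strongly each product term correlates with a component.
cor(prod_raw, dat$V1)
cor(prod_centered, dat$V1)
```

Centering is commonly used to reduce the correlation between a product term and its components, which is the usual motivation for an option like mean_center_mult.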

standardize

Logical. If TRUE, all variables will be standardized to have a mean of zero and a standard deviation of one. Defaults to FALSE.
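For reference, standardizing to mean zero and standard deviation one corresponds to what base R's scale() does column by column. The sketch below applies it to one simulated matrix; whether gimme standardizes within each individual in exactly this way is an assumption.

```r
set.seed(3)
x <- matrix(rnorm(200, mean = 5, sd = 2), ncol = 2,
            dimnames = list(NULL, c("V1", "V2")))

z <- scale(x)    # center each column, then divide by its standard deviation

colMeans(z)      # numerically zero for each column
apply(z, 2, sd)  # one for each column
```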

groupcutoff

Cutoff value for group-level paths. Defaults to .75, indicating that a path must be significant across at least 75% of individuals to be included as a group-level path.
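As a worked example of the cutoff, suppose a candidate path is significant for 6 of 8 individuals. The vector below is made up, and treating the boundary as inclusive (a proportion equal to the cutoff qualifies) is an assumption.

```r
# Hypothetical: TRUE where the path V2 ~ V1 was significant for each person.
sig <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)

groupcutoff <- .75

prop_sig <- mean(sig)                    # proportion significant: 6 / 8
is_group_path <- prop_sig >= groupcutoff # meets the cutoff (inclusive)
```

With 6 of 8 individuals, prop_sig is 0.75 and the path would qualify; with 5 of 8 (0.625) it would not. The same logic applies within a subgroup for subcutoff.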

subcutoff

Cutoff value for subgroup-level paths. Defaults to .75, indicating that a path must be significant across at least 75% of the individuals in a subgroup to be considered a subgroup-level path.

diagnos

Logical. If TRUE, provides internal output for diagnostic purposes. Defaults to FALSE.

ms_allow

Logical. If TRUE, provides multiple solutions when more than one path has identical modification index values. When ms_allow=TRUE, it is recommended to set ar=FALSE, because multiple solutions are unlikely to be found when ar=TRUE. Additionally, subgroup should be set to FALSE. Output files for individuals with multiple solutions will represent the last solution found for the individual, not necessarily the best solution for the individual. Defaults to FALSE.

ms_tol

Precision used when evaluating similarity of modification indices when ms_allow = TRUE. We recommend that ms_tol not be greater than the default, especially when standardize=TRUE. Defaults to 1e-5.
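The role of the tolerance can be sketched as a comparison of modification indices against the largest one. The values below are invented, and the exact comparison gimme performs internally is an assumption; the point is only that a larger ms_tol treats more paths as tied.

```r
# Hypothetical modification indices for three candidate paths.
mi <- c("V2~V1" = 12.3400001, "V1~V2" = 12.3400003, "V3~V1" = 9.87)

ms_tol <- 1e-5

# Paths whose index is within ms_tol of the maximum are treated as tied.
tied <- abs(mi - max(mi)) < ms_tol

names(mi)[tied]
```

In this sketch the first two paths differ by far less than 1e-5 and count as tied, which is the situation in which multiple solutions would be explored.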

lv_model

Invoke latent variable modeling by providing the measurement model syntax here. lavaan conventions are used for relating observed variables to factors. Defaults to NULL.
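In lavaan syntax, the =~ operator relates a factor to its observed indicators. A minimal sketch of a two-factor measurement model follows; the factor names F1 and F2 and the assignment of indicators are hypothetical.

```r
# Each line: latent factor =~ observed indicators.
lv_model <- '
  F1 =~ V1 + V2 + V3
  F2 =~ V4 + V5 + V6
'

cat(lv_model)
```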

lv_estimator

Estimator used for factor analysis. Options are "miiv" (default), "pml" (pseudo-ML) or "svd".

lv_scores

Method used for estimating latent variable scores from parameters obtained from the factor analysis when lv_model is not NULL. Options are: "regression" (Default), "bartlett".

lv_miiv_scaling

Type of scaling indicator to use when "miiv" selected for lv_estimator. Options are "first.indicator" (Default; the first observed variable in the measurement equation is used), "group" (best one for the group), or "individual" (each individual has the best one for them according to R2).

lv_final_estimator

Estimator for final estimations. "miiv" (Default) or "pml" (pseudo-ML).

lasso_model_crit

When not NULL, invokes the multiLASSO approach for the GIMME model search procedure. The value indicates the criterion to use for model selection: 'bic' (select on BIC), 'aic', 'aicc', 'hqc', or 'cv' (cross-validation). Defaults to NULL.

hybrid

Logical. If TRUE, enables hybrid-VAR models where both directed contemporaneous paths and contemporaneous covariances among residuals are candidate relations in the search space. Defaults to FALSE.

VAR

Logical. If TRUE, estimates VAR models where contemporaneous covariances among residuals are candidate relations in the search space. Defaults to FALSE.

dir_prop_cutoff

Option to require that the directionality of a relation be stronger than the reverse direction for a prespecified proportion of individuals. Defaults to 0.

ordered

A character vector containing the names of all ordered categorical variables in the model.

Details

In main output directory:

  • indivPathEstimates Contains estimate, standard error, p-value, and z-value for each path for each individual. If subgroup = TRUE and subgroups are found, then a column is present containing the subgroup membership for each individual. Also contains the level at which each path was estimated: group, subgroup, or individual.

  • summaryFit Contains model fit information for individual-level models. If subgroups are requested, this file also contains the subgroup membership for each individual.

  • summaryPathCountMatrix Contains counts of total number of paths, both contemporaneous and lagged, estimated for the sample. The row variable is the outcome and the column variable is the predictor variable.

  • summaryPathCounts Contains summary count information for paths identified at the group, subgroup (if subgroup = TRUE), and individual levels.

  • summaryPathsPlot Produced if plot = TRUE. Contains figure with group, subgroup (if subgroup = TRUE), and individual-level paths for the sample. Black paths are group-level, green paths are subgroup-level, and grey paths are individual-level, where the thickness of the line represents the count.

In subgroup output directory (if subgroup = TRUE):

  • subgroupkPathCounts Contains counts of relations among lagged and contemporaneous variables for the kth subgroup.

  • subgroupkPlot Contains plot of group, subgroup, and individual level paths for the kth subgroup. Black represents group-level paths, grey represents individual-level paths, and green represents subgroup-level paths.

Note: if a subgroup of size n = 1 is discovered, subgroup-level output is not produced.

In individual output directory (where id represents the original file name for each individual):

  • idBetas Contains individual-level estimates of each path for each individual.

  • idStdErrors Contains individual-level standard errors for each path for each individual.

  • idEstRF Produced if conv_vars is not NULL. Contains individual-level estimated response function (e.g., hemodynamic response function (HRF) or relevant response function). There is one column for each convolved variable, and the output length is equal to the conv_length input.

  • idPlot Contains individual-level plots. Red paths represent positive weights and blue paths represent negative weights.

Author(s)

Zachary Fisher, Kathleen Gates, & Stephanie Lane

References

Gates, K.M. & Molenaar, P.C.M. (2012). Group search algorithm recovers effective connectivity maps for individuals in homogeneous and heterogeneous samples. NeuroImage, 63, 310-319.

Lane, S.T. & Gates, K.M. (2017). Automated selection of robust individual-level structural equation models for time series data. Structural Equation Modeling.

Beltz, A.M. & Molenaar, P.C.M. (2016). Dealing with multiple solutions in structural vector autoregressive models. Multivariate Behavioral Research, 51(2-3), 357-373.

Examples

 ## Not run: 
paths <- 'V2 ~ V1
          V3 ~ V4lag'

fit <- gimmeSEM(data     = simData,
                out      = "C:/simData_out",
                subgroup = TRUE, 
                paths    = paths)

print(fit, mean = TRUE)
print(fit, subgroup = 1, mean = TRUE)
print(fit, file = "group_1_1", estimates = TRUE)
print(fit, subgroup = 2, fitMeasures = TRUE)
plot(fit, file = "group_1_1")
 
## End(Not run)

gimme documentation built on Aug. 30, 2023, 1:08 a.m.