gimmeSEM: Group iterative multiple model estimation.
In gimme: Group Iterative Multiple Model Estimation

gimmeSEM

R Documentation

Group iterative multiple model estimation.

Description

This function identifies structural equation models for each individual that consist of both group-level and individual-level paths.

Usage

gimmeSEM(data        = NULL,
         out         = NULL,
         sep         = NULL,
         header      = NULL,
         ar          = TRUE,
         plot        = TRUE,
         subgroup    = FALSE,
         sub_feature = "lag & contemp",
         sub_method = "Walktrap",
         sub_sim_thresh    = "lowest", 
         confirm_subgroup = NULL,
         paths       = NULL,
         exogenous = NULL,
         outcome   = NULL,
         conv_vars   = NULL,
         conv_length = 16, 
         conv_interval = 1,
         mult_vars   = NULL,
         mean_center_mult = FALSE,
         standardize = FALSE,
         groupcutoff = .75,
         subcutoff   = .75,
         diagnos     = FALSE, 
         ms_allow         = FALSE,
         ms_tol           = 1e-5,
         lv_model         = NULL, 
         lv_estimator     = "miiv",     
         lv_scores        = "regression",       
         lv_miiv_scaling  = "first.indicator", 
         lv_final_estimator = "miiv",
         lasso_model_crit    = NULL, 
         hybrid = FALSE,
         VAR = FALSE,
         dir_prop_cutoff = 0,
         ordered = NULL,
         group_correct = "Bonferoni Group")

Arguments

`data`	The path to the directory where the data files are located, or the name of the list containing each individual's time series. Each file or list element must hold the data for one individual as a T (time) by p (number of variables) matrix, where the columns represent variables and the rows represent time. Individuals must have the same variables (p) but can have different numbers of observations (T).
`out`	The path to the directory where the results will be stored (optional). If specified, a copy of the output files will be placed in that directory. If the directory at the specified path does not exist, it will be created.
`sep`	The delimiter used in the data files. Follows R convention: "" indicates space-delimited, "\t" indicates tab-delimited, and "," indicates comma-delimited. Only necessary to specify if reading data in from a physical directory.
`header`	Logical. Indicate TRUE for data files with a header. Only necessary to specify if reading data in from physical directory.
`ar`	Logical. If TRUE, begins search for group model with autoregressive (AR) paths freed for estimation. If ms_allow=TRUE, it is recommended to set ar=FALSE. Multiple solutions are unlikely to be found when ar=TRUE. Defaults to TRUE.
`plot`	Logical. If TRUE, graphs depicting relations among variables of interest will automatically be created. Solid lines represent contemporaneous relations (lag 0) and dashed lines reflect lagged relations (lag 1). For individual-level plots, red paths represent positive weights and blue paths represent negative weights. Width of paths corresponds to estimated path weight. For the group-level plot, black represents group-level paths, grey represents individual-level paths, and (if subgroup = TRUE) green represents subgroup-level paths. For the group-level plot, the width of the edge corresponds to the count. Defaults to TRUE.
`subgroup`	Logical. If TRUE, subgroups are generated based on similarities in model features using the `walktrap.community` function from the `igraph` package. When ms_allow=TRUE, subgroup should be set to FALSE. Defaults to FALSE.
`sub_feature`	Option to indicate feature(s) used to subgroup individuals. Defaults to "lag & contemp" for lagged and contemporaneous, which is the original method. Can use "lagged" or "contemp" to subgroup solely on features related to lagged and contemporaneous relations, respectively.
`sub_method`	Community detection method used to cluster individuals into subgroups. Options align with those available in the igraph package: "Walktrap" (default), "Infomap", "Louvain", "Edge Betweenness", "Label Prop", "Fast Greedy", "Leading Eigen", and "Spinglass".
`sub_sim_thresh`	Threshold for inducing sparsity in similarity matrix. Options are: the percent of edges in the similarity matrix to set to zero (e.g., .25 would set the lower quartile), "lowest" (default) subtracts the minimum value from all values, and "search" searches across thresholds to arrive at one providing highest modularity.
`confirm_subgroup`	Dataframe. Option only available when subgroup = TRUE. Dataframe should contain two columns. The first column should specify file labels (the name of the data files without file extension), and the second should contain integer values (beginning at 1) specifying the subgroup membership for each individual. Defaults to NULL.
`paths`	`lavaan`-style syntax containing paths with which to begin model estimation (optional). That is, Y~X indicates that Y is regressed on X, or X predicts Y. Paths can also be set to a specific value for estimation using `lavaan`-style syntax (e.g., 'V4 ~ 0.5*V3'), or set to 0 so that they will not be estimated (e.g., 'V4 ~ 0*V3'). If no header is used, then variables should be referred to with V followed (with no separation) by the column number. If a header is used, variables should be referred to using variable names. To reference lag variables, "lag" should be added to the end of the variable name with no separation. Defaults to NULL.
`exogenous`	Vector of variable names to be treated as exogenous (optional). That is, exogenous variable X can predict Y but cannot be predicted by Y. If no header is used, then variables should be referred to with V followed (with no separation) by the column number. If a header is used, variables should be referred to using variable names. The default for exogenous variables is that lagged effects of the exogenous variables are not included in the model search. If lagged paths are wanted, "&lag" should be added to the end of the variable name with no separation. Defaults to NULL.
`outcome`	Vector of variable names to be treated as outcome (optional). This is a variable that can be predicted by others but cannot predict. If no header is used, then variables should be referred to with V followed (with no separation) by the column number. If a header is used, variables should be referred to using variable names.
`conv_vars`	Vector of variable names to be convolved via smoothed Finite Impulse Response (sFIR). Note that conv_vars are not automatically considered exogenous variables. To treat conv_vars as exogenous, use the exogenous argument. Variables listed in conv_vars must be binary variables. Lagged variables cannot be convolved. If there is missing data in the endogenous variables, their values will be imputed for the convolution operation only. Defaults to NULL.
`conv_length`	Expected response length in seconds. For functional MRI BOLD, 16 seconds (default) is typical for the hemodynamic response function.
`conv_interval`	Interval between data acquisition. Currently conv_length/conv_interval must be an integer. For fMRI studies, this is the repetition time. Defaults to 1.
`mult_vars`	Vector of variable names to be multiplied to explore bilinear/modulatory effects (optional). All multiplied variables will be treated as exogenous (X can predict Y but cannot be predicted by Y). Within the vector, multiplication of two variables should be indicated with an asterisk (e.g. V1*V2). If no header is used, variables should be referred to with V followed by the column number (with no separation). If a header is used, each variable should be referred to using variable names. If multiplication with the lag 1 of a variable is desired, the variable name should be followed by "lag" with no separation (e.g. V1*V2lag).
`mean_center_mult`	Logical. If TRUE, the variables indicated in mult_vars will be mean-centered before being multiplied together. Defaults to FALSE.
`standardize`	Logical. If TRUE, all variables will be standardized to have a mean of zero and a standard deviation of one. Defaults to FALSE.
`groupcutoff`	Cutoff value for group-level paths. Defaults to .75, indicating that a path must be significant across 75% of individuals to be included as a group-level path.
`subcutoff`	Cutoff value for subgroup-level paths. Defaults to .75, indicating that a path must be significant across at least 75% of the individuals in a subgroup to be considered a subgroup-level path.
`diagnos`	Logical. If TRUE, provides internal output for diagnostic purposes. Defaults to FALSE.
`ms_allow`	Logical. If TRUE, provides multiple solutions when more than one path has identical modification index values. When ms_allow=TRUE, it is recommended to set ar=FALSE. Multiple solutions are unlikely to be found when ar=TRUE. Additionally, subgroup should be set to FALSE. Output files for individuals with multiple solutions will represent the last solution found for the individual, not necessarily the best solution for the individual. Defaults to FALSE.
`ms_tol`	Precision used when evaluating similarity of modification indices when ms_allow = TRUE. We recommend that ms_tol not be greater than the default, especially when standardize=TRUE. Defaults to 1e-5.
`lv_model`	Invoke latent variable modeling by providing the measurement model syntax here. lavaan conventions are used for relating observed variables to factors. Defaults to NULL.
`lv_estimator`	Estimator used for factor analysis. Options are "miiv" (default), "pml" (pseudo-ML) or "svd".
`lv_scores`	Method used for estimating latent variable scores from parameters obtained from the factor analysis when lv_model is not NULL. Options are: "regression" (Default), "bartlett".
`lv_miiv_scaling`	Type of scaling indicator to use when "miiv" selected for lv_estimator. Options are "first.indicator" (Default; the first observed variable in the measurement equation is used), "group" (best one for the group), or "individual" (each individual has the best one for them according to R2).
`lv_final_estimator`	Estimator for final estimations. "miiv" (Default) or "pml" (pseudo-ML).
`lasso_model_crit`	When not null, invokes multiLASSO approach for the GIMME model search procedure. Arguments indicate the model selection criterion to use for model selection: 'bic' (select on BIC), 'aic', 'aicc', 'hqc', 'cv' (cross-validation).
`hybrid`	Logical. If TRUE, enables hybrid-VAR models where both directed contemporaneous paths and contemporaneous covariances among residuals are candidate relations in the search space. Defaults to FALSE.
`VAR`	Logical. If TRUE, enables VAR models where contemporaneous covariances among residuals are candidate relations in the search space. Defaults to FALSE.
`dir_prop_cutoff`	Option to require that the directionality of a relation be stronger than the reverse direction for a prespecified proportion of individuals. Defaults to 0.
`ordered`	A character vector containing the names of all ordered categorical variables in the model.
`group_correct`	Indicate how to correct for multiple testing. "Bonferoni Group" (Default) corrects the alpha value for the number of people (N) in the sample; "Bonferoni Paths" corrects according to the number of eligible paths for that individual; a numeric value greater than 0 and less than 1 can be entered to indicate the alpha level desired.
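
As a rough illustration of the input formats described in the arguments above, the sketch below builds an in-memory `data` list along with character inputs for `paths` and `exogenous`. The individual names (`id_1`, `id_2`, `id_3`), the dimensions, and the simulated values are hypothetical. The final call is left commented out because it requires the gimme package and may take some time to run.

```r
set.seed(1)

# Hypothetical sample: three individuals who share p = 4 variables
# but differ in the number of time points (T).
n_obs <- c(id_1 = 50, id_2 = 60, id_3 = 55)
p     <- 4

ts_list <- lapply(n_obs, function(t_len) {
  m <- matrix(rnorm(t_len * p), nrow = t_len, ncol = p)
  colnames(m) <- paste0("V", seq_len(p))  # rows = time, columns = variables
  m
})

# lavaan-style starting paths: V2 regressed on V1, V3 regressed on
# lagged V2, and V1 ~ V3 fixed to 0 so that it is not estimated.
start_paths <- 'V2 ~ V1
                V3 ~ V2lag
                V1 ~ 0*V3'

# V4 may predict other variables but is not predicted by them.
exog_vars <- "V4"

sapply(ts_list, dim)  # one column per individual: T, then p

## Not run:
# library(gimme)
# fit <- gimmeSEM(data      = ts_list,
#                 paths     = start_paths,
#                 exogenous = exog_vars)
## End(Not run)
```

Because the list is named, the names can serve as individual labels; whether gimme uses them in exactly that way is an assumption here.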

Details

Output is a list of results if saved as an object and/or files printed to a directory if the "out" argument is used.
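
For the directory-based workflow, a minimal sketch might look like the following. It writes one comma-delimited file per individual to a temporary folder and then points `data`, `sep`, `header`, and `out` at it. The folder names, file names (`person_1.csv`, ...), and simulated values are hypothetical, and the `gimmeSEM` call is commented out because it requires the gimme package.

```r
set.seed(2)

in_dir  <- file.path(tempdir(), "gimme_in")   # hypothetical input folder
out_dir <- file.path(tempdir(), "gimme_out")  # hypothetical output folder
dir.create(in_dir, showWarnings = FALSE)

# One file per individual, each a T-by-p table with a header row.
for (i in 1:3) {
  x <- as.data.frame(matrix(rnorm(50 * 4), ncol = 4))  # columns V1..V4
  write.table(x, file.path(in_dir, paste0("person_", i, ".csv")),
              sep = ",", row.names = FALSE, col.names = TRUE)
}

list.files(in_dir)

## Not run:
# library(gimme)
# fit <- gimmeSEM(data   = in_dir,
#                 out    = out_dir,
#                 sep    = ",",
#                 header = TRUE)
## End(Not run)
```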

Value

A list with the following components:

data: list of data used in analyses. Contains lagged variables and any data manipulations done within gimme.
path_est_mats: N matrices of individual-level coefficient estimates for directed paths.
varnames: Variable names in order of data.
n_vars_total: total number of variables.
n_lagged: total number of lagged variables.
n_endog: total number of endogenous variables.
fit: Final fit indices, R-squared for each variable, convergence status, subgroup membership (if applicable), and modularity (if applicable).
path_se_est: Matrix of all individuals' unstandardized & standardized coefficient estimates, standard errors, level of each relation (e.g., "group"), and subgroup membership.
plots: If number of variables >3, N individual-level plots of directed lagged and contemporaneous relations among variables. Red = high / hot / positive values; Blue = low / cold / negative values. Line width corresponds with absolute value of beta estimate. Use plot() function.
plots_cov: If number of variables >3 and hybrid = TRUE or VAR = TRUE, N plots of contemporaneous covariances among residuals.
group_plot_paths: If number of variables >3, aggregated plot of directed relations. Black = group-level, Green = subgroup level, Grey = individual level. Line width corresponds with percent of individuals who have that path estimated.
group_plot_cov: If number of variables >3 and hybrid = TRUE or VAR = TRUE, aggregated plot of covariances among residuals.
sub_plots_paths: Aggregated directed subgroup plots for K subgroups if applicable.
sub_plots_cov: Aggregated covariance subgroup plots for K subgroups if applicable.
path_counts: Matrix containing counts of the number of people for whom a given directed relation is estimated.
path_counts_sub: K matrices containing counts of the number of people within that subgroup for whom a given directed relation is estimated.
cov_counts: Matrix containing counts of the number of people for whom a given covariance relation is estimated.
cov_counts_sub: K matrices containing counts of the number of people within that subgroup for whom a given covariance relation is estimated.
vcov: N matrices containing the estimated covariance matrix of parameters of interest in gimme.
vcovfull: N matrices of the full estimated covariance matrix of parameters.
psi: N standardized residual covariance matrices.
ps_unstd: N unstandardized residual covariance matrices.
sim_matrix: If subgroup = TRUE, similarity count matrix of how many edges each pair of individuals has in common after the group-level search (also considers individual-level paths that may be added later via the EPC).
syntax: N individual slices containing lavaan-style syntax.
lvgimme: If provided, the latent variable model syntax (also included in the above).
rf_est: If variables to convolve are provided in conv_vars, the N response function estimates for individuals.
arguments: List of arguments provided by the user.
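
The count matrices can be turned into the percentages that drive line width in the aggregated plots. The sketch below does this for a made-up matrix shaped like `path_counts`. The values, the dimnames, the row/column layout, and the sample size `n_people` are all hypothetical; with a real result, something like `fit$path_counts` would be inspected first rather than assuming this layout.

```r
# Hypothetical counts for 3 variables and 10 individuals
# (rows = outcomes, columns = lagged predictors; layout is an assumption).
n_people <- 10
vars     <- c("V1", "V2", "V3")

counts <- matrix(c(10,  0,  2,
                    4, 10,  0,
                    0,  7, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(vars, paste0(vars, "lag")))

# Percent of individuals for whom each path is estimated.
pct <- 100 * counts / n_people
pct

# Paths estimated for every individual in this made-up sample.
everyone <- which(counts == n_people, arr.ind = TRUE)
everyone
```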

Author(s)

Zachary Fisher, Kathleen Gates, & Stephanie Lane

References

Gates, K.M. & Molenaar, P.C.M. (2012). Group search algorithm recovers effective connectivity maps for individuals in homogeneous and heterogeneous samples. NeuroImage, 63, 310-319.

Lane, S.T. & Gates, K.M. (2017). Automated selection of robust individual-level structural equation models for time series data. Structural Equation Modeling.

Beltz, A.M. & Molenaar, P.C.M. (2016). Dealing with multiple solutions in structural vector autoregressive models. Multivariate Behavioral Research, 51(2-3), 357-373.

Examples

 ## Not run: 
paths <- 'V2 ~ V1
          V3 ~ V4lag'

fit <- gimmeSEM(data     = simData,
                out      = "C:/simData_out",
                subgroup = TRUE, 
                paths    = paths)

print(fit, mean = TRUE)
print(fit, subgroup = 1, mean = TRUE)
print(fit, file = "group_1_1", estimates = TRUE)
print(fit, subgroup = 2, fitMeasures = TRUE)
plot(fit, file = "group_1_1")
 
## End(Not run)

gimme documentation built on June 23, 2025, 5:08 p.m.