# corrsys: Generate Correlated Systems of Equations with Ordinal,... In SimRepeat: Simulation of Correlated Systems of Equations with Multiple Variable Types

## Description

This function generates a correlated system of `M` equations representing a system of repeated measures at `M` time points. The equations may contain 1) ordinal (r ≥q 2 categories), continuous (normal, non-normal, and mixture distributions), count (regular and zero-inflated, Poisson and Negative Binomial) independent variables X; 2) continuous error terms E; 3) a discrete time variable Time; and 4) random effects U. The assumptions are that 1) there are at least 2 equations, 2) the independent variables, random effect terms, and error terms are uncorrelated, 3) each equation has an error term, 4) all error terms have a continuous non-mixture distribution or all have a continuous mixture distribution, 5) all random effects are continuous, and 6) growth is linear (with respect to time). The random effects may be a random intercept, a random slope for time, or a random slope for any of the X variables. Continuous variables are simulated using either Fleishman's third-order (`method = "Fleishman"`, doi: 10.1007/BF02293811) or Headrick's fifth-order (`method = "Polynomial"`, doi: 10.1016/S0167-9473(02)00072-5) power method transformation (PMT). Simulation occurs at the component-level for continuous mixture distributions. The target correlation matrix is specified in terms of correlations with components of continuous mixture variables. These components are transformed into the desired mixture variables using random multinomial variables based on the mixing probabilities. The X terms can be the same across equations (i.e., modeling sex or height) or may be time-varying covariates. The equations may contain different numbers of X terms (i.e., a covariate could be missing for a given equation).

The outcomes Y are generated using a hierarchical linear models (HLM) approach, which allows the data to be structured in at least two levels. Level-1 is the repeated measure (time or condition) and other subject-level variables. Level-1 is nested within Level-2, which describes the average of the outcome (the intercept) and growth (slope for time) as a function of group-level variables. The first level captures the within-subject variation, while the second level describes the between-subjects variability. Using a HLM provides a way to determine if: a) subjects differ at a specific time point with respect to the dependent variable, b) growth rates differ across conditions, or c) growth rates differ across subjects. Random effects describe deviation at the subject-level from the average (fixed) effect described by the slope coefficients (betas). See the The Hierarchical Linear Models Approach for a System of Correlated Equations with Multiple Variable Types vignette for a description of the HLM model. The user can specify subject-level X terms, and each subject-level X term is crossed with all group-level X terms. The equations may also contain interactions between X variables. Interactions specified in `int.var` between two group-level covariates are themselves considered group-level covariates and will be crossed with subject-level covariates. Interactions between two subject-level covariates are considered subject-level covariates and will be crossed with group-level covariates. Since `Time` is a subject-level variable, each group-level term is crossed with `Time` unless otherwise specified.

Random effects may be added for the intercept, time slope, or effects of any of the covariates. The type of random intercept and time slope (i.e., non-mixture or mixture) is specified in `rand.int` and `rand.tsl`. This type may vary by equation. The random effects for independent variables are specified in `rand.var` and may also contain a combination of non-mixture and mixture continuous distributions. If the parameter lists are of length `M + 1`, the random effects are the same variables across equations and the correlation for the effects `corr.u` is a matrix. If the parameter lists are of length `2 * M`, the random effects are different variables across equations and the correlation for the effects `corr.u` is a list.

The independent variables, interactions, `Time` effect, random effects, and error terms are summed together to produce the outcomes Y. The beta coefficients may be the same or differ across equations. The user specifies the betas for the independent variables in `betas`, for the interactions between two group-level or two subject-level covariates in `betas.int`, for the group-subject level interactions in `betas.subj`, and for the `Time` interactions in `betas.tint`. Setting a coefficient to 0 will eliminate that term. The user also provides the correlations 1) between E terms; 2) between X variables within each outcome Y_p, `p = 1, ..., M`, and between outcome pairs; and 3) between U variables within each outcome Y_p, `p = 1, ..., M`, and between outcome pairs. The order of the independent variables in `corr.x` must be 1st ordinal (same order as in `marginal`), 2nd continuous non-mixture (same order as in `skews`), 3rd components of continuous mixture (same order as in `mix_pis`), 4th regular Poisson, 5th zero-inflated Poisson (same order as in `lam`), 6th regular NB, and 7th zero-inflated NB (same order as in `size`). The order of the random effects in `corr.u` must be 1st random intercept, 2nd random time slope, 3rd continuous non-mixture random effects, and 4th components of continuous mixture random effects.

The variables are generated from multivariate normal variables with intermediate correlations calculated using `intercorr`, which employs correlation method 1. See `SimCorrMix` for a description of the correlation method and the techniques used to generate each variable type. The order of the variables returned is 1st covariates X (as specified in `corr.x`), 2nd group-group or subject-subject interactions (ordered as in `int.var`), 3rd subject-group interactions (1st by subject-level variable as specified in `subj.var`, 2nd by covariate as specified in `corr.x`), and 4th time interactions (either as specified in `corr.x` with group-level covariates or in `tint.var`).

This function contains no parameter checks in order to decrease simulation time. That should be done first using `checkpar`. Summaries of the system can be obtained using `summary_sys`. The Correlated Systems of Statistical Equations with Multiple Variable Types demonstrates examples.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17``` ```corrsys(n = 10000, M = NULL, Time = NULL, method = c("Fleishman", "Polynomial"), error_type = c("non_mix", "mix"), means = list(), vars = list(), skews = list(), skurts = list(), fifths = list(), sixths = list(), Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(), mix_skews = list(), mix_skurts = list(), mix_fifths = list(), mix_sixths = list(), mix_Six = list(), marginal = list(), support = list(), lam = list(), p_zip = list(), size = list(), prob = list(), mu = list(), p_zinb = list(), corr.x = list(), corr.e = NULL, same.var = NULL, subj.var = NULL, int.var = NULL, tint.var = NULL, betas.0 = NULL, betas = list(), betas.subj = list(), betas.int = list(), betas.t = NULL, betas.tint = list(), rand.int = c("none", "non_mix", "mix"), rand.tsl = c("none", "non_mix", "mix"), rand.var = NULL, corr.u = list(), seed = 1234, use.nearPD = TRUE, eigmin = 0, adjgrad = FALSE, B1 = NULL, tau = 0.5, tol = 0.1, steps = 100, msteps = 10, nrand = 1e+05, errorloop = FALSE, epsilon = 0.001, maxit = 1000, quiet = FALSE) ```

## Arguments

 `n` the sample size (i.e. the length of each simulated variable; default = 10000) `M` the number of dependent variables Y (outcomes); equivalently, the number of equations in the system `Time` a vector of values to use for time; each subject receives the same time value; if `NULL`, `Time = 1:M` `method` the PMT method used to generate all continuous variables, including independent variables (covariates), error terms, and random effects; "Fleishman" uses Fleishman's third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation `error_type` "non_mix" if all error terms have continuous non-mixture distributions, "mix" if all error terms have continuous mixture distributions `means` if no random effects, a list of length `M` where `means[[p]]` contains a vector of means for the continuous independent variables in equation p with non-mixture (X_{cont}) or mixture (X_{mix}) distributions and for the error terms (E); order in vector is X_{cont}, X_{mix}, E if there are random effects, a list of length `M + 1` if the effects are the same across equations or `2 * M` if they differ; where `means[M + 1]` or `means[(M + 1):(2 * M)]` are vectors of means for all random effects with continuous non-mixture or mixture distributions; order in vector is 1st random intercept U_0 (if `rand.int != "none"`), 2nd random time slope U_1 (if `rand.tsl != "none"`), 3rd other random slopes with non-mixture distributions U_{cont}, 4th other random slopes with mixture distributions U_{mix} `vars` a list of same length and order as `means` containing vectors of variances for the continuous variables, error terms, and any random effects `skews` if no random effects, a list of length `M` where `skews[[p]]` contains a vector of skew values for the continuous independent variables in equation p with non-mixture (X_{cont}) distributions and for E if `error_type = "non_mix"`; order in vector is X_{cont}, E if there are random effects, a list of length `M + 1` if the effects are the same across equations or `2 * M` if they differ; where `skews[M + 1]` or `skews[(M + 1):(2 * M)]` are vectors of skew values for all random effects with continuous non-mixture distributions; order in vector is 1st random intercept U_0 (if `rand.int = "non_mix"`), 2nd random time slope U_1 (if `rand.tsl = "non_mix"`), 3rd other random slopes with non-mixture distributions U_{cont} `skurts` a list of same length and order as `skews` containing vectors of standardized kurtoses (kurtosis - 3) for the continuous variables, error terms, and any random effects with non-mixture distributions `fifths` a list of same length and order as `skews` containing vectors of standardized fifth cumulants for the continuous variables, error terms, and any random effects with non-mixture distributions; not necessary for `method = "Fleishman"` `sixths` a list of same length and order as `skews` containing vectors of standardized sixth cumulants for the continuous variables, error terms, and any random effects with non-mixture distributions; not necessary for `method = "Fleishman"` `Six` a list of length `M`, `M + 1`, or `2 * M`, where `Six[1:M]` are for X_{cont}, E (if `error_type = "non_mix"`) and `Six[M + 1]` or `Six[(M + 1):(2 * M)]` are for non-mixture U; if `error_type = "mix"` and there are only random effects (i.e., `length(corr.x) = 0`), use `Six[1:M] = rep(list(NULL), M)` so that `Six[M + 1]` or `Six[(M + 1):(2 * M)]` describes the non-mixture U; `Six[[p]][[j]]` is a vector of sixth cumulant correction values to aid in finding a valid PDF for X_{cont(pj)}, the j-th continuous non-mixture covariate for outcome Y_p; the last vector in `Six[[p]]` is for E_p (if `error_type = "non_mix"`); use `Six[[p]][[j]] = NULL` if no correction desired for X_{cont(pj)}; use `Six[[p]] = NULL` if no correction desired for any continuous non-mixture covariate or error term in equation p `Six[[M + p]][[j]]` is a vector of sixth cumulant correction values to aid in finding a valid PDF for U_{(pj)}, the j-th non-mixture random effect for outcome Y_p; use `Six[[M + p]][[j]] = NULL` if no correction desired for U_{(pj)}; use `Six[[M + p]] = NULL` if no correction desired for any continuous non-mixture random effect in equation p keep `Six = list()` if no corrections desired for all equations or if `method = "Fleishman"` `mix_pis` list of length `M`, `M + 1` or `2 * M`, where `mix_pis[1:M]` are for X_{cont}, E (if `error_type = "mix"`) and `mix_pis[M + 1]` or `mix_pis[(M + 1):(2 * M)]` are for mixture U; use `mix_pis[[p]] = NULL` if equation p has no continuous mixture terms if `error_type = "non_mix"` and there are only random effects (i.e., `length(corr.x) = 0`), use `mix_pis[1:M] = NULL` so that `mix_pis[M + 1]` or `mix_pis[(M + 1):(2 * M)]` describes the mixture U; `mix_pis[[p]][[j]]` is a vector of mixing probabilities of the component distributions for X_{mix(pj)}, the j-th mixture covariate for outcome Y_p; the last vector in `mix_pis[[p]]` is for E_p (if `error_type = "mix"`); components should be ordered as in `corr.x` `mix_pis[[M + p]][[j]]` is a vector of mixing probabilities of the component distributions for U_{(pj)}, the j-th random effect with a mixture distribution for outcome Y_p; order is 1st random intercept (if `rand.int = "mix"`), 2nd random time slope (if `rand.tsl = "mix"`), 3rd other random slopes with mixture distributions; components should be ordered as in `corr.u` `mix_mus` list of same length and order as `mix_pis`; `mix_mus[[p]][[j]]` is a vector of means of the component distributions for X_{mix(pj)}, the last vector in `mix_mus[[p]]` is for E_p (if `error_type = "mix"`) `mix_mus[[p]][[j]]` is a vector of means of the component distributions for U_{mix(pj)} `mix_sigmas` list of same length and order as `mix_pis`; `mix_sigmas[[p]][[j]]` is a vector of standard deviations of the component distributions for X_{mix(pj)}, the last vector in `mix_sigmas[[p]]` is for E_p (if `error_type = "mix"`) `mix_sigmas[[p]][[j]]` is a vector of standard deviations of the component distributions for U_{mix(pj)} `mix_skews` list of same length and order as `mix_pis`; `mix_skews[[p]][[j]]` is a vector of skew values of the component distributions for X_{mix(pj)}, the last vector in `mix_skews[[p]]` is for E_p (if `error_type = "mix"`) `mix_skews[[p]][[j]]` is a vector of skew values of the component distributions for U_{mix(pj)} `mix_skurts` list of same length and order as `mix_pis`; `mix_skurts[[p]][[j]]` is a vector of standardized kurtoses of the component distributions for X_{mix(pj)}, the last vector in `mix_skurts[[p]]` is for E_p (if `error_type = "mix"`) `mix_skurts[[p]][[j]]` is a vector of standardized kurtoses of the component distributions for U_{mix(pj)} `mix_fifths` list of same length and order as `mix_pis`; not necessary for `method = "Fleishman"`; `mix_fifths[[p]][[j]]` is a vector of standardized fifth cumulants of the component distributions for X_{mix(pj)}, the last vector in `mix_fifths[[p]]` is for E_p (if `error_type = "mix"`) `mix_fifths[[p]][[j]]` is a vector of standardized fifth cumulants of the component distributions for U_{mix(pj)} `mix_sixths` list of same length and order as `mix_pis`; not necessary for `method = "Fleishman"`; `mix_sixths[[p]][[j]]` is a vector of standardized sixth cumulants of the component distributions for X_{mix(pj)}, the last vector in `mix_sixths[[p]]` is for E_p (if `error_type = "mix"`) `mix_sixths[[p]][[j]]` is a vector of standardized sixth cumulants of the component distributions for U_{mix(pj)} `mix_Six` a list of same length and order as `mix_pis`; keep `mix_Six = list()` if no corrections desired for all equations or if `method = "Fleishman"` p-th component of `mix_Six[1:M]` is a list of length equal to the total number of component distributions for the X_{mix(p)} and E_p (if `error_type = "mix"`); `mix_Six[[p]][[j]]` is a vector of sixth cumulant corrections for the j-th component distribution (i.e., if there are 2 continuous mixture independent variables for Y_p, where X_{mix(p1)} has 2 components and X_{mix(p2)} has 3 components, then `length(mix_Six[[p]]) = 5` and `mix_Six[[p]][[3]]` would correspond to the 1st component of X_{mix(p2)}); use `mix_Six[[p]][[j]] = NULL` if no correction desired for that component; use `mix_Six[[p]] = NULL` if no correction desired for any component of X_{mix(p)} and E_p q-th component of `mix_Six[M + 1]` or `mix_Six[(M + 1):(2 * M)]` is a list of length equal to the total number of component distributions for the U_{mix(q)}; `mix_Six[[q]][[j]]` is a vector of sixth cumulant corrections for the j-th component distribution; use `mix_Six[[q]][[j]] = NULL` if no correction desired for that component; use `mix_Six[[q]] = NULL` if no correction desired for any component of U_{mix(q)} `marginal` a list of length `M`, with the p-th component a list of cumulative probabilities for the ordinal variables associated with outcome Y_p (use `marginal[[p]] = NULL` if outcome Y_p has no ordinal variables); `marginal[[p]][[j]]` is a vector of the cumulative probabilities defining the marginal distribution of X_{ord(pj)}, the j-th ordinal variable for outcome Y_p; if the variable can take r values, the vector will contain r - 1 probabilities (the r-th is assumed to be 1); for binary variables, the probability is the probability of the 1st category, which has the smaller support value; `length(marginal[[p]])` can differ across outcomes; the order should be the same as in `corr.x` `support` a list of length `M`, with the p-th component a list of support values for the ordinal variables associated with outcome Y_p; use `support[[p]] = NULL` if outcome Y_p has no ordinal variables; `support[[p]][[j]]` is a vector of the support values defining the marginal distribution of X_{ord(pj)}, the j-th ordinal variable for outcome Y_p; if not provided, the default for r categories is 1, ..., r `lam` list of length `M`, p-th component a vector of lambda (means > 0) values for Poisson variables for outcome Y_p (see `stats::dpois`); order is 1st regular Poisson and 2nd zero-inflated Poisson; use `lam[[p]] = NULL` if outcome Y_p has no Poisson variables; `length(lam[[p]])` can differ across outcomes; the order should be the same as in `corr.x` `p_zip` a list of vectors of probabilities of structural zeros (not including zeros from the Poisson distribution) for the zero-inflated Poisson variables (see `VGAM::dzipois`); if `p_zip` = 0, Y_{pois} has a regular Poisson distribution; if `p_zip` is in (0, 1), Y_{pois} has a zero-inflated Poisson distribution; if `p_zip` is in `(-(exp(lam) - 1)^(-1), 0)`, Y_{pois} has a zero-deflated Poisson distribution and `p_zip` is not a probability; if `p_zip = -(exp(lam) - 1)^(-1)`, Y_{pois} has a positive-Poisson distribution (see `VGAM::dpospois`); order is 1st regular Poisson and 2nd zero-inflated Poisson; if a single number, all Poisson variables given this value; if a vector of length `M`, all Poisson variables in equation p given `p_zip[p]`; otherwise, missing values are set to 0 and ordered 1st `size` list of length `M`, p-th component a vector of size parameters for the Negative Binomial variables for outcome Y_p (see `stats::dnbinom`); order is 1st regular NB and 2nd zero-inflated NB; use `size[[p]] = NULL` if outcome Y_p has no Negative Binomial variables; `length(size[[p]])` can differ across outcomes; the order should be the same as in `corr.x` `prob` list of length `M`, p-th component a vector of success probabilities for the Negative Binomial variables for outcome Y_p (see `stats::dnbinom`); order is 1st regular NB and 2nd zero-inflated NB; use `prob[[p]] = NULL` if outcome Y_p has no Negative Binomial variables; `length(prob[[p]])` can differ across outcomes; the order should be the same as in `corr.x` `mu` list of length `M`, p-th component a vector of mean values for the Negative Binomial variables for outcome Y_p (see `stats::dnbinom`); order is 1st regular NB and 2nd zero-inflated NB; use `mu[[p]] = NULL` if outcome Y_p has no Negative Binomial variables; `length(mu[[p]])` can differ across outcomes; the order should be the same as in `corr.x`; for zero-inflated NB variables, this refers to the mean of the NB distribution (see `VGAM::dzinegbin`) (*Note: either `prob` or `mu` should be supplied for all Negative Binomial variables, not a mixture) `p_zinb` a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables (see `VGAM::dzinegbin`); if `p_zinb` = 0, Y_{nb} has a regular NB distribution; if `p_zinb` is in `(-prob^size/(1 - prob^size),` `0)`, Y_{nb} has a zero-deflated NB distribution and `p_zinb` is not a probability; if `p_zinb = -prob^size/(1 - prob^size)`, Y_{nb} has a positive-NB distribution (see `VGAM::dposnegbin`); order is 1st regular NB and 2nd zero-inflated NB; if a single number, all NB variables given this value; if a vector of length `M`, all NB variables in equation p given `p_zinb[p]`; otherwise, missing values are set to 0 and ordered 1st `corr.x` list of length `M`, each component a list of length `M`; `corr.x[[p]][[q]]` is matrix of correlations for independent variables in equations p (X_{(pj)} for outcome Y_p) and q (X_{(qj)} for outcome Y_q); order: 1st ordinal (same order as in `marginal`), 2nd continuous non-mixture (same order as in `skews`), 3rd components of continuous mixture (same order as in `mix_pis`), 4th regular Poisson, 5th zero-inflated Poisson (same order as in `lam`), 6th regular NB, and 7th zero-inflated NB (same order as in `size`); if p = q, `corr.x[[p]][[q]]` is a correlation matrix with `nrow(corr.x[[p]][[q]])` = # X_{(pj)} for outcome Y_p; if p != q, `corr.x[[p]][[q]]` is a non-symmetric matrix of correlations where rows correspond to covariates for Y_p so that `nrow(corr.x[[p]][[q]])` = # X_{(pj)} for outcome Y_p and columns correspond to covariates for Y_q so that `ncol(corr.x[[p]][[q]])` = # X_{(qj)} for outcome Y_q; use `corr.x[[p]][[q]] = NULL` if equation q has no X_{(qj)}; use `corr.x[[p]] = NULL` if equation p has no X_{(pj)} `corr.e` correlation matrix for continuous non-mixture or components of mixture error terms `same.var` either a vector or a matrix; if a vector, `same.var` includes column numbers of `corr.x[[1]][[1]]` corresponding to independent variables that should be identical across equations; these terms must have the same indices for all `p = 1, ..., M`; i.e., if the 1st ordinal variable represents sex which should be the same for each equation, then `same.var[1] = 1` since ordinal variables are 1st in `corr.x[[1]][[1]]` and sex is the 1st ordinal variable, and the 1st term for all other outcomes must also be sex; if a matrix, columns 1 and 2 are outcome p and column index in `corr.x[[p]][[p]]` for 1st instance of variable, columns 3 and 4 are outcome q and column index in `corr.x[[q]][[q]]` for subsequent instances of variable; i.e., if 1st term for all outcomes is sex and `M = 3`, then `same.var = matrix(c(1,` `1, 2, 1, 1, 1, 3, 1), 2, 4, byrow = TRUE)`; the independent variable index corresponds to ordinal, continuous non-mixture, component of continuous mixture, Poisson, or NB variable `subj.var` matrix where 1st column is outcome index (`p = 1, ..., M`), 2nd column is independent variable index corresponding to covariate which is a a subject-level term (not including time), including time-varying covariates; the independent variable index corresponds to ordinal, continuous non-mixture, continuous mixture (not mixture component), Poisson, or NB variable; assumes all other variables are group-level terms; these subject-level terms are used to form interactions with the group level terms `int.var` matrix where 1st column is outcome index (`p = 1, ..., M`), 2nd and 3rd columns are indices corresponding to two group-level or two subject-level independent variables to form interactions between; this includes all interactions that are not accounted for by a subject-group level interaction (as indicated by `subj.var`) or by a time-covariate interaction (as indicated by `tint.var`); ex: 1, 2, 3 indicates that for outcome 1, the 2nd and 3rd independent variables form an interaction term; the independent variable index corresponds to ordinal, continuous non-mixture, continuous mixture (not mixture component), Poisson, or NB variable `tint.var` matrix where 1st column is outcome index (`p = 1, ..., M`), 2nd column is index of independent variable to form interaction with time; if `tint.var = NULL` or no X_{(pj)} are indicated for outcome Y_p, all group-level variables (variables not indicated as subject-level variables in `subj.var`) will be crossed with time, else includes only terms indicated by 2nd column (i.e., in order to include subject-level variables); ex: 1, 1 indicates that for outcome 1, the 1st independent variable has an interaction with time; the independent variable index corresponds to ordinal, continuous non-mixture, continuous mixture (not mixture component), Poisson, or NB variable `betas.0` vector of length `M` containing intercepts, if `NULL` all set equal to 0; if length 1, all intercepts set to `betas.0` `betas` list of length `M`, p-th component a vector of coefficients for outcome Y_p, including group and subject-level terms; order is order of variables in `corr.x[[p]][[p]]`; if `betas = list()`, all set to 0 so that all Y only have intercept and/or interaction terms plus error terms; if all outcomes have the same betas, use list of length 1; if Y_p only has intercept and/or interaction terms, set `betas[[p]] = NULL`; if there are continuous mixture variables, beta is for mixture variable (not for components) `betas.subj` list of length `M`, p-th component a vector of coefficients for interaction terms between group-level terms and subject-level terms given in `subj.var`; order is 1st by subject-level covariate as given in `subj.var` and 2nd by group-level covariate as given in `corr.x` or an interaction between group-level terms; if all outcomes have the same betas, use list of length 1; if Y_p only has group-level terms, set `betas.subj[[p]] = NULL`; since subject-subject interactions are treated as subject-level variables, these will also be crossed with all group-level variables and require coefficients; if there are continuous mixture variables, beta is for mixture variable (not for components) `betas.int` list of length `M`, p-th component a vector of coefficients for interaction terms indicated in `int.var`; order is the same order as given in `int.var`; if all outcomes have the same betas, use list of length 1; if Y_p has none, set `betas.int[[p]] = NULL`; if there are continuous mixture variables, beta is for mixture variable (not for components) `betas.t` vector of length `M` of coefficients for time terms, if `NULL` all set equal to 1; if length 1, all intercepts set to `betas.t` `betas.tint` list of length `M`, p-th component a vector of coefficients for all interactions with time; this includes interactions with group-level covariates or terms indicated in `tint.var`; order is the same order as given in `corr.x` or `tint.var`; if all outcomes have the same betas, use list of length 1; if Y_p has none, set `betas.tint[[p]] = NULL`; since group-group interactions are treated as group-level variables, these will also be crossed with time (unless otherwise specified for that outcome in `tint.var`) and require coefficients; if there are continuous mixture variables, beta is for mixture variable (not for components) `rand.int` "none" (default) if no random intercept term for all outcomes, "non_mix" if all random intercepts have a continuous non-mixture distribution, "mix" if all random intercepts have a continuous mixture distribution; also can be a vector of length `M` containing a combination (i.e., `c("non_mix", "mix", "none")` if the 1st has a non-mixture distribution, the 2nd has a mixture distribution, and 3rd outcome has no random intercept) `rand.tsl` "none" (default) if no random slope for time for all outcomes, "non_mix" if all random time slopes have a continuous non-mixture distribution, "mix" if all random time slopes have a continuous mixture distribution; also can be a vector of length `M` as in `rand.int` `rand.var` matrix where 1st column is outcome index (`p = 1, ..., M`), 2nd column is independent variable index corresponding to covariate to assign random effect to (not including the random intercept or time slope if present); the independent variable index corresponds to ordinal, continuous non-mixture, continuous mixture (not mixture component), Poisson, or NB variable; order is 1st continuous non-mixture and 2nd continuous mixture random effects; note that the order of the rows corresponds to the order of the random effects in `corr.u` not the order of the independent variable so that a continuous mixture covariate with a non-mixture random effect would be ordered before a continuous non-mixture covariate with a mixture random effect (the 2nd column of `rand.var` indicates the specific covariate) `corr.u` if the random effects are the same variables across equations, a matrix of correlations for U; if the random effects are different variables across equations, a list of length `M`, each component a list of length `M`; `corr.u[[p]][[q]]` is matrix of correlations for random effects in equations p (U_{(pj)} for outcome Y_p) and q (U_{(qj)} for outcome Y_q); if p = q, `corr.u[[p]][[q]]` is a correlation matrix with `nrow(corr.u[[p]][[q]])` = # U_{(pj)} for outcome Y_p; if p != q, `corr.u[[p]][[q]]` is a non-symmetric matrix of correlations where rows correspond to U_{(pj)} for Y_p so that `nrow(corr.u[[p]][[q]])` = # U_{(pj)} for outcome Y_p and columns correspond to U_{(qj)} for Y_q so that `ncol(corr.u[[p]][[q]])` = # U_{(qj)} for outcome Y_q; the number of random effects for Y_p is taken from `nrow(corr.u[[p]][[1]])` so that if there should be random effects, there must be entries for `corr.u`; use `corr.u[[p]][[q]] = NULL` if equation q has no U_{(qj)}; use `corr.u[[p]] = NULL` if equation p has no U_{(pj)}; correlations are specified in terms of components of mixture variables (if present); order is 1st random intercept (if `rand.int != "none"`), 2nd random time slope (if `rand.tsl != "none"`), 3rd other random slopes with non-mixture distributions, 4th other random slopes with mixture distributions `seed` the seed value for random number generation (default = 1234) `use.nearPD` TRUE to convert the overall intermediate correlation matrix formed by the X (for all outcomes and independent variables), E, or the random effects to the nearest positive definite matrix with `Matrix::nearPD` if necessary; if FALSE and `adjgrad = FALSE` the negative eigenvalues are replaced with `eigmin` if necessary `eigmin` minimum replacement eigenvalue if overall intermediate correlation matrix is not positive-definite (default = 0) `adjgrad` TRUE to use `adj_grad` to convert overall intermediate correlation matrix to a positive-definite matrix and next 5 inputs can be used `B1` the initial matrix for algorithm; if NULL, uses a scaled initial matrix with diagonal elements `sqrt(nrow(Sigma))/2` `tau` parameter used to calculate theta (default = 0.5) `tol` maximum error for Frobenius norm distance between new matrix and original matrix (default = 0.1) `steps` maximum number of steps for k (default = 100) `msteps` maximum number of steps for m (default = 10) `nrand` the number of random numbers to generate in calculating intermediate correlations (default = 10000) `errorloop` if TRUE, uses `corr_error` to attempt to correct the correlation of the independent variables within and across outcomes to be within `epsilon` of the target correlations `corr.x` until the number of iterations reaches `maxit` (default = FALSE) `epsilon` the maximum acceptable error between the final and target correlation matrices (default = 0.001) in the calculation of ordinal intermediate correlations with `ord_norm` or in the error loop `maxit` the maximum number of iterations to use (default = 1000) in the calculation of ordinal intermediate correlations with `ord_norm` or in the error loop `quiet` if FALSE prints messages, if TRUE suppresses messages

## Value

A list with the following components:

`Y` matrix with `n` rows and `M` columns of outcomes

`X` list of length `M` containing X_{ord(pj)}, X_{cont(pj)}, X_{comp(pj)}, X_{pois(pj)}, X_{nb(pj)}

`X_all` list of length `M` containing X_{ord(pj)}, X_{cont(pj)}, X_{mix(pj)}, X_{pois(pj)}, X_{nb(pj)}, X interactions as indicated by `int.var`, subject-group level term interactions as indicated by `subj.var`, Time_p, and Time interactions as indicated by `tint.var`; order is 1st covariates X (as specified in `corr.x`), 2nd group-group or subject-subject interactions (ordered as in `int.var`), 3rd subject-group interactions (1st by subject-level variable as specified in `subj.var`, 2nd by covariate as specified in `corr.x`), and 4th time interactions (either as specified in `corr.x` with group-level covariates or in `tint.var`)

`E` matrix with `n` rows containing continuous non-mixture or components of continuous mixture error terms

`E_mix` matrix with `n` rows containing continuous mixture error terms

`Sigma_X0` matrix of intermediate correlations calculated by `intercorr`

`Sigma_X` matrix of intermediate correlations after `nearPD` or `adj_grad` function has been used; applied to generate the normal variables transformed to get the desired distributions

`Error_Time` the time in minutes required to use the error loop

`Time` the total simulation time in minutes

`niter` a matrix of the number of iterations used in the error loop

If continuous variables are produced: `constants` a list of maximum length `2 * M`, the 1st `M` components are data.frames of the constants for the X_{cont(pj)}, X_comp(pj) and E_p, the 2nd `M` components are for random effects (if present),

`SixCorr` a list of maximum length `2 * M`, the 1st `M` components are lists of sixth cumulant correction values used to obtain valid pdf's for the X_{cont(pj)}, X_comp(pj), and E_p, the 2nd `M` components are for random effects (if present),

`valid.pdf` a list of maximum length `2 * M` of vectors where the i-th element is "TRUE" if the constants for the i-th continuous variable generate a valid pdf, else "FALSE"; the 1st `M` components are for the X_{cont(pj)}, X_comp(pj), and E_p, the 2nd `M` components are for random effects (if present)

If random effects are produced: `U` a list of length `M` containing matrices of continuous non-mixture and components of mixture random effects,

`U_all` a list of length `M` containing matrices of continuous non-mixture and mixture random effects,

`V` a list of length `M` containing matrices of design matrices for random effects,

`rmeans2` and `rvars2` the means and variances of the non-mixture and components reordered in accordance with the random intercept and time slope types (input for `summary_sys`)

## Reasons for Function Errors

1) The most likely cause for function errors is that the parameter inputs are mispecified. Using `checkpar` prior to simulation can help decrease these errors.

2) Another reason for error is that no solutions to `fleish` or `poly` converged when using `find_constants`. If this happens, the simulation will stop. It may help to first use `find_constants` for each continuous variable to determine if a sixth cumulant correction value is needed. If the standardized cumulants are obtained from `calc_theory`, the user may need to use rounded values as inputs (i.e. `skews = round(skews, 8)`). For example, in order to ensure that skew is exactly 0 for symmetric distributions.

3) The kurtosis for a continuous variable may be outside the region of possible values. There is an associated lower kurtosis boundary for associated with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method). Use `calc_lower_skurt` to determine the boundary for a given set of cumulants.

## References

Amatya A & Demirtas H (2015). Simultaneous generation of multivariate mixed data with Poisson and normal marginals. Journal of Statistical Computation and Simulation, 85(15):3129-39. doi: 10.1080/00949655.2014.953534.

Barbiero A & Ferrari PA (2015). GenOrd: Simulation of Discrete Random Variables with Given Correlation Matrix and Marginal Distributions. R package version 1.4.0.
https://CRAN.R-project.org/package=GenOrd

Davenport JW, Bezder JC, & Hathaway RJ (1988). Parameter Estimation for Finite Mixture Distributions. Computers & Mathematics with Applications, 15(10):819-28.

Demirtas H (2006). A method for multivariate ordinal data generation given marginal distributions and correlations. Journal of Statistical Computation and Simulation, 76(11):1017-1025.
doi: 10.1080/10629360600569246.

Demirtas H (2014). Joint Generation of Binary and Nonnormal Continuous Data. Biometrics & Biostatistics, S12.

Demirtas H, Hedeker D, & Mermelstein RJ (2012). Simulation of massive public health data by power polynomials. Statistics in Medicine, 31(27):3337-3346. doi: 10.1002/sim.5362.

Everitt BS (1996). An Introduction to Finite Mixture Distributions. Statistical Methods in Medical Research, 5(2):107-127. doi: 10.1177/096228029600500202.

Ferrari PA & Barbiero A (2012). Simulating ordinal data. Multivariate Behavioral Research, 47(4): 566-589. doi: 10.1080/00273171.2012.692630.

Fialkowski AC (2017). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.1. https://CRAN.R-project.org/package=SimMultiCorrData.

Fialkowski AC (2018). SimCorrMix: Simulation of Correlated Data of Multiple Variable Types including Continuous and Count Mixture Distributions. R package version 0.1.0. https://github.com/AFialkowski/SimCorrMix

Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43:521-532. doi: 10.1007/BF02293811.

Headrick TC (2002). Fast Fifth-order Polynomial Transforms for Generating Univariate and Multivariate Non-normal Distributions. Computational Statistics & Data Analysis, 40(4):685-711. doi: 10.1016/S0167-9473(02)00072-5. (ScienceDirect)

Headrick TC (2004). On Polynomial Transformations for Simulating Multivariate Nonnormal Distributions. Journal of Modern Applied Statistical Methods, 3(1):65-71. doi: 10.22237/jmasm/1083370080.

Headrick TC, Kowalchuk RK (2007). The Power Method Transformation: Its Probability Density Function, Distribution Function, and Its Further Use for Fitting Data. Journal of Statistical Computation and Simulation, 77:229-249. doi: 10.1080/10629360600605065.

Headrick TC, Sawilowsky SS (1999). Simulating Correlated Non-normal Distributions: Extending the Fleishman Power Method. Psychometrika, 64:25-35. doi: 10.1007/BF02294317.

Headrick TC, Sheng Y, & Hodis FA (2007). Numerical Computing and Graphics for the Power Method Transformation Using Mathematica. Journal of Statistical Software, 19(3):1 - 17.
doi: 10.18637/jss.v019.i03.

Higham N (2002). Computing the nearest correlation matrix - a problem from finance; IMA Journal of Numerical Analysis 22:329-343.

Ismail N & Zamani H (2013). Estimation of Claim Count Data Using Negative Binomial, Generalized Poisson, Zero-Inflated Negative Binomial and Zero-Inflated Generalized Poisson Regression Models. Casualty Actuarial Society E-Forum, 41(20):1-28.

Kincaid C (2005). Guidelines for Selecting the Covariance Structure in Mixed Model Analysis. Computational Statistics and Data Analysis, 198(30):1-8.

Lambert D (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics 34(1):1-14.

Lininger M, Spybrook J, & Cheatham CC (2015). Hierarchical Linear Model: Thinking Outside the Traditional Repeated-Measures Analysis-of-Variance Box. Journal of Athletic Training, 50(4):438-441. doi: 10.4085/1062-6050-49.5.09.

McCulloch CE, Searle SR, Neuhaus JM (2008). Generalized, Linear, and Mixed Models (2nd ed.). Wiley Series in Probability and Statistics. Hoboken, New Jersey: John Wiley & Sons, Inc.

Olsson U, Drasgow F, & Dorans NJ (1982). The Polyserial Correlation Coefficient. Psychometrika, 47(3):337-47. doi: 10.1007/BF02294164.

Pearson RK (2011). Exploring Data in Engineering, the Sciences, and Medicine. In. New York: Oxford University Press.

Schork NJ, Allison DB, & Thiel B (1996). Mixture Distributions in Human Genetics Research. Statistical Methods in Medical Research, 5:155-178. doi: 10.1177/096228029600500204.

Vale CD & Maurelli VA (1983). Simulating Multivariate Nonnormal Distributions. Psychometrika, 48:465-471. doi: 10.1007/BF02293687.

Van Der Leeden R (1998). Multilevel Analysis of Repeated Measures Data. Quality & Quantity, 32(1):15-29.

Yahav I & Shmueli G (2012). On Generating Multivariate Poisson Data in Management Science Applications. Applied Stochastic Models in Business and Industry, 28(1):91-102. doi: 10.1002/asmb.901.

Yee TW (2017). VGAM: Vector Generalized Linear and Additive Models.
https://CRAN.R-project.org/package=VGAM.

Zhang X, Mallick H, & Yi N (2016). Zero-Inflated Negative Binomial Regression for Differential Abundance Testing in Microbiome Studies. Journal of Bioinformatics and Genomics 2(2):1-9. doi: 10.18454/jbg.2016.2.2.1.

`find_constants`, `intercorr`, `checkpar`, `summary_sys`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139``` ```M <- 3 B <- calc_theory("Beta", c(4, 1.5)) skews <- lapply(seq_len(M), function(x) B[3]) skurts <- lapply(seq_len(M), function(x) B[4]) fifths <- lapply(seq_len(M), function(x) B[5]) sixths <- lapply(seq_len(M), function(x) B[6]) Six <- lapply(seq_len(M), function(x) list(0.03)) corr.e <- matrix(c(1, 0.4, 0.4^2, 0.4, 1, 0.4, 0.4^2, 0.4, 1), M, M, byrow = TRUE) means <- lapply(seq_len(M), function(x) B[1]) vars <- lapply(seq_len(M), function(x) B[2]^2) marginal <- list(0.3, 0.4, 0.5) support <- lapply(seq_len(M), function(x) list(0:1)) corr.x <- list(list(matrix(1, 1, 1), matrix(0.4, 1, 1), matrix(0.4, 1, 1)), list(matrix(0.4, 1, 1), matrix(1, 1, 1), matrix(0.4, 1, 1)), list(matrix(0.4, 1, 1), matrix(0.4, 1, 1), matrix(1, 1, 1))) betas <- list(0.5) betas.t <- 1 betas.tint <- list(0.25) Sys1 <- corrsys(10000, M, Time = 1:M, "Polynomial", "non_mix", means, vars, skews, skurts, fifths, sixths, Six, marginal = marginal, support = support, corr.x = corr.x, corr.e = corr.e, betas = betas, betas.t = betas.t, betas.tint = betas.tint, quiet = TRUE) ## Not run: seed <- 276 n <- 10000 M <- 3 Time <- 1:M # Error terms have a beta(4, 1.5) distribution with an AR(1, p = 0.4) # correlation structure B <- calc_theory("Beta", c(4, 1.5)) skews <- lapply(seq_len(M), function(x) B[3]) skurts <- lapply(seq_len(M), function(x) B[4]) fifths <- lapply(seq_len(M), function(x) B[5]) sixths <- lapply(seq_len(M), function(x) B[6]) Six <- lapply(seq_len(M), function(x) list(0.03)) error_type <- "non_mix" corr.e <- matrix(c(1, 0.4, 0.4^2, 0.4, 1, 0.4, 0.4^2, 0.4, 1), M, M, byrow = TRUE) # 1 continuous mixture of Normal(-2, 1) and Normal(2, 1) for each Y mix_pis <- lapply(seq_len(M), function(x) list(c(0.4, 0.6))) mix_mus <- lapply(seq_len(M), function(x) list(c(-2, 2))) mix_sigmas <- lapply(seq_len(M), function(x) list(c(1, 1))) mix_skews <- lapply(seq_len(M), function(x) list(c(0, 0))) mix_skurts <- lapply(seq_len(M), function(x) list(c(0, 0))) mix_fifths <- lapply(seq_len(M), function(x) list(c(0, 0))) mix_sixths <- lapply(seq_len(M), function(x) list(c(0, 0))) mix_Six <- list() Nstcum <- calc_mixmoments(mix_pis[[1]][[1]], mix_mus[[1]][[1]], mix_sigmas[[1]][[1]], mix_skews[[1]][[1]], mix_skurts[[1]][[1]], mix_fifths[[1]][[1]], mix_sixths[[1]][[1]]) means <- lapply(seq_len(M), function(x) c(Nstcum[1], B[1])) vars <- lapply(seq_len(M), function(x) c(Nstcum[2]^2, B[2]^2)) # 1 binary variable for each Y marginal <- lapply(seq_len(M), function(x) list(0.4)) support <- list(NULL, list(c(0, 1)), NULL) # 1 Poisson variable for each Y lam <- list(1, 5, 10) # Y2 and Y3 are zero-inflated Poisson variables p_zip <- list(NULL, 0.05, 0.1) # 1 NB variable for each Y size <- list(10, 15, 20) prob <- list(0.3, 0.4, 0.5) # either prob or mu is required (not both) mu <- mapply(function(x, y) x * (1 - y)/y, size, prob, SIMPLIFY = FALSE) # Y2 and Y3 are zero-inflated NB variables p_zinb <- list(NULL, 0.05, 0.1) # The 2nd (the normal mixture) variable is the same across Y same.var <- 2 # Create the correlation matrix in terms of the components of the normal # mixture K <- 5 corr.x <- list() corr.x[[1]] <- list(matrix(0.1, K, K), matrix(0.2, K, K), matrix(0.3, K, K)) diag(corr.x[[1]][[1]]) <- 1 # set correlation between components to 0 corr.x[[1]][[1]][2:3, 2:3] <- diag(2) # set correlations with the same variable equal across outcomes corr.x[[1]][[2]][, same.var] <- corr.x[[1]][[3]][, same.var] <- corr.x[[1]][[1]][, same.var] corr.x[[2]] <- list(t(corr.x[[1]][[2]]), matrix(0.35, K, K), matrix(0.4, K, K)) diag(corr.x[[2]][[2]]) <- 1 corr.x[[2]][[2]][2:3, 2:3] <- diag(2) corr.x[[2]][[2]][, same.var] <- corr.x[[2]][[3]][, same.var] <- t(corr.x[[1]][[2]][same.var, ]) corr.x[[2]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ] corr.x[[2]][[2]][same.var, ] <- t(corr.x[[2]][[2]][, same.var]) corr.x[[3]] <- list(t(corr.x[[1]][[3]]), t(corr.x[[2]][[3]]), matrix(0.5, K, K)) diag(corr.x[[3]][[3]]) <- 1 corr.x[[3]][[3]][2:3, 2:3] <- diag(2) corr.x[[3]][[3]][, same.var] <- t(corr.x[[1]][[3]][same.var, ]) corr.x[[3]][[3]][same.var, ] <- t(corr.x[[3]][[3]][, same.var]) # The 2nd and 3rd variables of each Y are subject-level variables subj.var <- matrix(c(1, 2, 1, 3, 2, 2, 2, 3, 3, 2, 3, 3), 6, 2, byrow = TRUE) int.var <- tint.var <- NULL betas.0 <- 0 betas <- list(seq(0.5, 0.5 + (K - 2) * 0.25, 0.25)) betas.subj <- list(seq(0.5, 0.5 + (K - 2) * 0.1, 0.1)) betas.int <- list() betas.t <- 1 betas.tint <- list(c(0.25, 0.5)) method <- "Polynomial" # Check parameter inputs checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, pois_eps = list(), size, prob, mu, p_zinb, nb_eps = list(), corr.x, corr.yx = list(), corr.e, same.var, subj.var, int.var, tint.var, betas.0, betas, betas.subj, betas.int, betas.t, betas.tint) # Simulated system using correlation method 1 N <- corrsys(n, M, Time, method, error_type, means, vars, skews, skurts, fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size, prob, mu, p_zinb, corr.x, corr.e, same.var, subj.var, int.var, tint.var, betas.0, betas, betas.subj, betas.int, betas.t, betas.tint, seed = seed, use.nearPD = FALSE) # Summarize the results S <- summary_sys(N\$Y, N\$E, E_mix = NULL, N\$X, N\$X_all, M, method, means, vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths, mix_sixths, marginal, support, lam, p_zip, size, prob, mu, p_zinb, corr.x, corr.e) ## End(Not run) ```