corrsys: Generate Correlated Systems of Equations with Ordinal,...

Description Usage Arguments Value Reasons for Function Errors References See Also Examples

Description

This function generates a correlated system of M equations representing a system of repeated measures at M time points. The equations may contain 1) ordinal (r ≥q 2 categories), continuous (normal, non-normal, and mixture distributions), count (regular and zero-inflated, Poisson and Negative Binomial) independent variables X; 2) continuous error terms E; 3) a discrete time variable Time; and 4) random effects U. The assumptions are that 1) there are at least 2 equations, 2) the independent variables, random effect terms, and error terms are uncorrelated, 3) each equation has an error term, 4) all error terms have a continuous non-mixture distribution or all have a continuous mixture distribution, 5) all random effects are continuous, and 6) growth is linear (with respect to time). The random effects may be a random intercept, a random slope for time, or a random slope for any of the X variables. Continuous variables are simulated using either Fleishman's third-order (method = "Fleishman", doi: 10.1007/BF02293811) or Headrick's fifth-order (method = "Polynomial", doi: 10.1016/S0167-9473(02)00072-5) power method transformation (PMT). Simulation occurs at the component-level for continuous mixture distributions. The target correlation matrix is specified in terms of correlations with components of continuous mixture variables. These components are transformed into the desired mixture variables using random multinomial variables based on the mixing probabilities. The X terms can be the same across equations (i.e., modeling sex or height) or may be time-varying covariates. The equations may contain different numbers of X terms (i.e., a covariate could be missing for a given equation).

The outcomes Y are generated using a hierarchical linear models (HLM) approach, which allows the data to be structured in at least two levels. Level-1 is the repeated measure (time or condition) and other subject-level variables. Level-1 is nested within Level-2, which describes the average of the outcome (the intercept) and growth (slope for time) as a function of group-level variables. The first level captures the within-subject variation, while the second level describes the between-subjects variability. Using a HLM provides a way to determine if: a) subjects differ at a specific time point with respect to the dependent variable, b) growth rates differ across conditions, or c) growth rates differ across subjects. Random effects describe deviation at the subject-level from the average (fixed) effect described by the slope coefficients (betas). See the The Hierarchical Linear Models Approach for a System of Correlated Equations with Multiple Variable Types vignette for a description of the HLM model. The user can specify subject-level X terms, and each subject-level X term is crossed with all group-level X terms. The equations may also contain interactions between X variables. Interactions specified in int.var between two group-level covariates are themselves considered group-level covariates and will be crossed with subject-level covariates. Interactions between two subject-level covariates are considered subject-level covariates and will be crossed with group-level covariates. Since Time is a subject-level variable, each group-level term is crossed with Time unless otherwise specified.

Random effects may be added for the intercept, time slope, or effects of any of the covariates. The type of random intercept and time slope (i.e., non-mixture or mixture) is specified in rand.int and rand.tsl. This type may vary by equation. The random effects for independent variables are specified in rand.var and may also contain a combination of non-mixture and mixture continuous distributions. If the parameter lists are of length M + 1, the random effects are the same variables across equations and the correlation for the effects corr.u is a matrix. If the parameter lists are of length 2 * M, the random effects are different variables across equations and the correlation for the effects corr.u is a list.

The independent variables, interactions, Time effect, random effects, and error terms are summed together to produce the outcomes Y. The beta coefficients may be the same or differ across equations. The user specifies the betas for the independent variables in betas, for the interactions between two group-level or two subject-level covariates in betas.int, for the group-subject level interactions in betas.subj, and for the Time interactions in betas.tint. Setting a coefficient to 0 will eliminate that term. The user also provides the correlations 1) between E terms; 2) between X variables within each outcome Y_p, p = 1, ..., M, and between outcome pairs; and 3) between U variables within each outcome Y_p, p = 1, ..., M, and between outcome pairs. The order of the independent variables in corr.x must be 1st ordinal (same order as in marginal), 2nd continuous non-mixture (same order as in skews), 3rd components of continuous mixture (same order as in mix_pis), 4th regular Poisson, 5th zero-inflated Poisson (same order as in lam), 6th regular NB, and 7th zero-inflated NB (same order as in size). The order of the random effects in corr.u must be 1st random intercept, 2nd random time slope, 3rd continuous non-mixture random effects, and 4th components of continuous mixture random effects.

The variables are generated from multivariate normal variables with intermediate correlations calculated using intercorr, which employs correlation method 1. See SimCorrMix for a description of the correlation method and the techniques used to generate each variable type. The order of the variables returned is 1st covariates X (as specified in corr.x), 2nd group-group or subject-subject interactions (ordered as in int.var), 3rd subject-group interactions (1st by subject-level variable as specified in subj.var, 2nd by covariate as specified in corr.x), and 4th time interactions (either as specified in corr.x with group-level covariates or in tint.var).

This function contains no parameter checks in order to decrease simulation time. That should be done first using checkpar. Summaries of the system can be obtained using summary_sys. The Correlated Systems of Statistical Equations with Multiple Variable Types demonstrates examples.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
corrsys(n = 10000, M = NULL, Time = NULL, method = c("Fleishman",
  "Polynomial"), error_type = c("non_mix", "mix"), means = list(),
  vars = list(), skews = list(), skurts = list(), fifths = list(),
  sixths = list(), Six = list(), mix_pis = list(), mix_mus = list(),
  mix_sigmas = list(), mix_skews = list(), mix_skurts = list(),
  mix_fifths = list(), mix_sixths = list(), mix_Six = list(),
  marginal = list(), support = list(), lam = list(), p_zip = list(),
  size = list(), prob = list(), mu = list(), p_zinb = list(),
  corr.x = list(), corr.e = NULL, same.var = NULL, subj.var = NULL,
  int.var = NULL, tint.var = NULL, betas.0 = NULL, betas = list(),
  betas.subj = list(), betas.int = list(), betas.t = NULL,
  betas.tint = list(), rand.int = c("none", "non_mix", "mix"),
  rand.tsl = c("none", "non_mix", "mix"), rand.var = NULL,
  corr.u = list(), seed = 1234, use.nearPD = TRUE, eigmin = 0,
  adjgrad = FALSE, B1 = NULL, tau = 0.5, tol = 0.1, steps = 100,
  msteps = 10, nrand = 1e+05, errorloop = FALSE, epsilon = 0.001,
  maxit = 1000, quiet = FALSE)

Arguments

n

the sample size (i.e. the length of each simulated variable; default = 10000)

M

the number of dependent variables Y (outcomes); equivalently, the number of equations in the system

Time

a vector of values to use for time; each subject receives the same time value; if NULL, Time = 1:M

method

the PMT method used to generate all continuous variables, including independent variables (covariates), error terms, and random effects; "Fleishman" uses Fleishman's third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation

error_type

"non_mix" if all error terms have continuous non-mixture distributions, "mix" if all error terms have continuous mixture distributions

means

if no random effects, a list of length M where means[[p]] contains a vector of means for the continuous independent variables in equation p with non-mixture (X_{cont}) or mixture (X_{mix}) distributions and for the error terms (E); order in vector is X_{cont}, X_{mix}, E

if there are random effects, a list of length M + 1 if the effects are the same across equations or 2 * M if they differ; where means[M + 1] or means[(M + 1):(2 * M)] are vectors of means for all random effects with continuous non-mixture or mixture distributions; order in vector is 1st random intercept U_0 (if rand.int != "none"), 2nd random time slope U_1 (if rand.tsl != "none"), 3rd other random slopes with non-mixture distributions U_{cont}, 4th other random slopes with mixture distributions U_{mix}

vars

a list of same length and order as means containing vectors of variances for the continuous variables, error terms, and any random effects

skews

if no random effects, a list of length M where skews[[p]] contains a vector of skew values for the continuous independent variables in equation p with non-mixture (X_{cont}) distributions and for E if error_type = "non_mix"; order in vector is X_{cont}, E

if there are random effects, a list of length M + 1 if the effects are the same across equations or 2 * M if they differ; where skews[M + 1] or skews[(M + 1):(2 * M)] are vectors of skew values for all random effects with continuous non-mixture distributions; order in vector is 1st random intercept U_0 (if rand.int = "non_mix"), 2nd random time slope U_1 (if rand.tsl = "non_mix"), 3rd other random slopes with non-mixture distributions U_{cont}

skurts

a list of same length and order as skews containing vectors of standardized kurtoses (kurtosis - 3) for the continuous variables, error terms, and any random effects with non-mixture distributions

fifths

a list of same length and order as skews containing vectors of standardized fifth cumulants for the continuous variables, error terms, and any random effects with non-mixture distributions; not necessary for method = "Fleishman"

sixths

a list of same length and order as skews containing vectors of standardized sixth cumulants for the continuous variables, error terms, and any random effects with non-mixture distributions; not necessary for method = "Fleishman"

Six

a list of length M, M + 1, or 2 * M, where Six[1:M] are for X_{cont}, E (if error_type = "non_mix") and Six[M + 1] or Six[(M + 1):(2 * M)] are for non-mixture U; if error_type = "mix" and there are only random effects (i.e., length(corr.x) = 0), use Six[1:M] = rep(list(NULL), M) so that Six[M + 1] or Six[(M + 1):(2 * M)] describes the non-mixture U;

Six[[p]][[j]] is a vector of sixth cumulant correction values to aid in finding a valid PDF for X_{cont(pj)}, the j-th continuous non-mixture covariate for outcome Y_p; the last vector in Six[[p]] is for E_p (if error_type = "non_mix"); use Six[[p]][[j]] = NULL if no correction desired for X_{cont(pj)}; use Six[[p]] = NULL if no correction desired for any continuous non-mixture covariate or error term in equation p

Six[[M + p]][[j]] is a vector of sixth cumulant correction values to aid in finding a valid PDF for U_{(pj)}, the j-th non-mixture random effect for outcome Y_p; use Six[[M + p]][[j]] = NULL if no correction desired for U_{(pj)}; use Six[[M + p]] = NULL if no correction desired for any continuous non-mixture random effect in equation p

keep Six = list() if no corrections desired for all equations or if method = "Fleishman"

mix_pis

list of length M, M + 1 or 2 * M, where mix_pis[1:M] are for X_{cont}, E (if error_type = "mix") and mix_pis[M + 1] or mix_pis[(M + 1):(2 * M)] are for mixture U; use mix_pis[[p]] = NULL if equation p has no continuous mixture terms if error_type = "non_mix" and there are only random effects (i.e., length(corr.x) = 0), use mix_pis[1:M] = NULL so that mix_pis[M + 1] or mix_pis[(M + 1):(2 * M)] describes the mixture U;

mix_pis[[p]][[j]] is a vector of mixing probabilities of the component distributions for X_{mix(pj)}, the j-th mixture covariate for outcome Y_p; the last vector in mix_pis[[p]] is for E_p (if error_type = "mix"); components should be ordered as in corr.x

mix_pis[[M + p]][[j]] is a vector of mixing probabilities of the component distributions for U_{(pj)}, the j-th random effect with a mixture distribution for outcome Y_p; order is 1st random intercept (if rand.int = "mix"), 2nd random time slope (if rand.tsl = "mix"), 3rd other random slopes with mixture distributions; components should be ordered as in corr.u

mix_mus

list of same length and order as mix_pis;

mix_mus[[p]][[j]] is a vector of means of the component distributions for X_{mix(pj)}, the last vector in mix_mus[[p]] is for E_p (if error_type = "mix")

mix_mus[[p]][[j]] is a vector of means of the component distributions for U_{mix(pj)}

mix_sigmas

list of same length and order as mix_pis;

mix_sigmas[[p]][[j]] is a vector of standard deviations of the component distributions for X_{mix(pj)}, the last vector in mix_sigmas[[p]] is for E_p (if error_type = "mix")

mix_sigmas[[p]][[j]] is a vector of standard deviations of the component distributions for U_{mix(pj)}

mix_skews

list of same length and order as mix_pis;

mix_skews[[p]][[j]] is a vector of skew values of the component distributions for X_{mix(pj)}, the last vector in mix_skews[[p]] is for E_p (if error_type = "mix")

mix_skews[[p]][[j]] is a vector of skew values of the component distributions for U_{mix(pj)}

mix_skurts

list of same length and order as mix_pis;

mix_skurts[[p]][[j]] is a vector of standardized kurtoses of the component distributions for X_{mix(pj)}, the last vector in mix_skurts[[p]] is for E_p (if error_type = "mix")

mix_skurts[[p]][[j]] is a vector of standardized kurtoses of the component distributions for U_{mix(pj)}

mix_fifths

list of same length and order as mix_pis; not necessary for method = "Fleishman";

mix_fifths[[p]][[j]] is a vector of standardized fifth cumulants of the component distributions for X_{mix(pj)}, the last vector in mix_fifths[[p]] is for E_p (if error_type = "mix")

mix_fifths[[p]][[j]] is a vector of standardized fifth cumulants of the component distributions for U_{mix(pj)}

mix_sixths

list of same length and order as mix_pis; not necessary for method = "Fleishman";

mix_sixths[[p]][[j]] is a vector of standardized sixth cumulants of the component distributions for X_{mix(pj)}, the last vector in mix_sixths[[p]] is for E_p (if error_type = "mix")

mix_sixths[[p]][[j]] is a vector of standardized sixth cumulants of the component distributions for U_{mix(pj)}

mix_Six

a list of same length and order as mix_pis; keep mix_Six = list() if no corrections desired for all equations or if method = "Fleishman"

p-th component of mix_Six[1:M] is a list of length equal to the total number of component distributions for the X_{mix(p)} and E_p (if error_type = "mix"); mix_Six[[p]][[j]] is a vector of sixth cumulant corrections for the j-th component distribution (i.e., if there are 2 continuous mixture independent variables for Y_p, where X_{mix(p1)} has 2 components and X_{mix(p2)} has 3 components, then length(mix_Six[[p]]) = 5 and mix_Six[[p]][[3]] would correspond to the 1st component of X_{mix(p2)}); use mix_Six[[p]][[j]] = NULL if no correction desired for that component; use mix_Six[[p]] = NULL if no correction desired for any component of X_{mix(p)} and E_p

q-th component of mix_Six[M + 1] or mix_Six[(M + 1):(2 * M)] is a list of length equal to the total number of component distributions for the U_{mix(q)}; mix_Six[[q]][[j]] is a vector of sixth cumulant corrections for the j-th component distribution; use mix_Six[[q]][[j]] = NULL if no correction desired for that component; use mix_Six[[q]] = NULL if no correction desired for any component of U_{mix(q)}

marginal

a list of length M, with the p-th component a list of cumulative probabilities for the ordinal variables associated with outcome Y_p (use marginal[[p]] = NULL if outcome Y_p has no ordinal variables); marginal[[p]][[j]] is a vector of the cumulative probabilities defining the marginal distribution of X_{ord(pj)}, the j-th ordinal variable for outcome Y_p; if the variable can take r values, the vector will contain r - 1 probabilities (the r-th is assumed to be 1); for binary variables, the probability is the probability of the 1st category, which has the smaller support value; length(marginal[[p]]) can differ across outcomes; the order should be the same as in corr.x

support

a list of length M, with the p-th component a list of support values for the ordinal variables associated with outcome Y_p; use support[[p]] = NULL if outcome Y_p has no ordinal variables; support[[p]][[j]] is a vector of the support values defining the marginal distribution of X_{ord(pj)}, the j-th ordinal variable for outcome Y_p; if not provided, the default for r categories is 1, ..., r

lam

list of length M, p-th component a vector of lambda (means > 0) values for Poisson variables for outcome Y_p (see stats::dpois); order is 1st regular Poisson and 2nd zero-inflated Poisson; use lam[[p]] = NULL if outcome Y_p has no Poisson variables; length(lam[[p]]) can differ across outcomes; the order should be the same as in corr.x

p_zip

a list of vectors of probabilities of structural zeros (not including zeros from the Poisson distribution) for the zero-inflated Poisson variables (see VGAM::dzipois); if p_zip = 0, Y_{pois} has a regular Poisson distribution; if p_zip is in (0, 1), Y_{pois} has a zero-inflated Poisson distribution; if p_zip is in (-(exp(lam) - 1)^(-1), 0), Y_{pois} has a zero-deflated Poisson distribution and p_zip is not a probability; if p_zip = -(exp(lam) - 1)^(-1), Y_{pois} has a positive-Poisson distribution (see VGAM::dpospois); order is 1st regular Poisson and 2nd zero-inflated Poisson; if a single number, all Poisson variables given this value; if a vector of length M, all Poisson variables in equation p given p_zip[p]; otherwise, missing values are set to 0 and ordered 1st

size

list of length M, p-th component a vector of size parameters for the Negative Binomial variables for outcome Y_p (see stats::dnbinom); order is 1st regular NB and 2nd zero-inflated NB; use size[[p]] = NULL if outcome Y_p has no Negative Binomial variables; length(size[[p]]) can differ across outcomes; the order should be the same as in corr.x

prob

list of length M, p-th component a vector of success probabilities for the Negative Binomial variables for outcome Y_p (see stats::dnbinom); order is 1st regular NB and 2nd zero-inflated NB; use prob[[p]] = NULL if outcome Y_p has no Negative Binomial variables; length(prob[[p]]) can differ across outcomes; the order should be the same as in corr.x

mu

list of length M, p-th component a vector of mean values for the Negative Binomial variables for outcome Y_p (see stats::dnbinom); order is 1st regular NB and 2nd zero-inflated NB; use mu[[p]] = NULL if outcome Y_p has no Negative Binomial variables; length(mu[[p]]) can differ across outcomes; the order should be the same as in corr.x; for zero-inflated NB variables, this refers to the mean of the NB distribution (see VGAM::dzinegbin) (*Note: either prob or mu should be supplied for all Negative Binomial variables, not a mixture)

p_zinb

a vector of probabilities of structural zeros (not including zeros from the NB distribution) for the zero-inflated NB variables (see VGAM::dzinegbin); if p_zinb = 0, Y_{nb} has a regular NB distribution; if p_zinb is in (-prob^size/(1 - prob^size), 0), Y_{nb} has a zero-deflated NB distribution and p_zinb is not a probability; if p_zinb = -prob^size/(1 - prob^size), Y_{nb} has a positive-NB distribution (see VGAM::dposnegbin); order is 1st regular NB and 2nd zero-inflated NB; if a single number, all NB variables given this value; if a vector of length M, all NB variables in equation p given p_zinb[p]; otherwise, missing values are set to 0 and ordered 1st

corr.x

list of length M, each component a list of length M; corr.x[[p]][[q]] is matrix of correlations for independent variables in equations p (X_{(pj)} for outcome Y_p) and q (X_{(qj)} for outcome Y_q); order: 1st ordinal (same order as in marginal), 2nd continuous non-mixture (same order as in skews), 3rd components of continuous mixture (same order as in mix_pis), 4th regular Poisson, 5th zero-inflated Poisson (same order as in lam), 6th regular NB, and 7th zero-inflated NB (same order as in size); if p = q, corr.x[[p]][[q]] is a correlation matrix with nrow(corr.x[[p]][[q]]) = # X_{(pj)} for outcome Y_p; if p != q, corr.x[[p]][[q]] is a non-symmetric matrix of correlations where rows correspond to covariates for Y_p so that nrow(corr.x[[p]][[q]]) = # X_{(pj)} for outcome Y_p and columns correspond to covariates for Y_q so that ncol(corr.x[[p]][[q]]) = # X_{(qj)} for outcome Y_q; use corr.x[[p]][[q]] = NULL if equation q has no X_{(qj)}; use corr.x[[p]] = NULL if equation p has no X_{(pj)}

corr.e

correlation matrix for continuous non-mixture or components of mixture error terms

same.var

either a vector or a matrix; if a vector, same.var includes column numbers of corr.x[[1]][[1]] corresponding to independent variables that should be identical across equations; these terms must have the same indices for all p = 1, ..., M; i.e., if the 1st ordinal variable represents sex which should be the same for each equation, then same.var[1] = 1 since ordinal variables are 1st in corr.x[[1]][[1]] and sex is the 1st ordinal variable, and the 1st term for all other outcomes must also be sex; if a matrix, columns 1 and 2 are outcome p and column index in corr.x[[p]][[p]] for 1st instance of variable, columns 3 and 4 are outcome q and column index in corr.x[[q]][[q]] for subsequent instances of variable; i.e., if 1st term for all outcomes is sex and M = 3, then same.var = matrix(c(1, 1, 2, 1, 1, 1, 3, 1), 2, 4, byrow = TRUE); the independent variable index corresponds to ordinal, continuous non-mixture, component of continuous mixture, Poisson, or NB variable

subj.var

matrix where 1st column is outcome index (p = 1, ..., M), 2nd column is independent variable index corresponding to covariate which is a a subject-level term (not including time), including time-varying covariates; the independent variable index corresponds to ordinal, continuous non-mixture, continuous mixture (not mixture component), Poisson, or NB variable; assumes all other variables are group-level terms; these subject-level terms are used to form interactions with the group level terms

int.var

matrix where 1st column is outcome index (p = 1, ..., M), 2nd and 3rd columns are indices corresponding to two group-level or two subject-level independent variables to form interactions between; this includes all interactions that are not accounted for by a subject-group level interaction (as indicated by subj.var) or by a time-covariate interaction (as indicated by tint.var); ex: 1, 2, 3 indicates that for outcome 1, the 2nd and 3rd independent variables form an interaction term; the independent variable index corresponds to ordinal, continuous non-mixture, continuous mixture (not mixture component), Poisson, or NB variable

tint.var

matrix where 1st column is outcome index (p = 1, ..., M), 2nd column is index of independent variable to form interaction with time; if tint.var = NULL or no X_{(pj)} are indicated for outcome Y_p, all group-level variables (variables not indicated as subject-level variables in subj.var) will be crossed with time, else includes only terms indicated by 2nd column (i.e., in order to include subject-level variables); ex: 1, 1 indicates that for outcome 1, the 1st independent variable has an interaction with time; the independent variable index corresponds to ordinal, continuous non-mixture, continuous mixture (not mixture component), Poisson, or NB variable

betas.0

vector of length M containing intercepts, if NULL all set equal to 0; if length 1, all intercepts set to betas.0

betas

list of length M, p-th component a vector of coefficients for outcome Y_p, including group and subject-level terms; order is order of variables in corr.x[[p]][[p]]; if betas = list(), all set to 0 so that all Y only have intercept and/or interaction terms plus error terms; if all outcomes have the same betas, use list of length 1; if Y_p only has intercept and/or interaction terms, set betas[[p]] = NULL; if there are continuous mixture variables, beta is for mixture variable (not for components)

betas.subj

list of length M, p-th component a vector of coefficients for interaction terms between group-level terms and subject-level terms given in subj.var; order is 1st by subject-level covariate as given in subj.var and 2nd by group-level covariate as given in corr.x or an interaction between group-level terms; if all outcomes have the same betas, use list of length 1; if Y_p only has group-level terms, set betas.subj[[p]] = NULL; since subject-subject interactions are treated as subject-level variables, these will also be crossed with all group-level variables and require coefficients; if there are continuous mixture variables, beta is for mixture variable (not for components)

betas.int

list of length M, p-th component a vector of coefficients for interaction terms indicated in int.var; order is the same order as given in int.var; if all outcomes have the same betas, use list of length 1; if Y_p has none, set betas.int[[p]] = NULL; if there are continuous mixture variables, beta is for mixture variable (not for components)

betas.t

vector of length M of coefficients for time terms, if NULL all set equal to 1; if length 1, all intercepts set to betas.t

betas.tint

list of length M, p-th component a vector of coefficients for all interactions with time; this includes interactions with group-level covariates or terms indicated in tint.var; order is the same order as given in corr.x or tint.var; if all outcomes have the same betas, use list of length 1; if Y_p has none, set betas.tint[[p]] = NULL; since group-group interactions are treated as group-level variables, these will also be crossed with time (unless otherwise specified for that outcome in tint.var) and require coefficients; if there are continuous mixture variables, beta is for mixture variable (not for components)

rand.int

"none" (default) if no random intercept term for all outcomes, "non_mix" if all random intercepts have a continuous non-mixture distribution, "mix" if all random intercepts have a continuous mixture distribution; also can be a vector of length M containing a combination (i.e., c("non_mix", "mix", "none") if the 1st has a non-mixture distribution, the 2nd has a mixture distribution, and 3rd outcome has no random intercept)

rand.tsl

"none" (default) if no random slope for time for all outcomes, "non_mix" if all random time slopes have a continuous non-mixture distribution, "mix" if all random time slopes have a continuous mixture distribution; also can be a vector of length M as in rand.int

rand.var

matrix where 1st column is outcome index (p = 1, ..., M), 2nd column is independent variable index corresponding to covariate to assign random effect to (not including the random intercept or time slope if present); the independent variable index corresponds to ordinal, continuous non-mixture, continuous mixture (not mixture component), Poisson, or NB variable; order is 1st continuous non-mixture and 2nd continuous mixture random effects; note that the order of the rows corresponds to the order of the random effects in corr.u not the order of the independent variable so that a continuous mixture covariate with a non-mixture random effect would be ordered before a continuous non-mixture covariate with a mixture random effect (the 2nd column of rand.var indicates the specific covariate)

corr.u

if the random effects are the same variables across equations, a matrix of correlations for U; if the random effects are different variables across equations, a list of length M, each component a list of length M; corr.u[[p]][[q]] is matrix of correlations for random effects in equations p (U_{(pj)} for outcome Y_p) and q (U_{(qj)} for outcome Y_q); if p = q, corr.u[[p]][[q]] is a correlation matrix with nrow(corr.u[[p]][[q]]) = # U_{(pj)} for outcome Y_p; if p != q, corr.u[[p]][[q]] is a non-symmetric matrix of correlations where rows correspond to U_{(pj)} for Y_p so that nrow(corr.u[[p]][[q]]) = # U_{(pj)} for outcome Y_p and columns correspond to U_{(qj)} for Y_q so that ncol(corr.u[[p]][[q]]) = # U_{(qj)} for outcome Y_q; the number of random effects for Y_p is taken from nrow(corr.u[[p]][[1]]) so that if there should be random effects, there must be entries for corr.u; use corr.u[[p]][[q]] = NULL if equation q has no U_{(qj)}; use corr.u[[p]] = NULL if equation p has no U_{(pj)};

correlations are specified in terms of components of mixture variables (if present); order is 1st random intercept (if rand.int != "none"), 2nd random time slope (if rand.tsl != "none"), 3rd other random slopes with non-mixture distributions, 4th other random slopes with mixture distributions

seed

the seed value for random number generation (default = 1234)

use.nearPD

TRUE to convert the overall intermediate correlation matrix formed by the X (for all outcomes and independent variables), E, or the random effects to the nearest positive definite matrix with Matrix::nearPD if necessary; if FALSE and adjgrad = FALSE the negative eigenvalues are replaced with eigmin if necessary

eigmin

minimum replacement eigenvalue if overall intermediate correlation matrix is not positive-definite (default = 0)

adjgrad

TRUE to use adj_grad to convert overall intermediate correlation matrix to a positive-definite matrix and next 5 inputs can be used

B1

the initial matrix for algorithm; if NULL, uses a scaled initial matrix with diagonal elements sqrt(nrow(Sigma))/2

tau

parameter used to calculate theta (default = 0.5)

tol

maximum error for Frobenius norm distance between new matrix and original matrix (default = 0.1)

steps

maximum number of steps for k (default = 100)

msteps

maximum number of steps for m (default = 10)

nrand

the number of random numbers to generate in calculating intermediate correlations (default = 10000)

errorloop

if TRUE, uses corr_error to attempt to correct the correlation of the independent variables within and across outcomes to be within epsilon of the target correlations corr.x until the number of iterations reaches maxit (default = FALSE)

epsilon

the maximum acceptable error between the final and target correlation matrices (default = 0.001) in the calculation of ordinal intermediate correlations with ord_norm or in the error loop

maxit

the maximum number of iterations to use (default = 1000) in the calculation of ordinal intermediate correlations with ord_norm or in the error loop

quiet

if FALSE prints messages, if TRUE suppresses messages

Value

A list with the following components:

Y matrix with n rows and M columns of outcomes

X list of length M containing X_{ord(pj)}, X_{cont(pj)}, X_{comp(pj)}, X_{pois(pj)}, X_{nb(pj)}

X_all list of length M containing X_{ord(pj)}, X_{cont(pj)}, X_{mix(pj)}, X_{pois(pj)}, X_{nb(pj)}, X interactions as indicated by int.var, subject-group level term interactions as indicated by subj.var, Time_p, and Time interactions as indicated by tint.var; order is 1st covariates X (as specified in corr.x), 2nd group-group or subject-subject interactions (ordered as in int.var), 3rd subject-group interactions (1st by subject-level variable as specified in subj.var, 2nd by covariate as specified in corr.x), and 4th time interactions (either as specified in corr.x with group-level covariates or in tint.var)

E matrix with n rows containing continuous non-mixture or components of continuous mixture error terms

E_mix matrix with n rows containing continuous mixture error terms

Sigma_X0 matrix of intermediate correlations calculated by intercorr

Sigma_X matrix of intermediate correlations after nearPD or adj_grad function has been used; applied to generate the normal variables transformed to get the desired distributions

Error_Time the time in minutes required to use the error loop

Time the total simulation time in minutes

niter a matrix of the number of iterations used in the error loop

If continuous variables are produced: constants a list of maximum length 2 * M, the 1st M components are data.frames of the constants for the X_{cont(pj)}, X_comp(pj) and E_p, the 2nd M components are for random effects (if present),

SixCorr a list of maximum length 2 * M, the 1st M components are lists of sixth cumulant correction values used to obtain valid pdf's for the X_{cont(pj)}, X_comp(pj), and E_p, the 2nd M components are for random effects (if present),

valid.pdf a list of maximum length 2 * M of vectors where the i-th element is "TRUE" if the constants for the i-th continuous variable generate a valid pdf, else "FALSE"; the 1st M components are for the X_{cont(pj)}, X_comp(pj), and E_p, the 2nd M components are for random effects (if present)

If random effects are produced: U a list of length M containing matrices of continuous non-mixture and components of mixture random effects,

U_all a list of length M containing matrices of continuous non-mixture and mixture random effects,

V a list of length M containing matrices of design matrices for random effects,

rmeans2 and rvars2 the means and variances of the non-mixture and components reordered in accordance with the random intercept and time slope types (input for summary_sys)

Reasons for Function Errors

1) The most likely cause for function errors is that the parameter inputs are mispecified. Using checkpar prior to simulation can help decrease these errors.

2) Another reason for error is that no solutions to fleish or poly converged when using find_constants. If this happens, the simulation will stop. It may help to first use find_constants for each continuous variable to determine if a sixth cumulant correction value is needed. If the standardized cumulants are obtained from calc_theory, the user may need to use rounded values as inputs (i.e. skews = round(skews, 8)). For example, in order to ensure that skew is exactly 0 for symmetric distributions.

3) The kurtosis for a continuous variable may be outside the region of possible values. There is an associated lower kurtosis boundary for associated with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method). Use calc_lower_skurt to determine the boundary for a given set of cumulants.

References

Amatya A & Demirtas H (2015). Simultaneous generation of multivariate mixed data with Poisson and normal marginals. Journal of Statistical Computation and Simulation, 85(15):3129-39. doi: 10.1080/00949655.2014.953534.

Barbiero A & Ferrari PA (2015). GenOrd: Simulation of Discrete Random Variables with Given Correlation Matrix and Marginal Distributions. R package version 1.4.0.
https://CRAN.R-project.org/package=GenOrd

Davenport JW, Bezder JC, & Hathaway RJ (1988). Parameter Estimation for Finite Mixture Distributions. Computers & Mathematics with Applications, 15(10):819-28.

Demirtas H (2006). A method for multivariate ordinal data generation given marginal distributions and correlations. Journal of Statistical Computation and Simulation, 76(11):1017-1025.
doi: 10.1080/10629360600569246.

Demirtas H (2014). Joint Generation of Binary and Nonnormal Continuous Data. Biometrics & Biostatistics, S12.

Demirtas H, Hedeker D, & Mermelstein RJ (2012). Simulation of massive public health data by power polynomials. Statistics in Medicine, 31(27):3337-3346. doi: 10.1002/sim.5362.

Everitt BS (1996). An Introduction to Finite Mixture Distributions. Statistical Methods in Medical Research, 5(2):107-127. doi: 10.1177/096228029600500202.

Ferrari PA & Barbiero A (2012). Simulating ordinal data. Multivariate Behavioral Research, 47(4): 566-589. doi: 10.1080/00273171.2012.692630.

Fialkowski AC (2017). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.1. https://CRAN.R-project.org/package=SimMultiCorrData.

Fialkowski AC (2018). SimCorrMix: Simulation of Correlated Data of Multiple Variable Types including Continuous and Count Mixture Distributions. R package version 0.1.0. https://github.com/AFialkowski/SimCorrMix

Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43:521-532. doi: 10.1007/BF02293811.

Headrick TC (2002). Fast Fifth-order Polynomial Transforms for Generating Univariate and Multivariate Non-normal Distributions. Computational Statistics & Data Analysis, 40(4):685-711. doi: 10.1016/S0167-9473(02)00072-5. (ScienceDirect)

Headrick TC (2004). On Polynomial Transformations for Simulating Multivariate Nonnormal Distributions. Journal of Modern Applied Statistical Methods, 3(1):65-71. doi: 10.22237/jmasm/1083370080.

Headrick TC, Kowalchuk RK (2007). The Power Method Transformation: Its Probability Density Function, Distribution Function, and Its Further Use for Fitting Data. Journal of Statistical Computation and Simulation, 77:229-249. doi: 10.1080/10629360600605065.

Headrick TC, Sawilowsky SS (1999). Simulating Correlated Non-normal Distributions: Extending the Fleishman Power Method. Psychometrika, 64:25-35. doi: 10.1007/BF02294317.

Headrick TC, Sheng Y, & Hodis FA (2007). Numerical Computing and Graphics for the Power Method Transformation Using Mathematica. Journal of Statistical Software, 19(3):1 - 17.
doi: 10.18637/jss.v019.i03.

Higham N (2002). Computing the nearest correlation matrix - a problem from finance; IMA Journal of Numerical Analysis 22:329-343.

Ismail N & Zamani H (2013). Estimation of Claim Count Data Using Negative Binomial, Generalized Poisson, Zero-Inflated Negative Binomial and Zero-Inflated Generalized Poisson Regression Models. Casualty Actuarial Society E-Forum, 41(20):1-28.

Kincaid C (2005). Guidelines for Selecting the Covariance Structure in Mixed Model Analysis. Computational Statistics and Data Analysis, 198(30):1-8.

Lambert D (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics 34(1):1-14.

Lininger M, Spybrook J, & Cheatham CC (2015). Hierarchical Linear Model: Thinking Outside the Traditional Repeated-Measures Analysis-of-Variance Box. Journal of Athletic Training, 50(4):438-441. doi: 10.4085/1062-6050-49.5.09.

McCulloch CE, Searle SR, Neuhaus JM (2008). Generalized, Linear, and Mixed Models (2nd ed.). Wiley Series in Probability and Statistics. Hoboken, New Jersey: John Wiley & Sons, Inc.

Olsson U, Drasgow F, & Dorans NJ (1982). The Polyserial Correlation Coefficient. Psychometrika, 47(3):337-47. doi: 10.1007/BF02294164.

Pearson RK (2011). Exploring Data in Engineering, the Sciences, and Medicine. In. New York: Oxford University Press.

Schork NJ, Allison DB, & Thiel B (1996). Mixture Distributions in Human Genetics Research. Statistical Methods in Medical Research, 5:155-178. doi: 10.1177/096228029600500204.

Vale CD & Maurelli VA (1983). Simulating Multivariate Nonnormal Distributions. Psychometrika, 48:465-471. doi: 10.1007/BF02293687.

Van Der Leeden R (1998). Multilevel Analysis of Repeated Measures Data. Quality & Quantity, 32(1):15-29.

Yahav I & Shmueli G (2012). On Generating Multivariate Poisson Data in Management Science Applications. Applied Stochastic Models in Business and Industry, 28(1):91-102. doi: 10.1002/asmb.901.

Yee TW (2017). VGAM: Vector Generalized Linear and Additive Models.
https://CRAN.R-project.org/package=VGAM.

Zhang X, Mallick H, & Yi N (2016). Zero-Inflated Negative Binomial Regression for Differential Abundance Testing in Microbiome Studies. Journal of Bioinformatics and Genomics 2(2):1-9. doi: 10.18454/jbg.2016.2.2.1.

See Also

find_constants, intercorr, checkpar, summary_sys

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
M <- 3
B <- calc_theory("Beta", c(4, 1.5))
skews <- lapply(seq_len(M), function(x) B[3])
skurts <- lapply(seq_len(M), function(x) B[4])
fifths <- lapply(seq_len(M), function(x) B[5])
sixths <- lapply(seq_len(M), function(x) B[6])
Six <- lapply(seq_len(M), function(x) list(0.03))
corr.e <- matrix(c(1, 0.4, 0.4^2, 0.4, 1, 0.4, 0.4^2, 0.4, 1), M, M,
  byrow = TRUE)
means <- lapply(seq_len(M), function(x) B[1])
vars <- lapply(seq_len(M), function(x) B[2]^2)
marginal <- list(0.3, 0.4, 0.5)
support <- lapply(seq_len(M), function(x) list(0:1))
corr.x <- list(list(matrix(1, 1, 1), matrix(0.4, 1, 1), matrix(0.4, 1, 1)),
  list(matrix(0.4, 1, 1), matrix(1, 1, 1), matrix(0.4, 1, 1)),
  list(matrix(0.4, 1, 1), matrix(0.4, 1, 1), matrix(1, 1, 1)))
betas <- list(0.5)
betas.t <- 1
betas.tint <- list(0.25)
Sys1 <- corrsys(10000, M, Time = 1:M, "Polynomial", "non_mix", means, vars,
  skews, skurts, fifths, sixths, Six, marginal = marginal, support = support,
  corr.x = corr.x, corr.e = corr.e, betas = betas, betas.t = betas.t,
  betas.tint = betas.tint, quiet = TRUE)

## Not run: 
seed <- 276
n <- 10000
M <- 3
Time <- 1:M

# Error terms have a beta(4, 1.5) distribution with an AR(1, p = 0.4)
# correlation structure
B <- calc_theory("Beta", c(4, 1.5))
skews <- lapply(seq_len(M), function(x) B[3])
skurts <- lapply(seq_len(M), function(x) B[4])
fifths <- lapply(seq_len(M), function(x) B[5])
sixths <- lapply(seq_len(M), function(x) B[6])
Six <- lapply(seq_len(M), function(x) list(0.03))
error_type <- "non_mix"
corr.e <- matrix(c(1, 0.4, 0.4^2, 0.4, 1, 0.4, 0.4^2, 0.4, 1), M, M,
  byrow = TRUE)

# 1 continuous mixture of Normal(-2, 1) and Normal(2, 1) for each Y
mix_pis <- lapply(seq_len(M), function(x) list(c(0.4, 0.6)))
mix_mus <- lapply(seq_len(M), function(x) list(c(-2, 2)))
mix_sigmas <- lapply(seq_len(M), function(x) list(c(1, 1)))
mix_skews <- lapply(seq_len(M), function(x) list(c(0, 0)))
mix_skurts <- lapply(seq_len(M), function(x) list(c(0, 0)))
mix_fifths <- lapply(seq_len(M), function(x) list(c(0, 0)))
mix_sixths <- lapply(seq_len(M), function(x) list(c(0, 0)))
mix_Six <- list()
Nstcum <- calc_mixmoments(mix_pis[[1]][[1]], mix_mus[[1]][[1]],
  mix_sigmas[[1]][[1]], mix_skews[[1]][[1]], mix_skurts[[1]][[1]],
  mix_fifths[[1]][[1]], mix_sixths[[1]][[1]])

means <- lapply(seq_len(M), function(x) c(Nstcum[1], B[1]))
vars <- lapply(seq_len(M), function(x) c(Nstcum[2]^2, B[2]^2))

# 1 binary variable for each Y
marginal <- lapply(seq_len(M), function(x) list(0.4))
support <- list(NULL, list(c(0, 1)), NULL)

# 1 Poisson variable for each Y
lam <- list(1, 5, 10)
# Y2 and Y3 are zero-inflated Poisson variables
p_zip <- list(NULL, 0.05, 0.1)

# 1 NB variable for each Y
size <- list(10, 15, 20)
prob <- list(0.3, 0.4, 0.5)
# either prob or mu is required (not both)
mu <- mapply(function(x, y) x * (1 - y)/y, size, prob, SIMPLIFY = FALSE)
# Y2 and Y3 are zero-inflated NB variables
p_zinb <- list(NULL, 0.05, 0.1)

# The 2nd (the normal mixture) variable is the same across Y
same.var <- 2

# Create the correlation matrix in terms of the components of the normal
# mixture
K <- 5
corr.x <- list()
corr.x[[1]] <- list(matrix(0.1, K, K), matrix(0.2, K, K), matrix(0.3, K, K))
diag(corr.x[[1]][[1]]) <- 1
# set correlation between components to 0
corr.x[[1]][[1]][2:3, 2:3] <- diag(2)
# set correlations with the same variable equal across outcomes
corr.x[[1]][[2]][, same.var] <- corr.x[[1]][[3]][, same.var] <-
  corr.x[[1]][[1]][, same.var]
corr.x[[2]] <- list(t(corr.x[[1]][[2]]), matrix(0.35, K, K),
  matrix(0.4, K, K))
  diag(corr.x[[2]][[2]]) <- 1
  corr.x[[2]][[2]][2:3, 2:3] <- diag(2)
corr.x[[2]][[2]][, same.var] <- corr.x[[2]][[3]][, same.var] <-
  t(corr.x[[1]][[2]][same.var, ])
corr.x[[2]][[3]][same.var, ] <- corr.x[[1]][[3]][same.var, ]
corr.x[[2]][[2]][same.var, ] <- t(corr.x[[2]][[2]][, same.var])
corr.x[[3]] <- list(t(corr.x[[1]][[3]]), t(corr.x[[2]][[3]]),
  matrix(0.5, K, K))
diag(corr.x[[3]][[3]]) <- 1
corr.x[[3]][[3]][2:3, 2:3] <- diag(2)
corr.x[[3]][[3]][, same.var] <- t(corr.x[[1]][[3]][same.var, ])
corr.x[[3]][[3]][same.var, ] <- t(corr.x[[3]][[3]][, same.var])

# The 2nd and 3rd variables of each Y are subject-level variables
subj.var <- matrix(c(1, 2, 1, 3, 2, 2, 2, 3, 3, 2, 3, 3), 6, 2, byrow = TRUE)
int.var <- tint.var <- NULL
betas.0 <- 0
betas <- list(seq(0.5, 0.5 + (K - 2) * 0.25, 0.25))
betas.subj <- list(seq(0.5, 0.5 + (K - 2) * 0.1, 0.1))
betas.int <- list()
betas.t <- 1
betas.tint <- list(c(0.25, 0.5))

method <- "Polynomial"

# Check parameter inputs
checkpar(M, method, error_type, means, vars, skews, skurts, fifths, sixths,
  Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths,
  mix_sixths, mix_Six, marginal, support, lam, p_zip, pois_eps = list(),
  size, prob, mu, p_zinb, nb_eps = list(), corr.x, corr.yx = list(),
  corr.e, same.var, subj.var, int.var, tint.var, betas.0, betas,
  betas.subj, betas.int, betas.t, betas.tint)

# Simulated system using correlation method 1
N <- corrsys(n, M, Time, method, error_type, means, vars, skews, skurts,
  fifths, sixths, Six, mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts,
  mix_fifths, mix_sixths, mix_Six, marginal, support, lam, p_zip, size,
  prob, mu, p_zinb, corr.x, corr.e, same.var, subj.var, int.var, tint.var,
  betas.0, betas, betas.subj, betas.int, betas.t, betas.tint, seed = seed,
  use.nearPD = FALSE)

# Summarize the results
S <- summary_sys(N$Y, N$E, E_mix = NULL, N$X, N$X_all, M, method, means,
  vars, skews, skurts, fifths, sixths, mix_pis, mix_mus, mix_sigmas,
  mix_skews, mix_skurts, mix_fifths, mix_sixths, marginal, support, lam,
  p_zip, size, prob, mu, p_zinb, corr.x, corr.e)

## End(Not run)

SimRepeat documentation built on May 2, 2019, 9:32 a.m.