R/nonnormsys.R

#' @title Generate Correlated Systems of Equations Containing Normal, Non-Normal, and Mixture Continuous Variables
#'
#' @description This function extends the techniques of Headrick and Beasley (2004, \doi{10.1081/SAC-120028431}) to
#'     create correlated systems of statistical equations containing continuous variables with normal,
#'     non-normal, or mixture distributions.  The method allows the user to control the distributions
#'     for the stochastic disturbance (error) terms \eqn{E} and independent variables \eqn{X}.  The user specifies the correlation structure
#'     between \eqn{X} terms within an outcome and across outcomes.  For a given equation, the user also specifies the correlation between
#'     the outcome \eqn{Y} and \eqn{X} terms.  These correlations are used to calculate the beta (slope) coefficients for the equations with
#'     \code{\link[SimRepeat]{calc_betas}}.  If the system contains mixture variables and \code{corr.yx} is specified in terms of non-mixture
#'     and mixture variables, the \code{betas} will be calculated in terms of non-mixture and mixture independent variables.  If \code{corr.yx}
#'     Finally, the user specifies the correlations across error terms.  The assumptions are that
#'     1) there are at least 2 equations and a total of at least 1 independent variable, 2) the independent variables are uncorrelated
#'     with the error terms, 3) each equation has an error term,
#'     and 4) all error terms have either a non-mixture or mixture distribution.  The outcomes \eqn{Y} are calculated as the \eqn{E} terms
#'     added to the products of the beta coefficients and the \eqn{X} terms.  There are no
#'     interactions between independent variables or distinction between subject and group-level terms (as in the hierarchical linear models approach
#'     of \code{\link[SimRepeat]{corrsys}} or \code{\link[SimRepeat]{corrsys2}}).  However, the user does not have to provide the beta coefficients
#'     (except for the intercepts).  Since the equations do not include random slopes (i.e. for the \eqn{X} terms), the effects of the independent
#'     variables are all considered "fixed."  However, a random intercept or a "time" effect with a random slope could be added by specifying
#'     them as independent variables.  There are no parameter input checks in order to decrease simulation time.  All inputs should be checked
#'     prior to simulation with \code{\link[SimRepeat]{checkpar}}.  Summaries of the simulation results can be found with \code{\link[SimRepeat]{summary_sys}}.
#'     The functions \code{\link[SimRepeat]{calc_corr_y}}, \code{\link[SimRepeat]{calc_corr_yx}}, and \code{\link[SimRepeat]{calc_corr_ye}} use equations
#'     from Headrick and Beasley (2004) to calculate the expected correlations for the outcomes, among a given outcome and covariates from
#'     the other outcomes, and for the error terms.  The vignette \strong{Theory and Equations for Correlated Systems of Continuous Variables}
#'     gives the equations, and the vignette \strong{Correlated Systems of Statistical Equations with Non-Mixture and Mixture Continuous
#'     Variables} gives examples.  There are also vignettes in \code{\link[SimCorrMix]{SimCorrMix}} which provide more details on continuous
#'     non-mixture and mixture variables.
#'
#' @section Generation of Continuous Non-Mixture and Mixture Variables:
#'     Mixture distributions describe random variables that are drawn from more than one component distribution.  For a random variable
#'     \eqn{X_{mix}} from a finite continuous mixture distribution with \eqn{k} components, the probability density function (PDF) can be
#'     described by:
#'
#'     \deqn{h_X(x) = \sum_{i=1}^{k} \pi_i f_{X_{comp_i}}(x), \sum_{i=1}^{k} \pi_i = 1.}
#'
#'     The \eqn{\pi_i} are mixing parameters which determine the weight of each component distribution \eqn{f_{X_{comp_i}}(x)} in the overall
#'     probability distribution.  As long as each component has a valid PDF, the overall distribution \eqn{h_X()} has a valid PDF.
#'     The main assumption is statistical independence between the process of randomly selecting the component distribution and the
#'     distributions themselves.  Simulation is done at the component-level for mixture variables.
#'
#'     All continuous variables are simulated using either Fleishman's third-order (\code{method} = "Fleishman",
#'     \doi{10.1007/BF02293811}) or Headrick's fifth-order (\code{method} = "Polynomial", \doi{10.1016/S0167-9473(02)00072-5})
#'     power method transformation (PMT).  It works by matching standardized cumulants -- the first four (mean,
#'     variance, skew, and standardized kurtosis) for Fleishman's method, or the first six (mean, variance,
#'     skew, standardized kurtosis, and standardized fifth and sixth cumulants) for Headrick's method.  The transformation is expressed
#'     as follows:
#'
#'     \deqn{Y = c_{0} + c_{1} * Z + c_{2} * Z^2 + c_{3} * Z^3 + c_{4} * Z^4 + c_{5} * Z^5, Z \sim N(0,1),}
#'
#'     where \eqn{c_{4}} and \eqn{c_{5}} both equal \eqn{0} for Fleishman's method.  The real constants are calculated by
#'     \code{\link[SimMultiCorrData]{find_constants}} for non-mixture and components of mixture variables.  Continuous mixture
#'     variables are generated componentwise and then transformed to the desired mixture variables using random multinomial variables
#'     generated based on mixing probabilities.  The correlation matrices are specified in terms of correlations with components of the
#'     mixture variables.
#'
#' @section Choice of Fleishman's third-order or Headrick's fifth-order method:
#'     Using the fifth-order approximation allows additional control over the fifth and sixth moments of the generated distribution, improving
#'     accuracy.  In addition, the range of feasible standardized kurtosis values, given skew and standardized fifth (\eqn{\gamma_3}) and sixth
#'     (\eqn{\gamma_4}) cumulants, is larger than with Fleishman's method (see \code{\link[SimMultiCorrData]{calc_lower_skurt}}).
#'     For example, the Fleishman method can not be used to generate a non-normal distribution with a ratio of
#'     \eqn{\gamma_3^2/\gamma_4 > 9/14} (see Headrick & Kowalchuk, 2007).  This eliminates the Chi-squared family of distributions, which has
#'     a constant ratio of \eqn{\gamma_3^2/\gamma_4 = 2/3}.  The fifth-order method also generates more distributions with valid PDF's.
#'     However, if the fifth and sixth cumulants are unknown or do not exist, the Fleishman approximation should be used.
#'
#' @section Reasons for Function Errors:
#'     1) The most likely cause for function errors is that the parameter inputs are mispecified.  Using \code{\link[SimRepeat]{checkpar}}
#'     prior to simulation can help decrease these errors.
#'
#'     2) No solutions to \code{\link[SimMultiCorrData]{fleish}} or
#'     \code{\link[SimMultiCorrData]{poly}} converged when using \code{\link[SimMultiCorrData]{find_constants}}.  If this happens,
#'     the simulation will stop.  It may help to first use \code{\link[SimMultiCorrData]{find_constants}} for each continuous variable to
#'     determine if a sixth cumulant correction value is needed.  If the standardized cumulants are obtained from \code{calc_theory}, the user
#'     may need to use rounded values as inputs (i.e. \code{skews = round(skews, 8)}).  For example, in order to ensure that skew is exactly 0 for symmetric distributions.
#'
#'     3) The kurtosis for a continuous variable may be outside the region of possible values.  There is an associated lower kurtosis boundary for
#'     associated with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method).  Use
#'     \code{\link[SimMultiCorrData]{calc_lower_skurt}} to determine the boundary for a given set of cumulants.
#'
#'     4) No solutions to \code{\link[SimRepeat]{calc_betas}} converged when trying to find the beta coefficients.  Try different correlation
#'     matrices.
#'
#' @param n the sample size (i.e. the length of each simulated variable; default = 10000)
#' @param M the number of dependent variables \eqn{Y} (outcomes); equivalently, the number of equations in the system
#' @param method the PMT method used to generate all continuous variables, including independent variables (covariates) and error terms;
#'     "Fleishman" uses Fleishman's third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation
#' @param error_type "non_mix" if all error terms have continuous non-mixture distributions, "mix" if all error terms have continuous mixture distributions
#' @param means a list of length \code{M} of vectors of means for the non-mixture (\eqn{X_{cont}}) and mixture (\eqn{X_{mix}}) independent variables
#'     and for the error terms (\eqn{E}); the order in each vector should be: \eqn{X_{cont}, X_{mix}, E} so that the order for
#'     \eqn{X_{cont}, X_{mix}} is
#'     the same as in \code{corr.x} (considering the components of mixture variables)
#' @param vars a list of length \code{M} of vectors of variances for \eqn{X_{cont}, X_{mix}, E}; same order and dimension as \code{means}
#' @param skews a list of length \code{M} of vectors of skew values for \eqn{X_{cont}} and \eqn{E} (if \code{error_type = "non_mix"});
#'     same order as in \code{corr.x} and \code{means}
#' @param skurts a list of length \code{M} of vectors of standardized kurtoses (kurtosis - 3) for \eqn{X_{cont}} and \eqn{E} (if \code{error_type = "non_mix"});
#'     same order and dimension as \code{skews}
#' @param fifths a list of length \code{M} of vectors of standardized fifth cumulants for \eqn{X_{cont}} and \eqn{E} (if \code{error_type = "non_mix"});
#'     same order and dimension as \code{skews}; not necessary for \code{method = "Fleishman"}
#' @param sixths a list of length \code{M} of vectors of standardized sixth cumulants for \eqn{X_{cont}} and \eqn{E} (if \code{error_type = "non_mix"});
#'     same order and dimension as \code{skews}; not necessary for \code{method = "Fleishman"}
#' @param Six a list of length \code{M}, where \code{Six[[p]][[j]]} is a vector of sixth cumulant correction values to aid in finding a valid PDF for
#'     \eqn{X_{cont(pj)}}, the j-th continuous non-mixture covariate for outcome \eqn{Y_p}; the last element of \code{Six[[p]]} is for \eqn{E_p}
#'     (if \code{error_type = "non_mix"}); use \code{Six[[p]][[j]] = NULL} if no correction desired for \eqn{X_{cont(pj)}};
#'     use \code{Six[[p]] = NULL} if no correction desired for any non-mixture covariate or error term in equation p;
#'     keep \code{Six = list()} if no corrections desired for all covariates or if \code{method = "Fleishman"}
#' @param mix_pis a list of length \code{M}, where \code{mix_pis[[p]][[j]]} is a vector of mixing probabilities that sum to 1 for \eqn{X_{mix(pj)}},
#'     the j-th continuous mixture covariate for outcome \eqn{Y_p}; the last element of \code{mix_pis[[p]]} is for \eqn{E_p} (if \code{error_type = "mix"});
#'     if \eqn{Y_p} has no mixture variables, use \code{mix_pis[[p]] = NULL}; components should be ordered as in \code{corr.x}
#' @param mix_mus a list of length \code{M}, where \code{mix_mus[[p]][[j]]} is a vector of means of the component distributions for
#'     \eqn{X_{mix(pj)}}; the last element of \code{mix_mus[[p]]} is for \eqn{E_p} (if \code{error_type = "mix"});
#'     if \eqn{Y_p} has no mixture variables, use \code{mix_mus[[p]] = NULL}
#' @param mix_sigmas a list of length \code{M}, where \code{mix_sigmas[[p]][[j]]} is a vector of standard deviations of the component distributions for
#'     \eqn{X_{mix(pj)}}; the last element of \code{mix_sigmas[[p]]} is for \eqn{E_p} (if \code{error_type = "mix"});
#'     if \eqn{Y_p} has no mixture variables, use \code{mix_sigmas[[p]] = NULL}
#' @param mix_skews a list of length \code{M}, where \code{mix_skews[[p]][[j]]} is a vector of skew values of the component distributions for
#'     \eqn{X_{mix(pj)}}; the last element of \code{mix_skews[[p]]} is for \eqn{E_p} (if \code{error_type = "mix"});
#'     if \eqn{Y_p} has no mixture variables, use \code{mix_skews[[p]] = NULL}
#' @param mix_skurts a list of length \code{M}, where \code{mix_skurts[[p]][[j]]} is a vector of standardized kurtoses of the component distributions for
#'     \eqn{X_{mix(pj)}}; the last element of \code{mix_skurts[[p]]} is for \eqn{E_p} (if \code{error_type = "mix"});
#'     if \eqn{Y_p} has no mixture variables, use \code{mix_skurts[[p]] = NULL}
#' @param mix_fifths a list of length \code{M}, where \code{mix_fifths[[p]][[j]]} is a vector of standardized fifth cumulants of the component distributions for
#'     \eqn{X_{mix(pj)}}; the last element of \code{mix_fifths[[p]]} is for \eqn{E_p} (if \code{error_type = "mix"});
#'     if \eqn{Y_p} has no mixture variables, use \code{mix_fifths[[p]] = NULL}; not necessary for \code{method = "Fleishman"}
#' @param mix_sixths a list of length \code{M}, where \code{mix_sixths[[p]][[j]]} is a vector of standardized sixth cumulants of the component distributions for
#'     \eqn{X_{mix(pj)}}; the last element of \code{mix_sixths[[p]]} is for \eqn{E_p} (if \code{error_type = "mix"});
#'     if \eqn{Y_p} has no mixture variables, use \code{mix_sixths[[p]] = NULL}; not necessary for \code{method = "Fleishman"}
#' @param mix_Six a list of length \code{M}, where \code{mix_Six[[p]]} is a list of length equal to the total number of component distributions for
#'     the \eqn{X_{mix(p)}} and \eqn{E_p} (if \code{error_type = "mix"});
#'     \code{mix_Six[[p]][[j]]} is a vector of sixth cumulant corrections for the j-th component distribution (i.e., if there are 2
#'     continuous mixture independent variables for \eqn{Y_p}, where \eqn{X_{mix(p1)}} has 2 components and \eqn{X_{mix(p2)}} has 3
#'     components, then \code{length(mix_Six[[p]]) = 5} and \code{mix_Six[[p]][[3]]} would correspond to the 1st component of
#'     \eqn{X_{mix(p2)}}); use \code{mix_Six[[p]][[j]] = NULL} if no correction desired for that component;
#'     use \code{mix_Six[[p]] = NULL} if no correction desired for any component of \eqn{X_{mix(p)}} and \eqn{E_p};
#'     keep \code{mix_Six = list()} if no corrections desired for all covariates or if \code{method = "Fleishman"}
#' @param corr.x list of length \code{M}, each component a list of length \code{M}; \code{corr.x[[p]][[q]]} is matrix of correlations
#'     for independent variables in equations p (\eqn{X_{(pj)}} for outcome \eqn{Y_p}) and q (\eqn{X_{(qj)}} for outcome \eqn{Y_q});
#'     order: 1st continuous non-mixture (same order as in \code{skews}) and 2nd components of continuous mixture (same order as in \code{mix_pis});
#'     if p = q, \code{corr.x[[p]][[q]]} is a correlation matrix with \code{nrow(corr.x[[p]][[q]])} = # of non-mixture + # of mixture components for outcome \eqn{Y_p};
#'     if p != q, \code{corr.x[[p]][[q]]} is a non-symmetric matrix of correlations where rows correspond to covariates for \eqn{Y_p}
#'     so that \code{nrow(corr.x[[p]][[q]])} = # of non-mixture + # of mixture components for outcome \eqn{Y_p} and
#'     columns correspond to covariates for \eqn{Y_q} so that \code{ncol(corr.x[[p]][[q]])} = # of non-mixture + # of mixture components for outcome \eqn{Y_q};
#'     use \code{corr.x[[p]][[q]] = NULL} if equation q has no \eqn{X_{(qj)}}; use \code{corr.x[[p]] = NULL} if equation p has no \eqn{X_{(pj)}}
#' @param corr.yx a list of length \code{M}, where the p-th component is a 1 row matrix of correlations between \eqn{Y_p} and \eqn{X_{(pj)}};
#'     if there are mixture variables and the \code{betas} are desired in terms of these (and not the components), then \code{corr.yx}
#'     should be specified in terms of correlations between outcomes and non-mixture or mixture variables, and the number of columns of the matrices
#'     of \code{corr.yx} should not match the dimensions of the matrices in \code{corr.x}; if the \code{betas} are desired in terms of
#'     the components, then \code{corr.yx} should be specified in terms of correlations between outcomes and non-mixture or components of
#'     mixture variables, and the number of columns of the matrices of \code{corr.yx} should match the dimensions of the matrices in \code{corr.x};
#'     use \code{corr.yx[[p]] = NULL} if equation p has no \eqn{X_{(pj)}}
#' @param corr.e correlation matrix for continuous non-mixture or components of mixture error terms
#' @param same.var either a vector or a matrix; if a vector, \code{same.var} includes column numbers of \code{corr.x[[1]][[1]]}
#'     corresponding to independent variables that should be identical across equations; these terms must have the same indices for all \code{p = 1, ..., M};
#'     i.e., if the 1st variable represents height which should be the same for each equation, then
#'     \code{same.var[1] = 1} and the 1st term for all other outcomes must also be height;
#'     if a matrix, columns 1 and 2 are outcome p and column index in \code{corr.x[[p]][[p]]} for 1st instance of variable,
#'     columns 3 and 4 are outcome q and column index in \code{corr.x[[q]][[q]]} for subsequent instances of variable; i.e., if
#'     1st term for all outcomes is height and \code{M = 3}, then \code{same.var = matrix(c(1,} \code{1, 2, 1, 1, 1, 3, 1), 2, 4, byrow = TRUE)}; the
#'     independent variable index corresponds to continuous non-mixture and component of continuous mixture covariate
#' @param betas.0 vector of length \code{M} containing intercepts, if \code{NULL} all set equal to 0; if length 1, all intercepts set to \code{betas.0}
#' @param seed the seed value for random number generation (default = 1234)
#' @param use.nearPD TRUE to convert the overall intermediate correlation matrix formed by the \eqn{X} (for all outcomes and independent
#'     variables) or \eqn{E} to the nearest positive definite matrix with \code{Matrix::nearPD} if necessary;
#'     if FALSE and \code{adjgrad = FALSE} the negative eigenvalues are replaced with \code{eigmin} if necessary
#' @param eigmin minimum replacement eigenvalue if overall intermediate correlation matrix is not positive-definite (default = 0)
#' @param adjgrad TRUE to use \code{adj_grad} to convert overall intermediate correlation matrix to a positive-definite matrix and next
#'     5 inputs can be used
#' @param B1 the initial matrix for algorithm; if NULL, uses a scaled initial matrix with diagonal elements \code{sqrt(nrow(Sigma))/2}
#' @param tau parameter used to calculate theta (default = 0.5)
#' @param tol maximum error for Frobenius norm distance between new matrix and original matrix (default = 0.1)
#' @param steps maximum number of steps for k (default = 100)
#' @param msteps maximum number of steps for m (default = 10)
#' @param errorloop if TRUE, uses \code{\link[SimCorrMix]{corr_error}} to attempt to correct the correlation of the independent
#'     variables within and across outcomes to be within \code{epsilon} of the target correlations \code{corr.x} until the number of iterations
#'     reaches \code{maxit} (default = FALSE)
#' @param epsilon the maximum acceptable error between the final and target correlation matrices (default = 0.001) in the error loop
#' @param maxit the maximum number of iterations to use (default = 1000) in the error loop
#' @param quiet if FALSE prints messages, if TRUE suppresses messages
#' @import SimMultiCorrData
#' @import SimCorrMix
#' @importFrom stats cor dbeta dbinom dchisq density dexp df dgamma dlnorm dlogis dmultinom dnbinom dnorm dpois dt dunif dweibull ecdf
#'     median pbeta pbinom pchisq pexp pf pgamma plnorm plogis pnbinom pnorm ppois pt punif pweibull qbeta qbinom qchisq qexp qf qgamma
#'     qlnorm qlogis qnbinom qnorm qpois qt quantile qunif qweibull rbeta rbinom rchisq rexp rf rgamma rlnorm rlogis rmultinom rnbinom
#'     rnorm rpois rt runif rweibull sd uniroot var
#' @import utils
#' @importFrom Matrix nearPD
#' @importFrom VGAM qzipois qzinegbin
#' @export
#' @return A list with the following components:
#'
#' @return \code{Y} matrix with \code{n} rows and \code{M} columns of outcomes
#'
#' @return \code{X} list of length \code{M} containing \eqn{X_{cont(pj)}, X_{comp(pj)}}
#'
#' @return \code{X_all} list of length \code{M} containing \eqn{X_{cont(pj)}, X_{mix(pj)}}
#'
#' @return \code{E} matrix with \code{n} rows containing continuous non-mixture or components of continuous mixture error terms
#'
#' @return \code{E_mix} matrix with \code{n} rows containing continuous mixture error terms
#'
#' @return \code{betas} a matrix of the slope coefficients calculated with \code{\link[SimRepeat]{calc_betas}}, rows represent the
#'     outcomes
#'
#' @return \code{constants} a list of length \code{M} with data.frames of the constants for the \eqn{X_{cont(pj)}},
#'     \eqn{X_comp(pj)} and \eqn{E_p}
#'
#' @return \code{SixCorr} a list of length \code{M} of lists of sixth cumulant correction values used to obtain valid
#'     \emph{pdf}'s for the \eqn{X_{cont(pj)}}, \eqn{X_comp(pj)}, and \eqn{E_p}
#'
#' @return \code{valid.pdf} a list of length \code{M} of vectors where the i-th element is "TRUE" if the constants for the i-th
#'         continuous variable generate a valid pdf, else "FALSE"
#'
#' @return \code{Sigma_X0} matrix of intermediate correlations calculated by \code{intercorr}
#'
#' @return \code{Sigma_X} matrix of intermediate correlations after \code{nearPD} or \code{adj_grad} function has been used;
#'     applied to generate the normal variables transformed to get the desired distributions
#'
#' @return \code{Error_Time} the time in minutes required to use the error loop
#'
#' @return \code{Time} the total simulation time in minutes
#'
#' @return \code{niter} a matrix of the number of iterations used in the error loop
#'
#' @keywords continuous mixture Headrick Beasley
#' @seealso \code{\link[SimRepeat]{calc_betas}}, \code{\link[SimRepeat]{calc_corr_y}}, \code{\link[SimRepeat]{calc_corr_yx}},
#'     \code{\link[SimRepeat]{calc_corr_ye}}, \code{\link[SimRepeat]{checkpar}}, \code{\link[SimRepeat]{summary_sys}}
#' @references
#' Davenport JW, Bezder JC, & Hathaway RJ (1988). Parameter Estimation for Finite Mixture Distributions.
#'     Computers & Mathematics with Applications, 15(10):819-28.
#'
#' Everitt BS (1996). An Introduction to Finite Mixture Distributions. Statistical Methods in Medical Research, 5(2):107-127. \doi{10.1177/096228029600500202}.
#'
#' Fialkowski AC (2017). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.1.
#'     \url{https://CRAN.R-project.org/package=SimMultiCorrData}.
#'
#' Fialkowski AC (2018). SimCorrMix: Simulation of Correlated Data of Multiple Variable Types including Continuous and Count
#'     Mixture Distributions. R package version 0.1.0. \url{https://github.com/AFialkowski/SimCorrMix}
#'
#' Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43:521-532. \doi{10.1007/BF02293811}.
#'
#' Headrick TC (2002). Fast Fifth-order Polynomial Transforms for Generating Univariate and Multivariate
#'     Non-normal Distributions. Computational Statistics & Data Analysis, 40(4):685-711. \doi{10.1016/S0167-9473(02)00072-5}.
#'     (\href{http://www.sciencedirect.com/science/article/pii/S0167947302000725}{ScienceDirect})
#'
#' Headrick TC (2004). On Polynomial Transformations for Simulating Multivariate Nonnormal Distributions.
#'     Journal of Modern Applied Statistical Methods, 3(1):65-71. \doi{10.22237/jmasm/1083370080}.
#'
#' Headrick TC, Beasley TM (2004).  A Method for Simulating Correlated Non-Normal Systems of Linear Statistical Equations.
#'     Communications in Statistics - Simulation and Computation, 33(1).  \doi{10.1081/SAC-120028431}
#'
#' Headrick TC, Kowalchuk RK (2007). The Power Method Transformation: Its Probability Density Function, Distribution
#'     Function, and Its Further Use for Fitting Data. Journal of Statistical Computation and Simulation, 77:229-249. \doi{10.1080/10629360600605065}.
#'
#' Headrick TC, Sawilowsky SS (1999). Simulating Correlated Non-normal Distributions: Extending the Fleishman Power
#'     Method. Psychometrika, 64:25-35. \doi{10.1007/BF02294317}.
#'
#' Headrick TC, Sheng Y, & Hodis FA (2007). Numerical Computing and Graphics for the Power Method Transformation Using
#'     Mathematica. Journal of Statistical Software, 19(3):1 - 17. \cr \doi{10.18637/jss.v019.i03}.
#'
#' Higham N (2002). Computing the nearest correlation matrix - a problem from finance; IMA Journal of Numerical Analysis 22:329-343.
#'
#' McCulloch CE, Searle SR, Neuhaus JM (2008). \emph{Generalized, Linear, and Mixed Models} (2nd ed.). Wiley Series in Probability and
#'     Statistics. Hoboken, New Jersey: John Wiley & Sons, Inc.
#'
#' Pearson RK (2011). Exploring Data in Engineering, the Sciences, and Medicine. In. New York: Oxford University Press.
#'
#' Schork NJ, Allison DB, & Thiel B (1996). Mixture Distributions in Human Genetics Research. Statistical Methods in Medical Research,
#'     5:155-178. \doi{10.1177/096228029600500204}.
#'
#' Vale CD & Maurelli VA (1983). Simulating Multivariate Nonnormal Distributions. Psychometrika, 48:465-471. \doi{10.1007/BF02293687}.
#'
#' @examples
#' M <- 3
#' B <- calc_theory("Beta", c(4, 1.5))
#' skews <- lapply(seq_len(M), function(x) c(0, B[3]))
#' skurts <- lapply(seq_len(M), function(x) c(0, B[4]))
#' fifths <- lapply(seq_len(M), function(x) c(0, B[5]))
#' sixths <- lapply(seq_len(M), function(x) c(0, B[6]))
#' Six <- lapply(seq_len(M), function(x) list(NULL, 0.03))
#' corr.e <- matrix(c(1, 0.4, 0.4^2, 0.4, 1, 0.4, 0.4^2, 0.4, 1), M, M,
#'   byrow = TRUE)
#' means <- lapply(seq_len(M), function(x) c(0, B[1]))
#' vars <- lapply(seq_len(M), function(x) c(1, B[2]^2))
#' corr.x <- list(list(matrix(1, 1, 1), matrix(0.4, 1, 1), matrix(0.4, 1, 1)),
#'   list(matrix(0.4, 1, 1), matrix(1, 1, 1), matrix(0.4, 1, 1)),
#'   list(matrix(0.4, 1, 1), matrix(0.4, 1, 1), matrix(1, 1, 1)))
#' corr.yx <- list(matrix(0.4, 1), matrix(0.5, 1), matrix(0.6, 1))
#' Sys1 <- nonnormsys(10000, M, "Polynomial", "non_mix", means, vars,
#'   skews, skurts, fifths, sixths, Six, corr.x = corr.x, corr.yx = corr.yx,
#'   corr.e = corr.e)
#'
#' \dontrun{
#' # Example: system of three equations for 2 independent variables, where each
#' # error term has unit variance, from Headrick & Beasley (2002)
#' # Y_1 = beta_10 + beta_11 * X_11 + beta_12 * X_12 + sigma_1 * e_1
#' # Y_2 = beta_20 + beta_21 * X_21 + beta_22 * X_22 + sigma_2 * e_2
#' # Y_3 = beta_30 + beta_31 * X_31 + beta_32 * X_32 + sigma_3 * e_3
#'
#' # X_11 = X_21 = X_31 = Exponential(2)
#' # X_12 = X_22 = X_32 = Laplace(0, 1)
#' # e_1 = e_2 = e_3 = Cauchy(0, 1)
#'
#' seed <- 1234
#' M <- 3
#' Stcum1 <- calc_theory("Exponential", 2)
#' Stcum2 <- calc_theory("Laplace", c(0, 1))
#' Stcum3 <- c(0, 1, 0, 25, 0, 1500) # taken from paper
#' means <- lapply(seq_len(M), function(x) c(0, 0, 0))
#' vars <- lapply(seq_len(M), function(x) c(1, 1, 1))
#' skews <- lapply(seq_len(M), function(x) c(Stcum1[3], Stcum2[3], Stcum3[3]))
#' skurts <- lapply(seq_len(M), function(x) c(Stcum1[4], Stcum2[4], Stcum3[4]))
#' fifths <- lapply(seq_len(M), function(x) c(Stcum1[5], Stcum2[5], Stcum3[5]))
#' sixths <- lapply(seq_len(M), function(x) c(Stcum1[6], Stcum2[6], Stcum3[6]))
#'
#' # No sixth cumulant corrections will be used in order to match the results
#' # from the paper.
#'
#' corr.yx <- list(matrix(c(0.4, 0.4), 1), matrix(c(0.5, 0.5), 1),
#'   matrix(c(0.6, 0.6), 1))
#' corr.x <- list()
#' corr.x[[1]] <- corr.x[[2]] <- corr.x[[3]] <- list()
#' corr.x[[1]][[1]] <- matrix(c(1, 0.1, 0.1, 1), 2, 2)
#' corr.x[[1]][[2]] <- matrix(c(0.1974318, 0.1859656, 0.1879483, 0.1858601),
#'   2, 2, byrow = TRUE)
#' corr.x[[1]][[3]] <- matrix(c(0.2873190, 0.2589830, 0.2682057, 0.2589542),
#'   2, 2, byrow = TRUE)
#' corr.x[[2]][[1]] <- t(corr.x[[1]][[2]])
#' corr.x[[2]][[2]] <- matrix(c(1, 0.35, 0.35, 1), 2, 2)
#' corr.x[[2]][[3]] <- matrix(c(0.5723303, 0.4883054, 0.5004441, 0.4841808),
#'   2, 2, byrow = TRUE)
#' corr.x[[3]][[1]] <- t(corr.x[[1]][[3]])
#' corr.x[[3]][[2]] <- t(corr.x[[2]][[3]])
#' corr.x[[3]][[3]] <- matrix(c(1, 0.7, 0.7, 1), 2, 2)
#' corr.e <- matrix(0.4, nrow = 3, ncol = 3)
#' diag(corr.e) <- 1
#'
#' # Check the parameter inputs
#' checkpar(M, "Polynomial", "non_mix", means, vars, skews,
#'   skurts, fifths, sixths, corr.x = corr.x, corr.yx = corr.yx,
#'   corr.e = corr.e)
#' # Generate the system
#' Sys1 <- nonnormsys(10000, M, "Polynomial", "non_mix", means, vars, skews,
#'   skurts, fifths, sixths, corr.x = corr.x, corr.yx = corr.yx,
#'   corr.e = corr.e, seed = seed)
#' # Summarize the results
#' Sum1 <- summary_sys(Sys1$Y, Sys1$E, E_mix = NULL, Sys1$X, X_all = list(), M,
#'   "Polynomial", means, vars, skews, skurts, fifths, sixths, corr.x = corr.x,
#'   corr.e = corr.e)
#'
#' # Calculate theoretical correlations for comparison to simulated values
#' calc_corr_y(Sys1$betas, corr.x, corr.e, vars)
#' Sum1$rho.y
#' calc_corr_ye(Sys1$betas, corr.x, corr.e, vars)
#' Sum1$rho.ye
#' calc_corr_yx(Sys1$betas, corr.x, vars)
#' Sum1$rho.yx
#' }
nonnormsys <- function(n = 10000, M = NULL,
                       method = c("Fleishman", "Polynomial"),
                       error_type = c("non_mix", "mix"),
                       means = list(), vars = list(), skews = list(),
                       skurts = list(), fifths = list(), sixths = list(),
                       Six = list(), mix_pis = list(), mix_mus = list(),
                       mix_sigmas = list(), mix_skews =  list(),
                       mix_skurts =  list(), mix_fifths =  list(),
                       mix_sixths =  list(), mix_Six = list(),
                       same.var = NULL, betas.0 = NULL, corr.x = list(),
                       corr.yx = list(), corr.e = NULL, seed = 1234,
                       use.nearPD = TRUE, eigmin = 0, adjgrad = FALSE,
                       B1 = NULL, tau = 0.5, tol = 0.1, steps = 100,
                       msteps = 10, errorloop = FALSE, epsilon = 0.001,
                       maxit = 1000, quiet = FALSE) {
  start.time <- Sys.time()
  if (length(error_type) != 1)
    stop("Please choose one type of distribution for all of the error terms:
         mix if all errors have continuous mixture distributions,
         non_mix if all errors have continuous non-mixture distributions.")
  K.mix <- rep(0, M)
  K.comp <- rep(0, M)
  K.error <- rep(0, M)
  if (length(mix_pis) > 0) {
    K.mix <- lengths(mix_pis)
    K.comp <- sapply(lapply(mix_pis, unlist), length)
  }
  k.comp <- c(0, cumsum(K.comp))
  K.cont <- lengths(vars) - K.mix
  k.cont <- c(0, cumsum(K.cont))
  k.mix <- c(0, cumsum(K.mix))
  if (error_type == "mix") {
    K.error <- sapply(mix_pis, function(x) lengths(x[length(x)]))
    k.error <- c(0, cumsum(K.error))
    K.x <- K.cont + K.comp - K.error
    K.mix2 <- K.mix - 1
  } else {
    K.x <- K.cont + K.comp - 1
    K.mix2 <- K.mix
  }
  if (is.null(betas.0)) betas.0 <- rep(0, M)
  if (length(betas.0) == 1) betas.0 <- rep(betas.0, M)
  if (length(betas.0) != M)
    stop("Length of intercept vector betas.0 should be M!")
  same.cont <- NULL
  if (!is.null(same.var)) {
    if (class(same.var) == "numeric") {
      for (j in 1:length(same.var)) {
        same.cont <- rbind(same.cont, cbind(rep(1, M - 1),
          rep(same.var[j], M - 1), 2:M, rep(same.var[j], M - 1)))
      }
      same.var <- same.cont
    } else same.cont <- same.var
  }
  csame.dist <- NULL
  csame.dist2 <- NULL
  skews2 <- NULL
  skurts2 <- NULL
  fifths2 <- NULL
  sixths2 <- NULL
  Six2 <- list()
  if (length(skews) > 0) {
    skews2 <- unlist(skews)
    skurts2 <- unlist(skurts)
    if (method == "Polynomial") {
      fifths2 <- unlist(fifths)
      sixths2 <- unlist(sixths)
      Six2 <- unlist(Six, recursive = FALSE)
    }
    if (length(skews2) > 1) {
      for (i in 2:length(skews2)) {
        if (skews2[i] %in% skews2[1:(i - 1)]) {
          csame <- which(skews2[1:(i - 1)] == skews2[i])
          for (j in 1:length(csame)) {
            if (method == "Polynomial") {
              if ((skurts2[i] == skurts2[csame[j]]) &
                  (fifths2[i] == fifths2[csame[j]]) &
                  (sixths2[i] == sixths2[csame[j]])) {
                csame.dist <- rbind(csame.dist, c(csame[j], i))
                break
              }
            }
            if (method == "Fleishman") {
              if (skurts2[i] == skurts2[csame[j]]) {
                csame.dist <- rbind(csame.dist, c(csame[j], i))
                break
              }
            }
          }
        }
      }
      if (!is.null(csame.dist)) {
        ind.p <- sapply(csame.dist[, 1],
                        function(x) min(which(k.cont >= x)) - 1)
        ind.q <- sapply(csame.dist[, 2],
                        function(x) min(which(k.cont >= x)) - 1)
        ind.p2 <- numeric(length(ind.p))
        ind.q2 <- numeric(length(ind.q))
        for (i in 1:length(ind.p)) {
          ind.p2[i] <- csame.dist[i, 1] - k.cont[ind.p[i]]
          ind.q2[i] <- csame.dist[i, 2] - k.cont[ind.q[i]]
        }
        csame.dist2 <- cbind(ind.p, ind.p2, ind.q, ind.q2)
      }
    }
  }
  msame.dist2 <- NULL
  if (length(mix_pis) > 0) {
    mix_skews2 <- unlist(mix_skews)
    mix_skurts2 <- unlist(mix_skurts)
    if (method == "Polynomial") {
      mix_fifths2 <- unlist(mix_fifths)
      mix_sixths2 <- unlist(mix_sixths)
    }
    for (i in 1:length(mix_skews2)) {
      msame.d <- NULL
      if (length(skews) > 0) {
        if (mix_skews2[i] %in% skews2) {
          msame <- which(skews2 == mix_skews2[i])
          for (j in 1:length(msame)) {
            if (method == "Polynomial") {
              if ((mix_skurts2[i] == skurts2[msame[j]]) &
                  (mix_fifths2[i] == fifths2[msame[j]]) &
                  (mix_sixths2[i] == sixths2[msame[j]])) {
                msame.d <- c(msame[j], i)
                msame.d <- c(min(which(k.cont >= msame.d[1])) - 1,
                  msame.d[1] - k.cont[min(which(k.cont >= msame.d[1])) - 1],
                  min(which(k.comp >= msame.d[2])) - 1,
                  msame.d[2] - k.comp[min(which(k.comp >= msame.d[2])) - 1])
                msame.d[4] <- msame.d[4] + K.cont[msame.d[3]]
                break
              }
            }
            if (method == "Fleishman") {
              if (mix_skurts2[i] == skurts2[msame[j]]) {
                msame.d <- c(msame[j], i)
                msame.d <- c(min(which(k.cont >= msame.d[1])) - 1,
                  msame.d[1] - k.cont[min(which(k.cont >= msame.d[1])) - 1],
                  min(which(k.comp >= msame.d[2])) - 1,
                  msame.d[2] - k.comp[min(which(k.comp >= msame.d[2])) - 1])
                msame.d[4] <- msame.d[4] + K.cont[msame.d[3]]
                break
              }
            }
          }
        }
      }
      if (is.null(msame.d) & i >= 2) {
        if (mix_skews2[i] %in% mix_skews2[1:(i-1)]) {
          msame <- which(mix_skews2[1:(i - 1)] == mix_skews2[i])
          for (j in 1:length(msame)) {
            if (method == "Polynomial") {
              if ((mix_skurts2[i] == mix_skurts2[msame[j]]) &
                  (mix_fifths2[i] == mix_fifths2[msame[j]]) &
                  (mix_sixths2[i] == mix_sixths2[msame[j]])) {
                msame.d <- c(msame[j], i)
                msame.d <- c(min(which(k.comp >= msame.d[1])) - 1,
                  msame.d[1] - k.comp[min(which(k.comp >= msame.d[1])) - 1],
                  min(which(k.comp >= msame.d[2])) - 1,
                  msame.d[2] - k.comp[min(which(k.comp >= msame.d[2])) - 1])
                msame.d[2] <- msame.d[2] + K.cont[msame.d[1]]
                msame.d[4] <- msame.d[4] + K.cont[msame.d[3]]
                break
              }
            }
            if (method == "Fleishman") {
              if (mix_skurts2[i] == mix_skurts2[msame[j]]) {
                msame.d <- c(msame[j], i)
                msame.d <- c(min(which(k.comp >= msame.d[1])) - 1,
                  msame.d[1] - k.comp[min(which(k.comp >= msame.d[1])) - 1],
                  min(which(k.comp >= msame.d[2])) - 1,
                  msame.d[2] - k.comp[min(which(k.comp >= msame.d[2])) - 1])
                msame.d[2] <- msame.d[2] + K.cont[msame.d[1]]
                msame.d[4] <- msame.d[4] + K.cont[msame.d[3]]
                break
              }
            }
          }
        }
      }
      msame.dist2 <- rbind(msame.dist2, msame.d)
    }
    mix_skews2 <- lapply(mix_skews, unlist)
    mix_skurts2 <- lapply(mix_skurts, unlist)
    if (method == "Polynomial") {
      mix_fifths2 <- lapply(mix_fifths, unlist)
      mix_sixths2 <- lapply(mix_sixths, unlist)
    }
  }
  constants <- list()
  SixCorr <- list()
  Valid.PDF <- list()
  for (i in 1:M) {
    SixCorr[[i]] <- numeric(K.cont[i] + K.comp[i])
    Valid.PDF[[i]] <- numeric(K.cont[i] + K.comp[i])
    if (method == "Fleishman") {
      constants[[i]] <- matrix(NA, nrow = K.cont[i] + K.comp[i], ncol = 4)
      colnames(constants[[i]]) <- c("c0", "c1", "c2", "c3")
    }
    if (method == "Polynomial") {
      constants[[i]] <- matrix(NA, nrow = K.cont[i] + K.comp[i], ncol = 6)
      colnames(constants[[i]]) <- c("c0", "c1", "c2", "c3", "c4", "c5")
    }
    if (K.cont[i] > 0) {
      for (j in 1:K.cont[i]) {
        if (!is.null(csame.dist2)) {
          rind <- which(csame.dist2[, 3] == i & csame.dist2[, 4] == j)
          if (length(rind) > 0) {
            constants[[i]][j, ] <-
              constants[[csame.dist2[rind, 1]]][csame.dist2[rind, 2], ]
            SixCorr[[i]][j] <-
              SixCorr[[csame.dist2[rind, 1]]][csame.dist2[rind, 2]]
            Valid.PDF[[i]][j] <-
              Valid.PDF[[csame.dist2[rind, 1]]][csame.dist2[rind, 2]]
          }
        }
        if (sum(is.na(constants[[i]][j, ])) > 0) {
          if (length(Six) == 0) Six2 <- NULL else
            if (length(Six[[i]]) == 0) Six2 <- NULL else
              Six2 <- Six[[i]][[j]]
          cons <-
            suppressWarnings(find_constants(method, skews[[i]][j],
              skurts[[i]][j], fifths[[i]][j], sixths[[i]][j],
              Six = Six2, n = 25, seed = seed))
          if (length(cons) == 1 | is.null(cons)) {
            stop(paste("Constants can not be found for outcome ", i,
                       " non-mixture variable ", j, ".", sep = ""))
          }
          SixCorr[[i]][j] <- ifelse(is.null(cons$SixCorr1), NA, cons$SixCorr1)
          Valid.PDF[[i]][j] <- cons$valid
          constants[[i]][j, ] <- cons$constants
        }
      }
    }
  }
  for (i in 1:M) {
    if (K.mix[i] > 0) {
      for (j in (K.cont[i] + 1):(K.cont[i] + K.comp[i])) {
        if (!is.null(msame.dist2)) {
          rind <- which(msame.dist2[, 3] == i & msame.dist2[, 4] == j)
          if (length(rind) > 0) {
            constants[[i]][j, ] <-
              constants[[msame.dist2[rind, 1]]][msame.dist2[rind, 2], ]
            SixCorr[[i]][j] <-
              SixCorr[[msame.dist2[rind, 1]]][msame.dist2[rind, 2]]
            Valid.PDF[[i]][j] <-
              Valid.PDF[[msame.dist2[rind, 1]]][msame.dist2[rind, 2]]
          }
        }
        if (sum(is.na(constants[[i]][j, ])) > 0) {
          if (length(mix_Six) == 0) mix_Six2 <- NULL else
            if (length(mix_Six[[i]]) == 0) mix_Six2 <- NULL else
              mix_Six2 <- mix_Six[[i]][[j - K.cont[i]]]
          cons <-
            suppressWarnings(find_constants(method,
              mix_skews2[[i]][j - K.cont[i]], mix_skurts2[[i]][j - K.cont[i]],
              mix_fifths2[[i]][j - K.cont[i]], mix_sixths2[[i]][j - K.cont[i]],
              Six = mix_Six2, n = 25, seed = seed))
          if (length(cons) == 1 | is.null(cons)) {
            stop(paste("Constants can not be found for outcome ", i,
                 " mixture component ", j - K.cont[i], ".", sep = ""))
          }
          SixCorr[[i]][j] <- ifelse(is.null(cons$SixCorr1), NA, cons$SixCorr1)
          Valid.PDF[[i]][j] <- cons$valid
          constants[[i]][j, ] <- cons$constants
        }
      }
    }
  }
  if (error_type == "non_mix" & max(K.mix2) > 0) {
    for (i in 1:M) {
      constants[[i]] <- rbind(constants[[i]][-K.cont[i], ],
        constants[[i]][K.cont[i], ])
    }
  }
  e.constants <- NULL
  if (error_type == "non_mix") {
    e.means <- mapply('[[', means, lengths(means))
    e.vars <- mapply('[[', vars, lengths(vars))
    for (i in 1:M) {
      e.constants <- rbind(e.constants, constants[[i]][nrow(constants[[i]]), ])
    }
  } else {
    e.means <- unlist(mapply('[', mix_mus, lengths(mix_mus)))
    e.vars <- (unlist(mapply('[', mix_sigmas, lengths(mix_sigmas))))^2
    for (i in 1:M) {
      e.constants <- rbind(e.constants,
        constants[[i]][(nrow(constants[[i]]) - K.error[i] +
                          1):nrow(constants[[i]]), ])
    }
  }
  Sigma_E <- intercorr_cont(method, e.constants, corr.e)
  set.seed(seed)
  if (!is.null(same.var)) {
    Z <- matrix(rnorm((ncol(Sigma_E) + sum(K.x) - nrow(same.var)) * n), n)
  } else Z <- matrix(rnorm((ncol(Sigma_E) + sum(K.x)) * n), n)
  Z <- scale(Z, TRUE, FALSE)
  Z <- Z %*% svd(Z, nu = 0)$v
  Z <- scale(Z, FALSE, TRUE)
  if (min(eigen(Sigma_E, symmetric = TRUE)$values) < 0) {
    if (quiet == FALSE)
      message("Intermediate E correlation matrix is not positive definite.")
    if (use.nearPD == TRUE) {
      Sigma_E <- as.matrix(nearPD(Sigma_E, corr = T, keepDiag = T)$mat)
    } else if (adjgrad == TRUE) {
      sadj <- adj_grad(Sigma_E, B1, tau, tol, steps, msteps)
      Sigma_E <- sadj$Sigma2
    }
  }
  eig <- eigen(Sigma_E, symmetric = TRUE)
  sqrteigval <- diag(sqrt(pmax(eig$values, 0)), nrow(Sigma_E))
  eigvec <- eig$vectors
  fry <- eigvec %*% sqrteigval
  E <- fry %*% t(Z[, (ncol(Z) - ncol(Sigma_E) + 1):ncol(Z)])
  E <- t(E)
  E2 <- matrix(1, nrow = n, ncol = ncol(Sigma_E))
  for (i in 1:ncol(Sigma_E)) {
    if (method == "Fleishman") {
      E2[, i] <- e.constants[i, 1] + e.constants[i, 2] * E[, i] +
        e.constants[i, 3] * E[, i]^2 + e.constants[i, 4] * E[, i]^3
    }
    if (method == "Polynomial") {
      E2[, i] <- e.constants[i, 1] + e.constants[i, 2] * E[, i] +
        e.constants[i, 3] * E[, i]^2 + e.constants[i, 4] * E[, i]^3 +
        e.constants[i, 5] * E[, i]^4 + e.constants[i, 6] * E[, i]^5
    }
    E2[, i] <- e.means[i] + sqrt(e.vars[i]) * E2[, i]
  }
  E_comp <- E2
  colnames(E_comp) <- paste("E", 1:ncol(E_comp), sep = "")
  if (error_type == "mix") {
    E2 <- matrix(1, n, M)
    for (i in 1:M) {
      seed <- seed + 1
      set.seed(seed)
      R <- rmultinom(n, size = 1, prob = mix_pis[[i]][[length(mix_pis[[i]])]])
      E2[, i] <- apply(t(R) * E_comp[, (k.error[i] + 1):k.error[i + 1]], 1,
                       sum)
      E2[, i] <- scale(E2[, i])
      E2[, i] <- means[[i]][length(means[[i]])] +
        sqrt(vars[[i]][length(vars[[i]])]) * E2[, i]
    }
  }
  colnames(E2) <- paste("E", 1:M, sep = "")
  constants0 <- constants
  constants2 <- NULL
  for (p in 1:M) {
    if (K.x[p] == 0) next
    rownames(corr.x[[p]][[p]]) <- paste("cont", p, "_", 1:K.x[p], sep = "")
    if (!is.null(same.cont)) {
      if (p %in% same.cont[, 3])
        constants[[p]] <-
          constants[[p]][-(same.cont[same.cont[, 3] == p, 4]), , drop = FALSE]
    }
    if (error_type == "non_mix") {
      constants2 <- rbind(constants2,
        constants[[p]][-nrow(constants[[p]]), , drop = FALSE])
    } else {
      constants2 <- rbind(constants2,
        constants[[p]][-c((nrow(constants[[p]]) - K.error[p] +
                             1):nrow(constants[[p]])), , drop = FALSE])
    }
    if (!is.null(same.var)) {
      same.var0 <- same.var[same.var[, 3] == p, , drop = FALSE]
      if (p != 1 & nrow(same.var0) != 0) {
        q <- unique(same.var0[, 1])
        for (i in 1:length(q)) {
          rownames(corr.x[[p]][[p]])[same.var0[same.var0[, 1] == q[i], 4]] <-
            rownames(corr.x[[q[i]]][[q[i]]])[same.var0[same.var0[, 1] == q[i],
                                                       2]]
        }
      }
    }
    colnames(corr.x[[p]][[p]]) <- rownames(corr.x[[p]][[p]])
  }
  names0 <- list()
  for (p in 1:M) {
    if (length(corr.x[[p]]) == 0) {
      names0 <- append(names0, list(NULL))
      next
    } else {
      names0[[p]] <- colnames(corr.x[[p]][[p]])
      for (q in 1:M) {
        if (length(corr.x[[p]][[q]]) == 0) next
        colnames(corr.x[[p]][[q]]) <- colnames(corr.x[[q]][[q]])
        rownames(corr.x[[p]][[q]]) <- rownames(corr.x[[p]][[p]])
      }
    }
  }
  corr.x0 <- corr.x
  Corr_X <- list()
  if (!is.null(same.var)) {
    for (p in 1:(M - 1)) {
      if (K.x[p] == 0) next
      for (q in (p + 1):M) {
        if (K.x[q] == 0) next
        same.var0 <- same.var[(same.var[, 1] == p & same.var[, 3] == q), ,
                              drop = FALSE]
        if (nrow(same.var0) != 0) {
          for (i in 1:M) {
            if (K.x[i] == 0) next
            for (j in 1:M) {
              if (K.x[j] == 0) next
              if (i == q & j != q)
                corr.x0[[i]][[j]] <- corr.x0[[i]][[j]][-same.var0[, 4], ,
                                                       drop = FALSE]
              if (i != q & j == q)
                corr.x0[[i]][[j]] <- corr.x0[[i]][[j]][, -same.var0[, 4],
                                                       drop = FALSE]
              if (i == q & j == q)
                corr.x0[[i]][[j]] <- corr.x0[[i]][[j]][-same.var0[, 4],
                  -same.var0[, 4], drop = FALSE]
            }
          }
        }
      }
    }
  }
  for (p in 1:M) {
    Corr_X <- append(Corr_X, list(NULL))
    if (length(corr.x0[[p]]) > 0) {
      for (q in 1:M) {
        if (length(corr.x0[[p]][[q]]) == 0) next else
          if (nrow(corr.x0[[p]][[q]]) == 0 |
              ncol(corr.x0[[p]][[q]]) == 0) next else
                Corr_X[[p]] <- cbind(Corr_X[[p]], corr.x0[[p]][[q]])
      }
    }
  }
  Corr_X <- do.call(rbind, Corr_X)
  names1 <- colnames(Corr_X)
  Sigma_X <- intercorr_cont(method = method, constants = constants2,
                            rho_cont = Corr_X)
  colnames(Sigma_X) <- colnames(Corr_X)
  rownames(Sigma_X) <- rownames(Corr_X)
  Sigma_X0 <- Sigma_X
  if (min(eigen(Sigma_X, symmetric = TRUE)$values) < 0) {
    if (quiet == FALSE)
      message("Intermediate correlation matrix is not positive definite.")
    if (use.nearPD == TRUE) {
      Sigma_X <- as.matrix(nearPD(Sigma_X, corr = T, keepDiag = T)$mat)
    } else if (adjgrad == TRUE) {
      sadj <- adj_grad(Sigma_X, B1, tau, tol, steps, msteps)
      Sigma_X <- sadj$Sigma2
    }
  }
  eig <- eigen(Sigma_X, symmetric = TRUE)
  sqrteigval <- diag(sqrt(pmax(eig$values, 0)), nrow(Sigma_X))
  eigvec <- eig$vectors
  fry <- eigvec %*% sqrteigval
  X <- fry %*% t(Z[, 1:ncol(Sigma_X), drop = FALSE])
  X <- t(X)
  colnames(X) <- colnames(Corr_X)
  colnames(Sigma_X) <- colnames(Corr_X)
  rownames(Sigma_X) <- rownames(Corr_X)
  X_cont <- list()
  for (i in 1:M) {
    if (K.x[i] != 0) {
      if (i == 1) {
        X_cont[[i]] <- X[, 1:K.x[i], drop = FALSE]
      } else {
        if (!is.null(same.var)) {
          if (K.x[i] == nrow(same.var[same.var[, 3] == i, , drop = FALSE])) {
            X_cont <- append(X_cont, list(NULL))
          } else {
            X_cont[[i]] <- X[, (cumsum(K.x)[i - 1] -
              nrow(same.var[same.var[, 3] <= (i - 1), , drop = FALSE]) +
              1):(cumsum(K.x)[i] - nrow(same.var[same.var[, 3] <= i, ,
                                               drop = FALSE])), drop = FALSE]
          }
        } else {
          X_cont[[i]] <- X[, (cumsum(K.x)[i - 1] + 1):cumsum(K.x)[i],
                       drop = FALSE]
        }
      }
    } else X_cont <- append(X_cont, list(NULL))
  }
  Y_cont <- list()
  means2 <- list()
  vars2 <- list()
  temp_means <- NULL
  temp_vars <- NULL
  for (m in 1:M) {
    Y_cont <- append(Y_cont, list(NULL))
    means2 <- append(means2, list(NULL))
    vars2 <- append(vars2, list(NULL))
    if (K.x[m] == 0 | length(grep("cont", colnames(X_cont[[m]]))) == 0)
      next else {
      if (error_type == "non_mix") {
        if ((K.cont[m] - 1) > 0) {
          means2[[m]] <- means[[m]][1:(K.cont[m] - 1)]
          vars2[[m]] <- vars[[m]][1:(K.cont[m] - 1)]
        }
        if (K.mix2[m] > 0) {
          means2[[m]] <- c(means2[[m]], unlist(mix_mus[[m]]))
          vars2[[m]] <- c(vars2[[m]], (unlist(mix_sigmas[[m]]))^2)
        }
      }
      if (error_type == "mix") {
        if (K.cont[m] > 0) {
          means2[[m]] <- means[[m]][1:K.cont[m]]
          vars2[[m]] <- vars[[m]][1:K.cont[m]]
        }
        if (K.mix2[m] > 0) {
          means2[[m]] <- c(means2[[m]],
            unlist(mix_mus[[m]][-length(mix_mus[[m]])]))
          vars2[[m]] <- c(vars2[[m]],
            (unlist(mix_sigmas[[m]][-length(mix_sigmas[[m]])]))^2)
        }
      }
      if (!is.null(same.cont)) {
        if (m %in% same.cont[, 3]) {
          means2[[m]] <- means2[[m]][-(same.cont[same.cont[, 3] == m, 4])]
          vars2[[m]] <- vars2[[m]][-(same.cont[same.cont[, 3] == m, 4])]
        }
      }
      Y_cont[[m]] <- matrix(1, nrow = n, ncol = ncol(X_cont[[m]]))
      for (i in 1:ncol(X_cont[[m]])) {
        if (method == "Fleishman") {
          Y_cont[[m]][, i] <- constants[[m]][i, 1] +
            constants[[m]][i, 2] * X_cont[[m]][, i] +
            constants[[m]][i, 3] * X_cont[[m]][, i]^2 +
            constants[[m]][i, 4] * X_cont[[m]][, i]^3
        }
        if (method == "Polynomial") {
          Y_cont[[m]][, i] <- constants[[m]][i, 1] +
            constants[[m]][i, 2] * X_cont[[m]][, i] +
            constants[[m]][i, 3] * X_cont[[m]][, i]^2 +
            constants[[m]][i, 4] * X_cont[[m]][, i]^3 +
            constants[[m]][i, 5] * X_cont[[m]][, i]^4 +
            constants[[m]][i, 6] * X_cont[[m]][, i]^5
        }
        Y_cont[[m]][, i] <- means2[[m]][i] +
          sqrt(vars2[[m]][i]) * Y_cont[[m]][, i]
      }
      colnames(Y_cont[[m]]) <- colnames(X_cont[[m]])
      temp_means <- c(temp_means, means2[[m]])
      temp_vars <- c(temp_vars, vars2[[m]])
    }
  }
  Y_all <- NULL
  for (p in 1:M) {
    Y_all <- cbind(Y_all, Y_cont[[p]])
  }
  rho.x0 <- cor(Y_all)
  emax0 <- max(abs(rho.x0 - Corr_X))
  niter <- diag(0, nrow(rho.x0), ncol(rho.x0))
  start.time.error <- Sys.time()
  if (emax0 > epsilon & errorloop == TRUE) {
    EL <- corr_error(n = n, k_cont = length(grep("cont", colnames(Y_all))),
      method = method, means = temp_means, vars = temp_vars,
      constants = constants2, seed = seed, epsilon = epsilon, maxit = maxit,
      rho0 = Corr_X, Sigma = Sigma_X, rho_calc = rho.x0)
    Y_all <- EL$Y_cont
    colnames(Y_all) <- colnames(Corr_X)
    niter <- EL$niter
    colnames(niter) <- colnames(Corr_X)
    rownames(niter) <- colnames(Corr_X)
    Sigma_X <- EL$Sigma
  }
  stop.time.error <- Sys.time()
  Y_cont <- list()
  for (i in 1:M) {
    if (K.x[i] != 0) {
      if (i == 1) {
        Y_cont[[i]] <- Y_all[, 1:K.x[i], drop = FALSE]
      } else {
        if (!is.null(same.var)) {
          if (K.x[i] == nrow(same.var[same.var[, 3] == i, , drop = FALSE])) {
            Y_cont <- append(Y_cont, list(NULL))
          } else {
            Y_cont[[i]] <- Y_all[, (cumsum(K.x)[i - 1] -
              nrow(same.var[same.var[, 3] <= (i - 1), , drop = FALSE]) +
              1):(cumsum(K.x)[i] - nrow(same.var[same.var[, 3] <= i, ,
              drop = FALSE])), drop = FALSE]
          }
        } else {
          Y_cont[[i]] <- Y_all[, (cumsum(K.x)[i - 1] + 1):cumsum(K.x)[i],
                           drop = FALSE]
        }
      }
    } else Y_cont <- append(Y_cont, list(NULL))
  }
  for (m in 1:M) {
    if (K.x[m] > 0) {
      if (!is.null(same.cont)) {
        if (nrow(same.cont[same.cont[, 3] == m, , drop = FALSE]) > 0) {
          temp.cont <- same.cont[same.cont[, 3] == m, , drop = FALSE]
          for (i in 1:nrow(temp.cont)) {
            Y_cont[[m]] <- cbind(Y_cont[[m]],
                                 Y_cont[[temp.cont[i, 1]]][, temp.cont[i, 2]])
            colnames(Y_cont[[m]])[ncol(Y_cont[[m]])] <-
              colnames(Y_cont[[temp.cont[i, 1]]])[temp.cont[i, 2]]
          }
        }
      }
      if (length(Y_cont[[m]]) > 0) {
        Y_cont[[m]] <- Y_cont[[m]][, names0[[m]], drop = FALSE]
      }
    }
  }
  Y_comp <- Y_cont
  if (max(K.mix2) > 0) {
    for (i in 1:M) {
      if (K.mix2[i] != 0) {
        if ((ncol(Y_cont[[i]]) - K.comp[i] + K.error[i]) > 0) {
          Y_cont[[i]] <- cbind(Y_cont[[i]][, 1:(ncol(Y_cont[[i]]) -
            K.comp[i] + K.error[i]), drop = FALSE], matrix(1, n, K.mix2[i]))
        } else {
          Y_cont[[i]] <- matrix(1, n, K.mix2[i])
        }
        for (j in 1:K.mix2[i]) {
          ind <- K.x[i] - K.comp[i] + K.error[i] + j
          ind2 <- c(0, cumsum(sapply(mix_pis[[i]], length)))
          if (!is.null(same.cont)) {
            if (nrow(same.cont[same.cont[, 3] == i, , drop = FALSE]) > 0) {
              temp.cont <- same.cont[same.cont[, 3] == i, , drop = FALSE]
              if (all((ind - j + 1 + ind2[j]):(ind - j + ind2[j + 1]) %in%
                      temp.cont[, 4])) {
                temp.cont <- temp.cont[temp.cont[, 4] %in%
                  (ind - j + 1 + ind2[j]):(ind - j + ind2[j + 1]), ,
                                       drop = FALSE]
                if (length(unique(temp.cont[, 1])) == 1 &
                    length(unique(temp.cont[, 2])) ==
                    length(mix_pis[[i]][[j]])) {
                  ind <- ncol(Y_comp[[i]]) - K.comp[i] + K.error[i] + j
                  ind3 <- cumsum(sapply(mix_pis[[temp.cont[1, 1]]], length))
                  ind4 <- ncol(Y_cont[[temp.cont[1, 1]]]) -
                    K.mix2[temp.cont[1, 1]] +
                    min(which(ind3 >= max(temp.cont[1, 2])))
                  Y_cont[[i]][, ind] <- Y_cont[[temp.cont[1, 1]]][, ind4,
                                                                  drop = FALSE]
                  colnames(Y_cont[[i]])[ind] <-
                    colnames(Y_cont[[temp.cont[1, 1]]])[ind4]
                  next
                }
              }
            }
          }
          seed <- seed + 1
          set.seed(seed)
          R <- rmultinom(n, size = 1, prob = mix_pis[[i]][[j]])
          ind <- ncol(Y_comp[[i]]) - K.comp[i] + K.error[i] + j
          Y_cont[[i]][, ind] <- apply(t(R) *
            Y_comp[[i]][, (ind - j + 1 + ind2[j]):(ind - j + ind2[j + 1]),
                                                    drop = FALSE], 1, sum)
          Y_cont[[i]][, ind] <- means[[i]][ind] +
            sqrt(vars[[i]][ind]) * scale(Y_cont[[i]][, ind])
          colnames(Y_cont[[i]])[ind] <- paste("mix", i, "_", j, sep = "")
        }
      }
    }
  }
  betas <- calc_betas(corr.yx, corr.x, vars, mix_pis, mix_mus, mix_sigmas,
    error_type, n = 25, seed = seed)
  if (length(betas) == 1 | is.null(betas)) {
    stop("The beta coefficients could not be found.")
  }
  for (p in 1:M) {
    if (K.x[p] == 0) next
    colnames(Y_comp[[p]]) <- paste("X", p, "_", 1:K.x[p], sep = "")
    colnames(Y_cont[[p]]) <- paste("X", p, "_", 1:ncol(Y_cont[[p]]), sep = "")
  }
  Y <- matrix(1, nrow = n, ncol = M)
  colnames(Y) <- paste("Y", 1:M, sep = "")
  for (i in 1:M) {
    if (K.x[i] == 0) {
      Y[, i] <- matrix(betas.0[i], n, 1) + E2[, i]
    } else if ((length(betas[i, ]) - sum(betas[i, ] == 0)) == K.x[i]) {
      Y[, i] <- matrix(betas.0[i], n, 1) +
        Y_comp[[i]] %*% matrix(betas[i, 1:K.x[i]], ncol = 1) + E2[, i]
    } else {
      Y[, i] <- matrix(betas.0[i], n, 1) + Y_cont[[i]] %*%
        matrix(betas[i, 1:ncol(Y_cont[[i]])], ncol = 1) + E2[, i]
    }
  }
  stop.time <- Sys.time()
  Time.error <- round(difftime(stop.time.error, start.time.error,
                               units = "min"), 3)
  Time <- round(difftime(stop.time, start.time, units = "min"), 3)
  cat("Total Simulation time:", Time, "minutes \n")
  result <- list(Y = Y, X = Y_comp, X_all = Y_cont, E = E_comp, betas = betas,
    Sigma_X0 = Sigma_X0, Sigma_X = Sigma_X, constants = constants0,
    SixCorr = SixCorr, valid.pdf = Valid.PDF, niter = niter,
    Error_Time = Time.error, Time = Time)
  if (error_type == "mix") result <- append(result, list(E_mix = E2))
  result
}

Try the SimRepeat package in your browser

Any scripts or data that you put into this service are public.

SimRepeat documentation built on May 2, 2019, 9:32 a.m.