R/valid_corr2.R

Defines functions valid_corr2

Documented in valid_corr2

#' @title Determine Correlation Bounds for Ordinal, Continuous, Poisson, and/or Negative Binomial Variables: Correlation Method 2
#'
#' @description This function calculates the lower and upper correlation bounds for the given distributions and
#'     checks if a given target correlation matrix rho is within the bounds.  It should be used before simulation with
#'     \code{\link[SimMultiCorrData]{rcorrvar2}}.  However, even if all pairwise correlations fall within the bounds, it is still possible
#'     that the desired correlation matrix is not feasible.  This is particularly true when ordinal variables (r >= 2 categories) are
#'     generated or negative correlations are desired.  Therefore, this function should be used as a general check to eliminate pairwise correlations that are obviously
#'     not reproducible.  It will help prevent errors when executing the simulation.
#'
#'     Note: Some pieces of the function code have been adapted from Demirtas, Hu, & Allozi's (2017) \code{\link[PoisBinOrdNor]{validation_specs}}.
#'     This function (\code{\link[SimMultiCorrData]{valid_corr2}}) extends the methods to:
#'
#'    1) non-normal continuous variables generated by Fleishman's third-order or Headrick's fifth-order polynomial transformation method,
#'
#'    2) Negative Binomial variables (including all pairwise correlations involving them), and
#'
#'    3) Count variables are treated as ordinal when calculating the bounds since that is the intermediate correlation calculation method.
#'
#' Please see the \bold{Comparison of Method 1 and Method 2} vignette for more information regarding method 2.
#'
#' @section Reasons for Function Errors:
#'     1) The most likely cause for function errors is that no solutions to \code{\link[SimMultiCorrData]{fleish}} or
#'     \code{\link[SimMultiCorrData]{poly}} converged when using \code{\link[SimMultiCorrData]{find_constants}}.  If this happens,
#'     the simulation will stop.  It may help to first use \code{\link[SimMultiCorrData]{find_constants}} for each continuous variable to
#'     determine if a vector of sixth cumulant correction values is needed.  If the standardized cumulants are obtained from \code{calc_theory},
#'     the user may need to use rounded values as inputs (i.e.
#'     \code{skews = round(skews, 8)}).  Due to the nature of the integration involved in \code{calc_theory}, the results are
#'     approximations.  Greater accuracy can be achieved by increasing the number of subdivisions (\code{sub}) used in the integration
#'     process.  For example, in order to ensure that skew is exactly 0 for symmetric distributions.
#'
#'     2) In addition, the kurtosis may be outside the region of possible values.  There is an associated lower boundary for kurtosis associated
#'     with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method).  Use
#'     \code{\link[SimMultiCorrData]{calc_lower_skurt}} to determine the boundary for a given set of cumulants.
#'
#' @section The Generate, Sort, and Correlate (GSC, Demirtas & Hedeker, 2011, \doi{10.1198/tast.2011.10090}) Algorithm:
#' The GSC algorithm is a flexible method for determining empirical correlation bounds when the theoretical bounds are unknown.
#' The steps are as follows:
#'
#' 1) Generate independent random samples from the desired distributions using a large number of observations (i.e. N = 100,000).
#'
#' 2) Lower Bound: Sort the two variables in opposite directions (i.e., one increasing and one decreasing) and find the sample correlation.
#'
#' 3) Upper Bound: Sort the two variables in the same direction and find the sample correlation.
#'
#' Demirtas & Hedeker showed that the empirical bounds computed from the GSC method are similar to the theoretical bounds (when they are known).
#'
#' The processes used to find the correlation bounds for each variable type are described below:
#'
#' @section Ordinal Variables:
#' Binary pairs: The correlation bounds are determined as in Demirtas et al. (2012, \doi{10.1002/sim.5362}), who used the method of Emrich &
#' Piedmonte (1991, \doi{10.1080/00031305.1991.10475828}).  The joint distribution is determined by "borrowing" the moments of a multivariate normal
#' distribution.  For two binary variables \eqn{Y_{i}} and \eqn{Y_{j}}, with success probabilities \eqn{p_{i}} and \eqn{p_{j}}, the lower
#' correlation bound is given by
#' \deqn{max(-\sqrt{(p_{i}p_{j})/(q_{i}q_{j})},\ -\sqrt{(q_{i}q_{j})/(p_{i}p_{j})})}
#' and the upper bound by
#' \deqn{min(\sqrt{(p_{i}q_{j})/(q_{i}p_{j})},\ \sqrt{(q_{i}p_{j})/(p_{i}q_{j})})}
#' Here, \eqn{q_{i} = 1 - p_{i}} and \eqn{q_{j} = 1 - p_{j}}.
#'
#' Binary-Ordinal or Ordinal-Ordinal pairs: Randomly generated variables with the given marginal distributions are used in the
#' GSC algorithm to find the correlation bounds.
#'
#' @section Continuous Variables:
#' Continuous variables are randomly generated using constants from \code{\link[SimMultiCorrData]{find_constants}} and a vector of sixth
#' cumulant correction values (if provided.)  The GSC algorithm is used to find the lower and upper bounds.
#'
#' @section Poisson Variables:
#' The maximum support values, given the vector of cumulative probability truncation values (pois_eps) and vector of means (lam), are calculated using
#' \code{\link[SimMultiCorrData]{max_count_support}}.  The finite supports are used to determine marginal distributions for each Poisson variable.
#' Randomly generated variables with the given marginal distributions are used in the GSC algorithm to find the correlation bounds.
#'
#' @section Negative Binomial Variables:
#' The maximum support values, given the vector of cumulative probability truncation values (nb_eps) and vectors of
#' sizes and success probabilities (prob) or means (mu), are calculated using \code{\link[SimMultiCorrData]{max_count_support}}.
#' The finite supports are used to determine marginal distributions for each Negative Binomial variable.
#' Randomly generated variables with the given marginal distributions are used in the GSC algorithm to find the correlation bounds.
#'
#' @section Continuous - Ordinal Pairs:
#' Randomly generated ordinal variables with the given marginal distributions and the previously generated continuous variables are used in the
#' GSC algorithm to find the correlation bounds.
#'
#' @section Ordinal - Poisson Pairs:
#' Randomly generated ordinal and Poisson variables with the given marginal distributions are used in the GSC algorithm to find
#' the correlation bounds.
#'
#' @section Ordinal - Negative Binomial Pairs:
#' Randomly generated ordinal and Negative Binomial variables with the given marginal distributions are used in the GSC algorithm to find
#' the correlation bounds.
#'
#' @section Continuous - Poisson Pairs:
#' The previously generated continuous variables and randomly generated Poisson variables with the given marginal distributions are used
#' in the GSC algorithm to find the correlation bounds.
#'
#' @section Continuous - Negative Binomial Pairs:
#' The previously generated continuous variables and randomly generated Negative Binomial variables with the given marginal distributions are used
#' in the GSC algorithm to find the correlation bounds.
#'
#' @section Poisson - Negative Binomial Pairs:
#' Randomly generated variables with the given marginal distributions are used in the GSC algorithm to find the correlation bounds.
#'
#' @param k_cat the number of ordinal (r >= 2 categories) variables (default = 0)
#' @param k_cont the number of continuous variables (default = 0)
#' @param k_pois the number of Poisson variables (default = 0)
#' @param k_nb the number of Negative Binomial variables (default = 0)
#' @param method the method used to generate the k_cont continuous variables.  "Fleishman" uses a third-order polynomial transformation
#'     and "Polynomial" uses Headrick's fifth-order transformation.
#' @param means a vector of means for the k_cont continuous variables (i.e. = rep(0, k_cont))
#' @param vars a vector of variances (i.e. = rep(1, k_cont))
#' @param skews a vector of skewness values (i.e. = rep(0, k_cont))
#' @param skurts a vector of standardized kurtoses (kurtosis - 3, so that normal variables have a value of 0; i.e. = rep(0, k_cont))
#' @param fifths a vector of standardized fifth cumulants (not necessary for \code{method} = "Fleishman"; i.e. = rep(0, k_cont))
#' @param sixths a vector of standardized sixth cumulants (not necessary for \code{method} = "Fleishman"; i.e. = rep(0, k_cont))
#' @param Six a list of vectors of correction values to add to the sixth cumulants if no valid pdf constants are found,
#'     ex: \code{Six = list(seq(0.01, 2,by = 0.01), seq(1, 10,by = 0.5))}; if no correction is desired for variable Y_i, set the i-th list
#'     component equal to NULL
#' @param marginal a list of length equal to \code{k_cat}; the i-th element is a vector of the cumulative
#'     probabilities defining the marginal distribution of the i-th variable;
#'     if the variable can take r values, the vector will contain r - 1 probabilities (the r-th is assumed to be 1; default = list())
#' @param lam a vector of lambda (> 0) constants for the Poisson variables (see \code{\link[stats]{Poisson}})
#' @param pois_eps a vector of length \code{k_pois} containing the truncation values (i.e. = rep(0.0001, k_pois); default = NULL)
#' @param size a vector of size parameters for the Negative Binomial variables (see \code{\link[stats]{NegBinomial}})
#' @param prob a vector of success probability parameters
#' @param mu a vector of mean parameters (*Note: either \code{prob} or \code{mu} should be supplied for all Negative Binomial variables,
#'     not a mixture; default = NULL)
#' @param nb_eps a vector of length \code{k_nb} containing the truncation values (i.e. = rep(0.0001, k_nb); default = NULL)
#' @param rho the target correlation matrix (\emph{must be ordered ordinal, continuous, Poisson, Negative Binomial}; default = NULL)
#' @param n the sample size (i.e. the length of each simulated variable; default = 100000)
#' @param seed the seed value for random number generation (default = 1234)
#' @import stats
#' @import utils
#' @import nleqslv
#' @import BB
#' @export
#' @keywords correlation, bounds, continuous, ordinal, Poisson, Negative Binomial, Fleishman, Headrick, method2
#' @seealso \code{\link[SimMultiCorrData]{find_constants}}, \code{\link[SimMultiCorrData]{rcorrvar2}}
#' @return A list with components:
#' @return \code{L_rho} the lower correlation bound
#' @return \code{U_rho} the upper correlation bound
#' @return If continuous variables are desired, additional components are:
#' @return \code{constants} the calculated constants
#' @return \code{sixth_correction} a vector of the sixth cumulant correction values
#' @return \code{valid.pdf} a vector with i-th component equal to "TRUE" if variable Y_i has a valid power method pdf, else "FALSE"
#' @return If a target correlation matrix rho is provided, each pairwise correlation is checked to see if it is within the lower and upper
#' bounds.  If the correlation is outside the bounds, the indices of the variable pair are given.
#' @references Please see \code{\link[SimMultiCorrData]{rcorrvar2}} for additional references.
#'
#' Demirtas H & Hedeker D (2011). A practical way for computing approximate lower and upper correlation bounds.
#'     American Statistician, 65(2): 104-109. \doi{10.1198/tast.2011.10090}.
#'
#' Demirtas H, Hedeker D, & Mermelstein RJ (2012). Simulation of massive public health data by power polynomials.
#'     Statistics in Medicine, 31(27): 3337-3346. \doi{10.1002/sim.5362}.
#'
#' Emrich LJ & Piedmonte MR (1991). A Method for Generating High-Dimensional Multivariate Binary Variables. The American Statistician, 45(4): 302-4.
#'     \doi{10.1080/00031305.1991.10475828}.
#'
#' Frechet M.  Sur les tableaux de correlation dont les marges sont donnees.  Ann. l'Univ. Lyon SectA.  1951;14:53-77.
#'
#' Hoeffding W. Scale-invariant correlation theory. In: Fisher NI, Sen PK, editors. The collected works of Wassily Hoeffding.
#'     New York: Springer-Verlag; 1994. p. 57-107.
#'
#' Hakan Demirtas, Yiran Hu and Rawan Allozi (2017). PoisBinOrdNor: Data Generation with Poisson, Binary, Ordinal and Normal Components.
#'     R package version 1.4. \url{https://CRAN.R-project.org/package=PoisBinOrdNor}
#'
#' @examples
#' valid_corr2(n = 1000, k_cat = 1, k_cont = 1, method = "Polynomial",
#'   means = 0, vars = 1, skews = 0, skurts = 0, fifths = 0, sixths = 0,
#'   marginal = list(c(1/3, 2/3)), rho = matrix(c(1, 0.4, 0.4, 1), 2, 2))
#'
#' \dontrun{
#'
#' # Binary, Ordinal, Continuous, Poisson, and Negative Binomial Variables
#'
#' options(scipen = 999)
#' seed <- 1234
#' n <- 10000
#'
#' # Continuous Distributions: Normal, t (df = 10), Chisq (df = 4),
#' #                           Beta (a = 4, b = 2), Gamma (a = 4, b = 4)
#' Dist <- c("Gaussian", "t", "Chisq", "Beta", "Gamma")
#'
#' # calculate standardized cumulants
#' # those for the normal and t distributions are rounded to ensure the
#' # correct values (i.e. skew = 0)
#'
#' M1 <- round(calc_theory(Dist = "Gaussian", params = c(0, 1)), 8)
#' M2 <- round(calc_theory(Dist = "t", params = 10), 8)
#' M3 <- calc_theory(Dist = "Chisq", params = 4)
#' M4 <- calc_theory(Dist = "Beta", params = c(4, 2))
#' M5 <- calc_theory(Dist = "Gamma", params = c(4, 4))
#' M <- cbind(M1, M2, M3, M4, M5)
#' M <- round(M[-c(1:2),], digits = 6)
#' colnames(M) <- Dist
#' rownames(M) <- c("skew", "skurtosis", "fifth", "sixth")
#' means <- rep(0, length(Dist))
#' vars <- rep(1, length(Dist))
#'
#' # Binary and Ordinal Distributions
#' marginal <- list(0.3, 0.4, c(0.1, 0.5), c(0.3, 0.6, 0.9),
#'                  c(0.2, 0.4, 0.7, 0.8))
#' support <- list()
#'
#' # Poisson Distributions
#' lam <- c(1, 5, 10)
#'
#' # Negative Binomial Distributions
#' size <- c(3, 6)
#' prob <- c(0.2, 0.8)
#'
#' ncat <- length(marginal)
#' ncont <- ncol(M)
#' npois <- length(lam)
#' nnb <- length(size)
#'
#' # Create correlation matrix from a uniform distribution (-0.8, 0.8)
#' set.seed(seed)
#' Rey <- diag(1, nrow = (ncat + ncont + npois + nnb))
#' for (i in 1:nrow(Rey)) {
#'   for (j in 1:ncol(Rey)) {
#'     if (i > j) Rey[i, j] <- runif(1, -0.8, 0.8)
#'     Rey[j, i] <- Rey[i, j]
#'   }
#' }
#'
#' # Test for positive-definiteness
#' library(Matrix)
#' if(min(eigen(Rey, symmetric = TRUE)$values) < 0) {
#'   Rey <- as.matrix(nearPD(Rey, corr = T, keepDiag = T)$mat)
#' }
#'
#' # Make sure Rey is within upper and lower correlation limits
#' valid <- valid_corr2(k_cat = ncat, k_cont = ncont, k_pois = npois,
#'                      k_nb = nnb, method = "Polynomial", means = means,
#'                      vars = vars, skews = M[1, ], skurts = M[2, ],
#'                      fifths = M[3, ], sixths = M[4, ], marginal = marginal,
#'                      lam = lam, pois_eps = rep(0.0001, npois),
#'                      size = size, prob = prob, nb_eps = rep(0.0001, nnb),
#'                      rho = Rey, seed = seed)
#'
#' }
valid_corr2 <- function(k_cat = 0, k_cont = 0, k_pois = 0, k_nb = 0,
                       method = c("Fleishman", "Polynomial"),
                       means =  NULL, vars =  NULL, skews =  NULL,
                       skurts =  NULL, fifths =  NULL, sixths =  NULL,
                       Six = list(), marginal = list(),
                       lam = NULL, pois_eps = NULL,
                       size = NULL, prob = NULL, mu = NULL,
                       nb_eps = NULL, rho = NULL,
                       n = 100000, seed = 1234) {
  k <- k_cat + k_cont + k_pois + k_nb
  if (!is.null(rho)) {
    if (ncol(rho) != k)
      stop("Dimension of correlation matrix does not
           match the number of variables!\n")
    if (!isSymmetric(rho) | min(eigen(rho, symmetric = TRUE)$values) < 0 |
        !all(diag(rho) == 1))
      stop("Correlation matrix not valid!")
  }
  L_sigma <- diag(k)
  U_sigma <- diag(k)
  csame.dist <- NULL
  if (k_cont > 1) {
    for (i in 2:length(skews)) {
      if (skews[i] %in% skews[1:(i - 1)]) {
        csame <- which(skews[1:(i - 1)] == skews[i])
        for (j in 1:length(csame)) {
          if (method == "Polynomial") {
            if ((skurts[i] == skurts[csame[j]]) &
                (fifths[i] == fifths[csame[j]]) &
                (sixths[i] == sixths[csame[j]])) {
              csame.dist <- rbind(csame.dist, c(csame[j], i))
              break
            }
          }
          if (method == "Fleishman") {
            if (skurts[i] == skurts[csame[j]]) {
              csame.dist <- rbind(csame.dist, c(csame[j], i))
              break
            }
          }
        }
      }
    }
  }
  if (k_cont >= 1) {
    SixCorr <- numeric(k_cont)
    Valid.PDF <- numeric(k_cont)
    set.seed(seed)
    X_cont <- matrix(rnorm(k_cont * n), n, k_cont)
    if (method == "Fleishman") {
      constants <- matrix(NA, nrow = k_cont, ncol = 4)
      colnames(constants) <- c("c0", "c1", "c2", "c3")
    }
    if (method == "Polynomial") {
      constants <- matrix(NA, nrow = k_cont, ncol = 6)
      colnames(constants) <- c("c0", "c1", "c2", "c3", "c4", "c5")
    }
    for (i in 1:k_cont) {
      if (!is.null(csame.dist)) {
        rind <- which(csame.dist[, 2] == i)
        if (length(rind) > 0) {
          constants[i, ] <- constants[csame.dist[rind, 1], ]
          SixCorr[i] <- SixCorr[csame.dist[rind, 1]]
          Valid.PDF[i] <- Valid.PDF[csame.dist[rind, 1]]
        }
      }
      if (sum(is.na(constants[i, ])) > 0) {
        if (length(Six) == 0) Six2 <- NULL else
          Six2 <- Six[[i]]
        cons <-
          suppressWarnings(find_constants(method = method, skews = skews[i],
                                          skurts = skurts[i],
                                          fifths = fifths[i],
                                          sixths = sixths[i], Six = Six2,
                                          n = 25, seed = seed))
        if (length(cons) == 1 | is.null(cons)) {
          stop(paste("Constants can not be found for continuous variable ", i,
                     ".", sep = ""))
        }
        con_solution <- cons$constants
        SixCorr[i] <- ifelse(is.null(cons$SixCorr1), NA, cons$SixCorr1)
        Valid.PDF[i] <- cons$valid
        constants[i, ] <- con_solution
      }
      cat("\n", "Constants: Distribution ", i, " \n")
    }
    Y <- Yb <- matrix(1, nrow = n, ncol = k_cont)
    for (i in 1:k_cont) {
      if (method == "Fleishman") {
        Y[, i] <- constants[i, 1] + constants[i, 2] * X_cont[, i] +
          constants[i, 3] * X_cont[, i]^2 + constants[i, 4] * X_cont[, i]^3
      }
      if (method == "Polynomial") {
        Y[, i] <- constants[i, 1] + constants[i, 2] * X_cont[, i] +
          constants[i, 3] * X_cont[, i]^2 + constants[i, 4] * X_cont[, i]^3 +
          constants[i, 5] * X_cont[, i]^4 + constants[i, 6] * X_cont[, i]^5
      }
      Yb[, i] <- means[i] + sqrt(vars[i]) * Y[, i]
    }
  }
  if (k_pois > 0 | k_nb > 0) {
    max_support <- max_count_support(k_pois = k_pois, k_nb = k_nb, lam = lam,
                                     pois_eps = pois_eps, size = size,
                                     prob = prob, mu = mu, nb_eps = nb_eps)
  }
  if (k_pois > 0) {
    pois_max <- max_support[max_support$Distribution == "Poisson", ]
    pois_support <- list()
    pois_prob <- list()
    pois_marg <- list()
    for (i in 1:k_pois) {
      pois_support[[i]] <- 0:pois_max[i, 3]
      pois_prob[[i]] <- dpois(0:pois_max[i, 3], lam[i])
      pois_prob[[i]][pois_max[i, 3] + 1] <-
        1 - sum(pois_prob[[i]][1:pois_max[i, 3]])
      pois_marg[[i]] <- cumsum(pois_prob[[i]])
      pois_marg[[i]] <- pois_marg[[i]][-(pois_max[i, 3] + 1)]
    }
  }
  if (k_nb > 0) {
    nb_max <- max_support[max_support$Distribution == "Neg_Bin", ]
    nb_support <- list()
    nb_prob <- list()
    nb_marg <- list()
    for (i in 1:k_nb) {
      nb_support[[i]] <- 0:nb_max[i, 3]
      if (length(prob) > 0) {
        nb_prob[[i]] <- dnbinom(0:nb_max[i, 3], size[i], prob[i])
      }
      if (length(mu) > 0) {
        nb_prob[[i]] <- dnbinom(0:nb_max[i, 3], size[i], mu = mu[i])
      }
      nb_prob[[i]][nb_max[i, 3] + 1] <-
        1 - sum(nb_prob[[i]][1:nb_max[i, 3]])
      nb_marg[[i]] <- cumsum(nb_prob[[i]])
      nb_marg[[i]] <- nb_marg[[i]][-(nb_max[i, 3] + 1)]
    }
  }
  for (q in 1:(k - 1)) {
    for (r in (q + 1):k) {
      if (q >= 1 & q <= k_cat & r >= 1 & r <= k_cat) {
        marg1 <- marginal[[q]]
        marg2 <- marginal[[r]]
        if (length(marg1) == 1 & length(marg2) == 1) {
          p1 <- marg1
          q1 <- 1 - marg1
          p2 <- marg2
          q2 <- 1 - marg2
          L_sigma[q, r] <- L_sigma[r, q] <- max(-sqrt((p1 * p2)/(q1 * q2)),
                                                -sqrt((q1 * q2)/(p1 * p2)))
          U_sigma[q, r] <- U_sigma[r, q] <- min(sqrt((p1 * q2)/(q1 * p2)),
                                                sqrt((q1 * p2)/(p1 * q2)))
        } else {
          set.seed(seed)
          rnorms <- matrix(rnorm(2 * n, 0, 1), ncol = 2)
          n1 <- rnorms[, 1]
          n2 <- rnorms[, 2]
          nord1 <- numeric(length(n1))
          nord2 <- numeric(length(n2))
          for (i in 1:length(marg1)) {
            if (i != length(marg1)) {
              q1 <- qnorm(marg1[i])
              q2 <- qnorm(marg1[i + 1])
              nord1[(q1 < n1) & (n1 <= q2)] <- i
            } else {
              nord1[n1 > qnorm(marg1[i])] <- i
            }
          }
          for (i in 1:length(marg2)) {
            if (i != length(marg2)) {
              q1 <- qnorm(marg2[i])
              q2 <- qnorm(marg2[i + 1])
              nord2[(q1 < n2) & (n2 <= q2)] <- i
            } else {
              nord2[n2 > qnorm(marg2[i])] <- i
            }
          }
          nord1 <- nord1 + 1
          nord2 <- nord2 + 1
          L_sigma[q, r] <- L_sigma[r, q] <-
            cor(nord1[order(nord1, decreasing = TRUE)], nord2[order(nord2)])
          U_sigma[q, r] <- U_sigma[r, q] <- cor(nord1[order(nord1)],
                                                nord2[order(nord2)])
        }
      }
      if (q >= 1 & q <= k_cat & r >= (k_cat + 1) & r <= (k_cat + k_cont)) {
        set.seed(seed)
        n1 <- rnorm(n, 0, 1)
        n2 <- Y[, r - k_cat]
        marg1 <- marginal[[q]]
        nord1 <- numeric(length(n1))
        for (i in 1:length(marg1)) {
          if (i != length(marg1)) {
            q1 <- qnorm(marg1[i])
            q2 <- qnorm(marg1[i + 1])
            nord1[(q1 < n1) & (n1 <= q2)] <- i
          } else {
            nord1[n1 > qnorm(marg1[i])] <- i
          }
        }
        L_sigma[q, r] <- L_sigma[r, q] <-
          cor(nord1[order(nord1, decreasing = TRUE)], n2[order(n2)])
        U_sigma[q, r] <- U_sigma[r, q] <- cor(nord1[order(nord1)],
                                              n2[order(n2)])
      }
      if (q >= 1 & q <= k_cat & r >= (k_cat + k_cont + 1) &
          r <= (k_cat + k_cont + k_pois)) {
        marg1 <- marginal[[q]]
        marg2 <- pois_marg[[r - (k_cat + k_cont)]]
        set.seed(seed)
        rnorms <- matrix(rnorm(2 * n, 0, 1), ncol = 2)
        n1 <- rnorms[, 1]
        n2 <- rnorms[, 2]
        nord1 <- numeric(length(n1))
        nord2 <- numeric(length(n2))
        for (i in 1:length(marg1)) {
          if (i != length(marg1)) {
            q1 <- qnorm(marg1[i])
            q2 <- qnorm(marg1[i + 1])
            nord1[(q1 < n1) & (n1 <= q2)] <- i
          } else {
            nord1[n1 > qnorm(marg1[i])] <- i
          }
        }
        for (i in 1:length(marg2)) {
          if (i != length(marg2)) {
            q1 <- qnorm(marg2[i])
            q2 <- qnorm(marg2[i + 1])
            nord2[(q1 < n2) & (n2 <= q2)] <- i
          } else {
            nord2[n2 > qnorm(marg2[i])] <- i
          }
        }
        nord1 <- nord1 + 1
        nord2 <- nord2 + 1
        L_sigma[q, r] <- L_sigma[r, q] <-
          cor(nord1[order(nord1, decreasing = TRUE)], nord2[order(nord2)])
        U_sigma[q, r] <- U_sigma[r, q] <- cor(nord1[order(nord1)],
                                              nord2[order(nord2)])
      }
      if (q >= 1 & q <= k_cat & r >= (k_cat + k_cont + k_pois + 1) &
          r <= (k_cat + k_cont + k_pois + k_nb)) {
        marg1 <- marginal[[q]]
        marg2 <- nb_marg[[r - (k_cat + k_cont + k_pois)]]
        set.seed(seed)
        rnorms <- matrix(rnorm(2 * n, 0, 1), ncol = 2)
        n1 <- rnorms[, 1]
        n2 <- rnorms[, 2]
        nord1 <- numeric(length(n1))
        nord2 <- numeric(length(n2))
        for (i in 1:length(marg1)) {
          if (i != length(marg1)) {
            q1 <- qnorm(marg1[i])
            q2 <- qnorm(marg1[i + 1])
            nord1[(q1 < n1) & (n1 <= q2)] <- i
          } else {
            nord1[n1 > qnorm(marg1[i])] <- i
          }
        }
        for (i in 1:length(marg2)) {
          if (i != length(marg2)) {
            q1 <- qnorm(marg2[i])
            q2 <- qnorm(marg2[i + 1])
            nord2[(q1 < n2) & (n2 <= q2)] <- i
          } else {
            nord2[n2 > qnorm(marg2[i])] <- i
          }
        }
        nord1 <- nord1 + 1
        nord2 <- nord2 + 1
        L_sigma[q, r] <- L_sigma[r, q] <-
          cor(nord1[order(nord1, decreasing = TRUE)], nord2[order(nord2)])
        U_sigma[q, r] <- U_sigma[r, q] <- cor(nord1[order(nord1)],
                                              nord2[order(nord2)])
      }
      if (q >= (k_cat + 1) & q <= (k_cat + k_cont) & r >= (k_cat + 1) &
          r <= (k_cat + k_cont)) {
        n1 <- Y[, q - k_cat]
        n2 <- Y[, r - k_cat]
        L_sigma[q, r] <- L_sigma[r, q] <- cor(n1[order(n1, decreasing = TRUE)],
                                              n2[order(n2)])
        U_sigma[q, r] <- U_sigma[r, q] <- cor(n1[order(n1)], n2[order(n2)])
      }
      if (q >= (k_cat + 1) & q <= (k_cat + k_cont) &
          r >= (k_cat + k_cont + 1) & r <= (k_cat + k_cont + k_pois)) {
        set.seed(seed)
        n1 <- Y[, q - k_cat]
        n2 <- rnorm(n, 0, 1)
        marg1 <- pois_marg[[r - (k_cat + k_cont)]]
        nord1 <- numeric(length(n2))
        for (i in 1:length(marg1)) {
          if (i != length(marg1)) {
            q1 <- qnorm(marg1[i])
            q2 <- qnorm(marg1[i + 1])
            nord1[(q1 < n2) & (n2 <= q2)] <- i
          } else {
            nord1[n2 > qnorm(marg1[i])] <- i
          }
        }
        L_sigma[q, r] <- L_sigma[r, q] <-
          cor(n1[order(n1, decreasing = TRUE)], nord1[order(nord1)])
        U_sigma[q, r] <- U_sigma[r, q] <- cor(n1[order(n1)],
                                              nord1[order(nord1)])
      }
      if (q >= (k_cat + 1) & q <= (k_cat + k_cont) &
         r >= (k_cat + k_cont + k_pois + 1) &
         r <= (k_cat + k_cont + k_pois + k_nb)) {
        set.seed(seed)
        n1 <- Y[, q - k_cat]
        n2 <- rnorm(n, 0, 1)
        marg1 <- nb_marg[[r - (k_cat + k_cont + k_pois)]]
        nord1 <- numeric(length(n2))
        for (i in 1:length(marg1)) {
          if (i != length(marg1)) {
            q1 <- qnorm(marg1[i])
            q2 <- qnorm(marg1[i + 1])
            nord1[(q1 < n2) & (n2 <= q2)] <- i
          } else {
            nord1[n2 > qnorm(marg1[i])] <- i
          }
        }
        L_sigma[q, r] <- L_sigma[r, q] <- cor(n1[order(n1, decreasing = TRUE)],
                                              nord1[order(nord1)])
        U_sigma[q, r] <- U_sigma[r, q] <- cor(n1[order(n1)],
                                              nord1[order(nord1)])
      }
      if (q >= (k_cat + k_cont + 1) & q <= (k_cat + k_cont + k_pois) &
          r >= (k_cat + k_cont + 1) & r <= (k_cat + k_cont + k_pois)) {
        marg1 <- pois_marg[[q - (k_cat + k_cont)]]
        marg2 <- pois_marg[[r - (k_cat + k_cont)]]
        set.seed(seed)
        rnorms <- matrix(rnorm(2 * n, 0, 1), ncol = 2)
        n1 <- rnorms[, 1]
        n2 <- rnorms[, 2]
        nord1 <- numeric(length(n1))
        nord2 <- numeric(length(n2))
        for (i in 1:length(marg1)) {
          if (i != length(marg1)) {
            q1 <- qnorm(marg1[i])
            q2 <- qnorm(marg1[i + 1])
            nord1[(q1 < n1) & (n1 <= q2)] <- i
          } else {
            nord1[n1 > qnorm(marg1[i])] <- i
          }
        }
        for (i in 1:length(marg2)) {
          if (i != length(marg2)) {
            q1 <- qnorm(marg2[i])
            q2 <- qnorm(marg2[i + 1])
            nord2[(q1 < n2) & (n2 <= q2)] <- i
          } else {
            nord2[n2 > qnorm(marg2[i])] <- i
          }
        }
        nord1 <- nord1 + 1
        nord2 <- nord2 + 1
        L_sigma[q, r] <- L_sigma[r, q] <-
          cor(nord1[order(nord1, decreasing = TRUE)], nord2[order(nord2)])
        U_sigma[q, r] <- U_sigma[r, q] <- cor(nord1[order(nord1)],
                                              nord2[order(nord2)])
      }
      if (q >= (k_cat + k_cont + 1) & q <= (k_cat + k_cont + k_pois) &
          r >= (k_cat + k_cont + k_pois + 1) &
          r <= (k_cat + k_cont + k_pois + k_nb)) {
        marg1 <- pois_marg[[q - (k_cat + k_cont)]]
        marg2 <- nb_marg[[r - (k_cat + k_cont + k_pois)]]
        set.seed(seed)
        rnorms <- matrix(rnorm(2 * n, 0, 1), ncol = 2)
        n1 <- rnorms[, 1]
        n2 <- rnorms[, 2]
        nord1 <- numeric(length(n1))
        nord2 <- numeric(length(n2))
        for (i in 1:length(marg1)) {
          if (i != length(marg1)) {
            q1 <- qnorm(marg1[i])
            q2 <- qnorm(marg1[i + 1])
            nord1[(q1 < n1) & (n1 <= q2)] <- i
          } else {
            nord1[n1 > qnorm(marg1[i])] <- i
          }
        }
        for (i in 1:length(marg2)) {
          if (i != length(marg2)) {
            q1 <- qnorm(marg2[i])
            q2 <- qnorm(marg2[i + 1])
            nord2[(q1 < n2) & (n2 <= q2)] <- i
          } else {
            nord2[n2 > qnorm(marg2[i])] <- i
          }
        }
        nord1 <- nord1 + 1
        nord2 <- nord2 + 1
        L_sigma[q, r] <- L_sigma[r, q] <-
          cor(nord1[order(nord1, decreasing = TRUE)], nord2[order(nord2)])
        U_sigma[q, r] <- U_sigma[r, q] <- cor(nord1[order(nord1)],
                                              nord2[order(nord2)])
      }
      if (q >= (k_cat + k_cont + k_pois + 1) &
          q <= (k_cat + k_cont + k_pois + k_nb) &
          r >= (k_cat + k_cont + k_pois + 1) &
          r <= (k_cat + k_cont + k_pois + k_nb)) {
        marg1 <- nb_marg[[q - (k_cat + k_cont + k_pois)]]
        marg2 <- nb_marg[[r - (k_cat + k_cont + k_pois)]]
        set.seed(seed)
        rnorms <- matrix(rnorm(2 * n, 0, 1), ncol = 2)
        n1 <- rnorms[, 1]
        n2 <- rnorms[, 2]
        nord1 <- numeric(length(n1))
        nord2 <- numeric(length(n2))
        for (i in 1:length(marg1)) {
          if (i != length(marg1)) {
            q1 <- qnorm(marg1[i])
            q2 <- qnorm(marg1[i + 1])
            nord1[(q1 < n1) & (n1 <= q2)] <- i
          } else {
            nord1[n1 > qnorm(marg1[i])] <- i
          }
        }
        for (i in 1:length(marg2)) {
          if (i != length(marg2)) {
            q1 <- qnorm(marg2[i])
            q2 <- qnorm(marg2[i + 1])
            nord2[(q1 < n2) & (n2 <= q2)] <- i
          } else {
            nord2[n2 > qnorm(marg2[i])] <- i
          }
        }
        nord1 <- nord1 + 1
        nord2 <- nord2 + 1
        L_sigma[q, r] <- L_sigma[r, q] <-
          cor(nord1[order(nord1, decreasing = TRUE)], nord2[order(nord2)])
        U_sigma[q, r] <- U_sigma[r, q] <- cor(nord1[order(nord1)],
                                              nord2[order(nord2)])
      }
    }
  }
  if (!is.null(rho)) {
    valid.state <- TRUE
    for (i in 1:(k - 1)) {
      for (j in (i + 1):k) {
        if (rho[i, j] < L_sigma[i, j] | rho[i, j] > U_sigma[i, j]) {
          cat("Range error! Corr[", i, ",", j, "] must be between",
              round(L_sigma[i, j], 6), "and", round(U_sigma[i, j], 6), "\n")
          valid.state <- FALSE
        }
      }
    }
    if (valid.state == TRUE) cat("All correlations are in feasible range!")
    if (valid.state == FALSE) cat("Some correlations are not in feasible
                                  range!")
  }
  if (k_cont > 0) {
    return(list(L_rho = L_sigma, U_rho = U_sigma, constants = constants,
                sixth_correction = SixCorr, valid.pdf = Valid.PDF))
  } else {
    return(list(L_rho = L_sigma, U_rho = U_sigma))
  }
}

Try the SimMultiCorrData package in your browser

Any scripts or data that you put into this service are public.

SimMultiCorrData documentation built on May 2, 2019, 9:50 a.m.