SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions

#' @title Calculate Intermediate MVN Correlation for Continuous - Negative Binomial Variables: Correlation Method 2
#'
#' @description This function calculates a \code{k_cont x k_nb} intermediate matrix of correlations for the \code{k_cont} continuous and
#'     \code{k_nb} Negative Binomial variables. It extends the methods of Demirtas et al. (2012, \doi{10.1002/sim.5362}) and
#'     Barbiero & Ferrari (2015, \doi{10.1002/asmb.2072}) by:
#'
#'     1) including non-normal continuous and regular or zero-inflated Negative Binomial variables,
#'
#'     2) allowing the continuous variables to be generated via Fleishman's third-order or Headrick's fifth-order transformation, and
#'
#'     3) since the count variables are treated as ordinal, using the point-polyserial and polyserial correlations to calculate the
#'     intermediate correlations (similar to \code{\link[SimMultiCorrData]{findintercorr_cont_cat}} in \cr
#'     \code{\link[SimMultiCorrData]{SimMultiCorrData}}).
#'
#'     Here, the intermediate correlation between Z1 and Z2 (where Z1 is the standard normal variable transformed using Headrick's fifth-order
#'     or Fleishman's third-order method to produce a continuous variable Y1, and Z2 is the standard normal variable used to generate a
#'     Negative Binomial variable via the inverse CDF method) is calculated by dividing the target correlation by a correction factor.  The
#'     correction factor is the product of the point-polyserial correlation between Y2 and Z2 (described in Olsson et al., 1982,
#'     \doi{10.1007/BF02294164}) and the power method correlation (described in Headrick & Kowalchuk, 2007, \doi{10.1080/10629360600605065})
#'     between Y1 and Z1.  After the maximum support value has been found using \code{\link[SimCorrMix]{maxcount_support}}, the
#'     point-polyserial correlation is given by:
#'     \deqn{\rho_{Y2,Z2} = \frac{1}{\sigma_{Y2}} \sum_{j = 1}^{r-1} \phi(\tau_{j})(y2_{j+1} - y2_{j}),} where
#'     \deqn{\phi(\tau) = (2\pi)^{-1/2} \exp(-0.5 \tau^2).}  Here, \eqn{r} is the number of support values, \eqn{y2_{j}} is the j-th support
#'     value, and \eqn{\tau_{j}} is \eqn{\Phi^{-1}(\sum_{i=1}^{j} Pr(Y2 = y2_{i}))}.  The power method correlation is given by:
#'     \deqn{\rho_{Y1, Z1} = c_1 + 3c_3 + 15c_5,} where \eqn{c_5 = 0} if \code{method} = "Fleishman".  The function is used in
#'     \code{\link[SimCorrMix]{intercorr2}} and \code{\link[SimCorrMix]{corrvar2}}.  This function would not ordinarily be called by the user.
#'
#' @param method the method used to generate the \code{k_cont} continuous variables.  "Fleishman" uses Fleishman's third-order polynomial transformation
#'     and "Polynomial" uses Headrick's fifth-order transformation.
#' @param constants a matrix with \code{k_cont} rows, each a vector of constants c0, c1, c2, c3 (if \code{method} = "Fleishman") or
#'     c0, c1, c2, c3, c4, c5 (if \code{method} = "Polynomial"), like that returned by \code{\link[SimMultiCorrData]{find_constants}}
#' @param rho_cont_nb a \code{k_cont x k_nb} matrix of target correlations among continuous and Negative Binomial variables; the NB variables
#'     should be ordered 1st regular, 2nd zero-inflated
#' @param nb_marg a list of length equal to \code{k_nb} ordered 1st regular and 2nd zero-inflated; the i-th element is a vector of the cumulative
#'     probabilities defining the marginal distribution of the i-th variable;
#'     if the variable can take r values, the vector will contain r - 1 probabilities (the r-th is assumed to be 1);
#'     this is created within \code{\link[SimCorrMix]{intercorr2}} and \code{\link[SimCorrMix]{corrvar2}}
#' @param nb_support a list of length equal to \code{k_nb} ordered 1st regular and 2nd zero-inflated; the i-th element is a vector containing the r
#'     ordered support values, with a minimum of 0 and a maximum determined via \code{\link[SimCorrMix]{maxcount_support}}
#' @export
#' @keywords correlation continuous NegativeBinomial method2
#' @seealso \code{\link[SimMultiCorrData]{find_constants}}, \code{\link[SimMultiCorrData]{power_norm_corr}},
#'     \code{\link[SimCorrMix]{intercorr2}}, \code{\link[SimCorrMix]{corrvar2}}
#' @return a \code{k_cont x k_nb} matrix whose rows represent the \code{k_cont} continuous variables and columns represent the
#'     \code{k_nb} Negative Binomial variables
#' @references
#' Please see references in \code{\link[SimCorrMix]{intercorr_cont_pois2}}.
#'
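intercorr_cont_nb2 <- function(method = c("Fleishman", "Polynomial"),
                               constants = NULL, rho_cont_nb = NULL,
                               nb_marg = list(), nb_support = list()) {
  # Resolve the default so that a single method name reaches power_norm_corr
  method <- match.arg(method)
  Sigma_cont_nb <- matrix(1, nrow = nrow(rho_cont_nb),
                          ncol = ncol(rho_cont_nb))
  for (i in seq_len(nrow(rho_cont_nb))) {
    for (j in seq_len(ncol(rho_cont_nb))) {
      # Point-polyserial correlation between Y2 and Z2 for the j-th NB variable
      rho_y2z2 <- denom_corr_cat(marginal = nb_marg[[j]],
                                 support = nb_support[[j]]) /
        sqrt(var_cat(marginal = nb_marg[[j]], support = nb_support[[j]]))
      # Power method correlation between Y1 and Z1 for the i-th continuous variable
      rho_y1z1 <- power_norm_corr(constants[i, ], method)
      # Intermediate correlation = target correlation / correction factor
      Sigma_cont_nb[i, j] <- rho_cont_nb[i, j] / (rho_y2z2 * rho_y1z1)
      # Keep the result inside the valid correlation range [-1, 1]
      if (Sigma_cont_nb[i, j] > 1) Sigma_cont_nb[i, j] <- 1
      if (Sigma_cont_nb[i, j] < -1) Sigma_cont_nb[i, j] <- -1
    }
  }
  return(Sigma_cont_nb)
}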
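
The correction described in the documentation above can be sketched in base R without the package helpers. This is a minimal illustration under stated assumptions, not the package's implementation: the Negative Binomial parameters (size = 2, mu = 3), the truncation at the 0.9999 quantile (a stand-in for what maxcount_support would choose), the Fleishman constants, the target correlation of 0.4, and the helper name point_polyserial are all hypothetical. In practice the constants would come from find_constants.

```r
# Hypothetical Negative Binomial marginal, truncated at an assumed upper quantile
size <- 2
mu <- 3
max_support <- qnbinom(0.9999, size = size, mu = mu)
support <- 0:max_support

# Cumulative probabilities for the first r - 1 support values
# (the r-th cumulative probability is taken to be 1)
marginal <- pnbinom(support[-length(support)], size = size, mu = mu)

# Point-polyserial correlation between the ordinal-treated count Y2 and Z2
# (hypothetical helper written for this sketch)
point_polyserial <- function(marginal, support) {
  probs <- diff(c(0, marginal, 1))      # Pr(Y2 = y2_j); last cell absorbs the tail
  mean_y <- sum(probs * support)
  sd_y <- sqrt(sum(probs * support^2) - mean_y^2)
  tau <- qnorm(marginal)                # thresholds tau_j
  sum(dnorm(tau) * diff(support)) / sd_y
}

# Illustrative Fleishman constants c0, c1, c2, c3 (assumed values)
constants <- c(-0.3137, 0.8263, 0.3137, 0.0227)

# Power method correlation between Y1 and Z1; c5 = 0 for "Fleishman"
rho_y1z1 <- constants[2] + 3 * constants[4]

# Intermediate correlation = target / (point-polyserial * power method correlation)
rho_y2z2 <- point_polyserial(marginal, support)
target <- 0.4
intermediate <- target / (rho_y2z2 * rho_y1z1)

# Bound the result to [-1, 1], as the function above does
intermediate <- max(-1, min(1, intermediate))
```

Because both factors in the denominator are at most 1 in absolute value, the intermediate correlation is never smaller in magnitude than the target, which is why the final bounding step is needed.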

AFialkowski/SimCorrMix documentation built on May 30, 2019, 3:47 p.m.
