findintercorr2: Calculate Intermediate MVN Correlation for Ordinal,...
In SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types

Description Usage Arguments Value Overview of Correlation Method 2 Ordinal Variables Continuous Variables Poisson Variables Negative Binomial Variables Continuous - Ordinal Pairs Ordinal - Poisson Pairs Ordinal - Negative Binomial Pairs Continuous - Poisson Pairs Continuous - Negative Binomial Pairs Poisson - Negative Binomial Pairs References See Also Examples

View source: R/findintercorr2.R

This function calculates a k x k intermediate matrix of correlations, where k = k_cat + k_cont + k_pois + k_nb, to be used in simulating variables with rcorrvar2. The ordering of the variables must be ordinal, continuous, Poisson, and Negative Binomial (note that it is possible for k_cat, k_cont, k_pois, and/or k_nb to be 0). The function first checks that the target correlation matrix rho is positive-definite and the marginal distributions for the ordinal variables are cumulative probabilities with r - 1 values (for r categories). There is a warning given at the end of simulation if the calculated intermediate correlation matrix Sigma is not positive-definite. This function is called by the simulation function rcorrvar2, and would only be used separately if the user wants to find the intermediate correlation matrix only. The simulation functions also return the intermediate correlation matrix.

findintercorr2(n, k_cont = 0, k_cat = 0, k_pois = 0, k_nb = 0,
  method = c("Fleishman", "Polynomial"), constants, marginal = list(),
  support = list(), lam = NULL, size = NULL, prob = NULL, mu = NULL,
  pois_eps = NULL, nb_eps = NULL, rho = NULL, epsilon = 0.001,
  maxit = 1000)

`n`	the sample size (i.e. the length of each simulated variable)
`k_cont`	the number of continuous variables (default = 0)
`k_cat`	the number of ordinal (r >= 2 categories) variables (default = 0)
`k_pois`	the number of Poisson variables (default = 0)
`k_nb`	the number of Negative Binomial variables (default = 0)
`method`	the method used to generate the `k_cont` continuous variables. "Fleishman" uses a third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation.
`constants`	a matrix with `k_cont` rows, each a vector of constants c0, c1, c2, c3 (if `method` = "Fleishman") or c0, c1, c2, c3, c4, c5 (if `method` = "Polynomial") like that returned by `find_constants`
`marginal`	a list of length equal to `k_cat`; the i-th element is a vector of the cumulative probabilities defining the marginal distribution of the i-th variable; if the variable can take r values, the vector will contain r - 1 probabilities (the r-th is assumed to be 1; default = list())
`support`	a list of length equal to `k_cat`; the i-th element is a vector of containing the r ordered support values; if not provided (i.e. `support` = list()), the default is for the i-th element to be the vector 1, ..., r
`lam`	a vector of lambda (> 0) constants for the Poisson variables (see `Poisson`)
`size`	a vector of size parameters for the Negative Binomial variables (see `NegBinomial`)
`prob`	a vector of success probability parameters
`mu`	a vector of mean parameters (*Note: either `prob` or `mu` should be supplied for all Negative Binomial variables, not a mixture; default = NULL)
`pois_eps`	a vector of length `k_pois` containing the truncation values (i.e. = rep(0.0001, k_pois); default = NULL)
`nb_eps`	a vector of length `k_nb` containing the truncation values (i.e. = rep(0.0001, k_nb); default = NULL)
`rho`	the target correlation matrix (must be ordered ordinal, continuous, Poisson, Negative Binomial; default = NULL)
`epsilon`	the maximum acceptable error between the final and target correlation matrices (default = 0.001) in the calculation of ordinal intermediate correlations with `ordnorm`
`maxit`	the maximum number of iterations to use (default = 1000) in the calculation of ordinal intermediate correlations with `ordnorm`

the intermediate MVN correlation matrix

The intermediate correlations used in correlation method 2 are less simulation based than those in correlation method 1, and no seed is needed. Their calculations involve greater utilization of correction loops which make iterative adjustments until a maximum error has been reached (if possible). In addition, method 2 differs from method 1 in the following ways:

1) The intermediate correlations involving count variables are based on the methods of Barbiero & Ferrari (2012, doi: 10.1080/00273171.2012.692630, 2015, doi: 10.1002/asmb.2072). The Poisson or Negative Binomial support is made finite by removing a small user-specified value (i.e. 1e-06) from the total cumulative probability. This truncation factor may differ for each count variable. The count variables are subsequently treated as ordinal and intermediate correlations are calculated using the correction loop of ordnorm.

2) The continuous - count variable correlations are based on an extension of the method of Demirtas et al. (2012, doi: 10.1002/sim.5362), and the count variables are treated as ordinal. The correction factor is the product of the power method correlation between the continuous variable and the normal variable used to generate it (see Headrick & Kowalchuk, 2007, doi: 10.1080/10629360600605065) and the point-polyserial correlation between the ordinalized count variable and the normal variable used to generate it (see Olsson et al., 1982, doi: 10.1007/BF02294164). The intermediate correlations are the ratio of the target correlations to the correction factor.

The processes used to find the intermediate correlations for each variable type are described below. Please see the corresponding function help page for more information:

Correlations are computed pairwise. If both variables are binary, the method of Demirtas et al. (2012, doi: 10.1002/sim.5362) is used to find the tetrachoric correlation (code adapted from Tetra.Corr.BB). This method is based on Emrich and Piedmonte's (1991, doi: 10.1080/00031305.1991.10475828) work, in which the joint binary distribution is determined from the third and higher moments of a multivariate normal distribution: Let Y_{1} and Y_{2} be binary variables with E[Y_{1}] = Pr(Y_{1} = 1) = p_{1}, E[Y_{2}] = Pr(Y_{2} = 1) = p_{2}, and correlation ρ_{y1y2}. Let Φ[x_{1}, x_{2}, ρ_{x1x2}] be the standard bivariate normal cumulative distribution function, given by:

Φ[x_{1}, x_{2}, ρ_{x1x2}] = \int_{-∞}^{x_{1}} \int_{-∞}^{x_{2}} f(z_{1}, z_{2}, ρ_{x1x2}) dz_{1} dz_{2}

where

f(z_{1}, z_{2}, ρ_{x1x2}) = [2π√{1 - ρ_{x1x2}^2}]^{-1} * exp[-0.5(z_{1}^2 - 2ρ_{x1x2}z_{1}z_{2} + z_{2}^2)/(1 - ρ_{x1x2}^2)]

Then solving the equation

Φ[z(p_{1}), z(p_{2}), ρ_{x1x2}] = ρ_{y1y2}√{p_{1}(1 - p_{1})p_{2}(1 - p_{2})} + p_{1}p_{2}

for ρ_{x1x2} gives the intermediate correlation of the standard normal variables needed to generate binary variables with correlation ρ_{y1y2}. Here z(p) indicates the pth quantile of the standard normal distribution.

Otherwise, ordnorm is called for each pair. If the resulting intermediate matrix is not positive-definite, there is a warning given because it may not be possible to find a MVN correlation matrix that will produce the desired marginal distributions after discretization. Any problems with positive-definiteness are corrected later.

Correlations are computed pairwise. findintercorr_cont is called for each pair.

max_count_support is used to find the maximum support value given the vector pois_eps of truncation values. This is used to create a Poisson marginal list consisting of cumulative probabilities for each variable (like that for the ordinal variables). Then ordnorm is called to calculate the intermediate MVN correlation for all Poisson variables.

max_count_support is used to find the maximum support value given the vector nb_eps of truncation values. This is used to create a Negative Binomial marginal list consisting of cumulative probabilities for each variable (like that for the ordinal variables). Then ordnorm is called to calculate the intermediate MVN correlation for all Negative Binomial variables.

findintercorr_cont_cat is called to calculate the intermediate MVN correlation for all Continuous and Ordinal combinations.

The Poisson marginal list is appended to the ordinal marginal list (similarly for the support lists). Then ordnorm is called to calculate the intermediate MVN correlation for all Ordinal and Poisson combinations.

The Negative Binomial marginal list is appended to the ordinal marginal list (similarly for the support lists). Then ordnorm is called to calculate the intermediate MVN correlation for all Ordinal and Negative Binomial combinations.

findintercorr_cont_pois2 is called to calculate the intermediate MVN correlation for all Continuous and Poisson combinations.

findintercorr_cont_nb2 is called to calculate the intermediate MVN correlation for all Continuous and Negative Binomial combinations.

The Negative Binomial marginal list is appended to the Poisson marginal list (similarly for the support lists). Then ordnorm is called to calculate the intermediate MVN correlation for all Poisson and Negative Binomial combinations.

Please see rcorrvar2 for additional references.

Emrich LJ & Piedmonte MR (1991). A Method for Generating High-Dimensional Multivariate Binary Variables. The American Statistician, 45(4): 302-4. doi: 10.1080/00031305.1991.10475828.

Inan G & Demirtas H (2016). BinNonNor: Data Generation with Binary and Continuous Non-Normal Components. R package version 1.3. https://CRAN.R-project.org/package=BinNonNor

Vale CD & Maurelli VA (1983). Simulating Multivariate Nonnormal Distributions. Psychometrika, 48, 465-471. doi: 10.1007/BF02293687.

find_constants, rcorrvar2

## Not run: 

# Binary, Ordinal, Continuous, Poisson, and Negative Binomial Variables

options(scipen = 999)
seed <- 1234
n <- 10000

# Continuous Distributions: Normal, t (df = 10), Chisq (df = 4),
# Beta (a = 4, b = 2), Gamma (a = 4, b = 4)
Dist <- c("Gaussian", "t", "Chisq", "Beta", "Gamma")

# calculate standardized cumulants
# those for the normal and t distributions are rounded to ensure the
# correct values (i.e. skew = 0)

M1 <- round(calc_theory(Dist = "Gaussian", params = c(0, 1)), 8)
M2 <- round(calc_theory(Dist = "t", params = 10), 8)
M3 <- calc_theory(Dist = "Chisq", params = 4)
M4 <- calc_theory(Dist = "Beta", params = c(4, 2))
M5 <- calc_theory(Dist = "Gamma", params = c(4, 4))
M <- cbind(M1, M2, M3, M4, M5)
M <- round(M[-c(1:2),], digits = 6)
colnames(M) <- Dist
rownames(M) <- c("skew", "skurtosis", "fifth", "sixth")
means <- rep(0, length(Dist))
vars <- rep(1, length(Dist))

# calculate constants
con <- matrix(1, nrow = ncol(M), ncol = 6)
for (i in 1:ncol(M)) {
 con[i, ] <- find_constants(method = "Polynomial", skews = M[1, i],
                            skurts = M[2, i], fifths = M[3, i],
                            sixths = M[4, i])
}

# Binary and Ordinal Distributions
marginal <- list(0.3, 0.4, c(0.1, 0.5), c(0.3, 0.6, 0.9),
                 c(0.2, 0.4, 0.7, 0.8))
support <- list()

# Poisson Distributions
lam <- c(1, 5, 10)

# Negative Binomial Distributions
size <- c(3, 6)
prob <- c(0.2, 0.8)

ncat <- length(marginal)
ncont <- ncol(M)
npois <- length(lam)
nnb <- length(size)

# Create correlation matrix from a uniform distribution (-0.8, 0.8)
set.seed(seed)
Rey <- diag(1, nrow = (ncat + ncont + npois + nnb))
for (i in 1:nrow(Rey)) {
  for (j in 1:ncol(Rey)) {
    if (i > j) Rey[i, j] <- runif(1, -0.8, 0.8)
    Rey[j, i] <- Rey[i, j]
  }
}

# Test for positive-definiteness
library(Matrix)
if(min(eigen(Rey, symmetric = TRUE)$values) < 0) {
  Rey <- as.matrix(nearPD(Rey, corr = T, keepDiag = T)$mat)
}

# Make sure Rey is within upper and lower correlation limits
valid <- valid_corr2(k_cat = ncat, k_cont = ncont, k_pois = npois,
                     k_nb = nnb, method = "Polynomial", means = means,
                     vars = vars, skews = M[1, ], skurts = M[2, ],
                     fifths = M[3, ], sixths = M[4, ],
                     marginal = marginal, lam = lam,
                     pois_eps = rep(0.0001, npois),
                     size = size, prob = prob,
                     nb_eps = rep(0.0001, nnb),
                     rho = Rey, seed = seed)

# Find intermediate correlation
Sigma2 <- findintercorr2(n = n, k_cont = ncont, k_cat = ncat,
                         k_pois = npois, k_nb = nnb,
                         method = "Polynomial", constants = con,
                         marginal = marginal, lam = lam, size = size,
                         prob = prob, pois_eps = rep(0.0001, npois),
                         nb_eps = rep(0.0001, nnb), rho = Rey)
Sigma2


## End(Not run)

SimMultiCorrData documentation built on May 2, 2019, 9:50 a.m.

SimMultiCorrData index

README.md Benefits of SimMultiCorrData and Comparison to Other Packages" Comparison of Correlation Method 1 and Correlation Method 2" Comparison of Simulated Distribution to Theoretical Distribution or Empirical Data" Functions by Topic" Overall Workflow for Data Simulation" Overview of Error Loop" Using the Sixth Cumulant Correction to Find Valid Power Method Pdfs" Variable Types"

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

SimMultiCorrData
Simulation of Correlated Data with Multiple Variable Types

findintercorr2: Calculate Intermediate MVN Correlation for Ordinal,...
In SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types

Description

Usage

Arguments

Value

Overview of Correlation Method 2

Ordinal Variables

Continuous Variables

Poisson Variables

Negative Binomial Variables

Continuous - Ordinal Pairs

Ordinal - Poisson Pairs

Ordinal - Negative Binomial Pairs

Continuous - Poisson Pairs

Continuous - Negative Binomial Pairs

Poisson - Negative Binomial Pairs

References

See Also

Examples

Related to findintercorr2 in SimMultiCorrData...

R Package Documentation

Browse R Packages

We want your feedback!

SimMultiCorrData Simulation of Correlated Data with Multiple Variable Types

findintercorr2: Calculate Intermediate MVN Correlation for Ordinal,... In SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types

Description

Usage

Arguments

Value

Overview of Correlation Method 2

Ordinal Variables

Continuous Variables

Poisson Variables

Negative Binomial Variables

Continuous - Ordinal Pairs

Ordinal - Poisson Pairs

Ordinal - Negative Binomial Pairs

Continuous - Poisson Pairs

Continuous - Negative Binomial Pairs

Poisson - Negative Binomial Pairs

References

See Also

Examples

Related to findintercorr2 in SimMultiCorrData...

R Package Documentation

Browse R Packages

We want your feedback!

SimMultiCorrData
Simulation of Correlated Data with Multiple Variable Types

findintercorr2: Calculate Intermediate MVN Correlation for Ordinal,...
In SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types