Description Usage Arguments Value Overview of Correlation Method 1 Ordinal Variables Continuous Variables Poisson Variables Negative Binomial Variables Continuous - Ordinal Pairs Ordinal - Poisson Pairs Ordinal - Negative Binomial Pairs Continuous - Poisson Pairs Continuous - Negative Binomial Pairs Poisson - Negative Binomial Pairs References See Also Examples
View source: R/findintercorr.R
This function calculates a k x k
intermediate matrix of correlations, where k = k_cat + k_cont + k_pois + k_nb
,
to be used in simulating variables with rcorrvar
. The ordering of the variables must be
ordinal, continuous, Poisson, and Negative Binomial (note that it is possible for k_cat
, k_cont
, k_pois
,
and/or k_nb
to be 0).
The function first checks that the target correlation matrix rho
is positive-definite and the marginal distributions for the
ordinal variables are cumulative probabilities with r - 1 values (for r categories). There is a warning given at the end of simulation
if the calculated intermediate correlation matrix Sigma
is not positive-definite. This function is called by the simulation function
rcorrvar
, and would only be used separately if the user wants to find the intermediate correlation matrix
only. The simulation functions also return the intermediate correlation matrix.
1 2 3 4 5 |
n |
the sample size (i.e. the length of each simulated variable) |
k_cont |
the number of continuous variables (default = 0) |
k_cat |
the number of ordinal (r >= 2 categories) variables (default = 0) |
k_pois |
the number of Poisson variables (default = 0) |
k_nb |
the number of Negative Binomial variables (default = 0) |
method |
the method used to generate the |
constants |
a matrix with |
marginal |
a list of length equal to |
support |
a list of length equal to |
nrand |
the number of random numbers to generate in calculating the bound (default = 10000) |
lam |
a vector of lambda (> 0) constants for the Poisson variables (see |
size |
a vector of size parameters for the Negative Binomial variables (see |
prob |
a vector of success probability parameters |
mu |
a vector of mean parameters (*Note: either |
rho |
the target correlation matrix (must be ordered ordinal, continuous, Poisson, Negative Binomial; default = NULL) |
seed |
the seed value for random number generation (default = 1234) |
epsilon |
the maximum acceptable error between the final and target correlation matrices (default = 0.001)
in the calculation of ordinal intermediate correlations with |
maxit |
the maximum number of iterations to use (default = 1000) in the calculation of ordinal
intermediate correlations with |
the intermediate MVN correlation matrix
The intermediate correlations used in correlation method 1 are more simulation based than those in correlation method 2, which means that accuracy increases with sample size and the number of repetitions. In addition, specifying the seed allows for reproducibility. In addition, method 1 differs from method 2 in the following ways:
1) The intermediate correlation for count variables is based on the method of Yahav & Shmueli (2012, doi: 10.1002/asmb.901), which uses a simulation based, logarithmic transformation of the target correlation. This method becomes less accurate as the variable mean gets closer to zero.
2) The ordinal - count variable correlations are based on an extension of the method of Amatya & Demirtas (2015, doi: 10.1080/00949655.2014.953534), in which the correlation correction factor is the product of the upper Frechet-Hoeffding bound on the correlation between the count variable and the normal variable used to generate it and a simulated upper bound on the correlation between an ordinal variable and the normal variable used to generate it (see Demirtas & Hedeker, 2011, doi: 10.1198/tast.2011.10090).
3) The continuous - count variable correlations are based on an extension of the methods of Amatya & Demirtas (2015) and Demirtas et al. (2012, doi: 10.1002/sim.5362), in which the correlation correction factor is the product of the upper Frechet-Hoeffding bound on the correlation between the count variable and the normal variable used to generate it and the power method correlation between the continuous variable and the normal variable used to generate it (see Headrick & Kowalchuk, 2007, doi: 10.1080/10629360600605065). The intermediate correlations are the ratio of the target correlations to the correction factor.
The processes used to find the intermediate correlations for each variable type are described below. Please see the corresponding function help page for more information:
Correlations are computed pairwise. If both variables are binary, the method of Demirtas et al. (2012, doi: 10.1002/sim.5362) is used to find the
tetrachoric correlation (code adapted from Tetra.Corr.BB
). This method is based on Emrich and Piedmonte's
(1991, doi: 10.1080/00031305.1991.10475828) work, in which the joint binary distribution is determined from the third and higher moments of a multivariate normal
distribution: Let Y_{1} and Y_{2} be binary variables with E[Y_{1}] = Pr(Y_{1} = 1) = p_{1},
E[Y_{2}] = Pr(Y_{2} = 1) = p_{2}, and correlation ρ_{y1y2}. Let Φ[x_{1}, x_{2}, ρ_{x1x2}] be the
standard bivariate normal cumulative distribution function, given by:
Φ[x_{1}, x_{2}, ρ_{x1x2}] = \int_{-∞}^{x_{1}} \int_{-∞}^{x_{2}} f(z_{1}, z_{2}, ρ_{x1x2}) dz_{1} dz_{2}
where
f(z_{1}, z_{2}, ρ_{x1x2}) = [2π√{1 - ρ_{x1x2}^2}]^{-1} * exp[-0.5(z_{1}^2 - 2ρ_{x1x2}z_{1}z_{2} + z_{2}^2)/(1 - ρ_{x1x2}^2)]
Then solving the equation
Φ[z(p_{1}), z(p_{2}), ρ_{x1x2}] = ρ_{y1y2}√{p_{1}(1 - p_{1})p_{2}(1 - p_{2})} + p_{1}p_{2}
for ρ_{x1x2} gives the intermediate correlation of the standard normal variables needed to generate binary variables with correlation ρ_{y1y2}. Here z(p) indicates the pth quantile of the standard normal distribution.
Otherwise, ordnorm
is called for each pair. If the resulting
intermediate matrix is not positive-definite, there is a warning given because it may not be possible to find a MVN correlation
matrix that will produce the desired marginal distributions after discretization. Any problems with positive-definiteness are
corrected later.
Correlations are computed pairwise. findintercorr_cont
is called for each pair.
findintercorr_pois
is called to calculate the intermediate MVN correlation for all Poisson variables.
findintercorr_nb
is called to calculate the intermediate MVN correlation for all Negative
Binomial variables.
findintercorr_cont_cat
is called to calculate the intermediate MVN correlation for all
Continuous and Ordinal combinations.
findintercorr_cat_pois
is called to calculate the intermediate MVN correlation for all
Ordinal and Poisson combinations.
findintercorr_cat_nb
is called to calculate the intermediate MVN correlation for all
Ordinal and Negative Binomial combinations.
findintercorr_cont_pois
is called to calculate the intermediate MVN correlation for all
Continuous and Poisson combinations.
findintercorr_cont_nb
is called to calculate the intermediate MVN correlation for all
Continuous and Negative Binomial combinations.
findintercorr_pois_nb
is called to calculate the intermediate MVN correlation for all
Poisson and Negative Binomial combinations.
Please see rcorrvar
for additional references.
Emrich LJ & Piedmonte MR (1991). A Method for Generating High-Dimensional Multivariate Binary Variables. The American Statistician, 45(4): 302-4. doi: 10.1080/00031305.1991.10475828.
Inan G & Demirtas H (2016). BinNonNor: Data Generation with Binary and Continuous Non-Normal Components. R package version 1.3. https://CRAN.R-project.org/package=BinNonNor
Vale CD & Maurelli VA (1983). Simulating Multivariate Nonnormal Distributions. Psychometrika, 48, 465-471. doi: 10.1007/BF02293687.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 | ## Not run:
# Binary, Ordinal, Continuous, Poisson, and Negative Binomial Variables
options(scipen = 999)
seed <- 1234
n <- 10000
# Continuous Distributions: Normal, t (df = 10), Chisq (df = 4),
# Beta (a = 4, b = 2), Gamma (a = 4, b = 4)
Dist <- c("Gaussian", "t", "Chisq", "Beta", "Gamma")
# calculate standardized cumulants
# those for the normal and t distributions are rounded to ensure the
# correct values (i.e. skew = 0)
M1 <- round(calc_theory(Dist = "Gaussian", params = c(0, 1)), 8)
M2 <- round(calc_theory(Dist = "t", params = 10), 8)
M3 <- calc_theory(Dist = "Chisq", params = 4)
M4 <- calc_theory(Dist = "Beta", params = c(4, 2))
M5 <- calc_theory(Dist = "Gamma", params = c(4, 4))
M <- cbind(M1, M2, M3, M4, M5)
M <- round(M[-c(1:2),], digits = 6)
colnames(M) <- Dist
rownames(M) <- c("skew", "skurtosis", "fifth", "sixth")
means <- rep(0, length(Dist))
vars <- rep(1, length(Dist))
# calculate constants
con <- matrix(1, nrow = ncol(M), ncol = 6)
for (i in 1:ncol(M)) {
con[i, ] <- find_constants(method = "Polynomial", skews = M[1, i],
skurts = M[2, i], fifths = M[3, i],
sixths = M[4, i])
}
# Binary and Ordinal Distributions
marginal <- list(0.3, 0.4, c(0.1, 0.5), c(0.3, 0.6, 0.9),
c(0.2, 0.4, 0.7, 0.8))
support <- list()
# Poisson Distributions
lam <- c(1, 5, 10)
# Negative Binomial Distributions
size <- c(3, 6)
prob <- c(0.2, 0.8)
ncat <- length(marginal)
ncont <- ncol(M)
npois <- length(lam)
nnb <- length(size)
# Create correlation matrix from a uniform distribution (-0.8, 0.8)
set.seed(seed)
Rey <- diag(1, nrow = (ncat + ncont + npois + nnb))
for (i in 1:nrow(Rey)) {
for (j in 1:ncol(Rey)) {
if (i > j) Rey[i, j] <- runif(1, -0.8, 0.8)
Rey[j, i] <- Rey[i, j]
}
}
# Test for positive-definiteness
library(Matrix)
if(min(eigen(Rey, symmetric = TRUE)$values) < 0) {
Rey <- as.matrix(nearPD(Rey, corr = T, keepDiag = T)$mat)
}
# Make sure Rey is within upper and lower correlation limits
valid <- valid_corr(k_cat = ncat, k_cont = ncont, k_pois = npois,
k_nb = nnb, method = "Polynomial", means = means,
vars = vars, skews = M[1, ], skurts = M[2, ],
fifths = M[3, ], sixths = M[4, ], marginal = marginal,
lam = lam, size = size, prob = prob, rho = Rey,
seed = seed)
# Find intermediate correlation
Sigma1 <- findintercorr(n = n, k_cont = ncont, k_cat = ncat, k_pois = npois,
k_nb = nnb, method = "Polynomial", constants = con,
marginal = marginal, lam = lam, size = size,
prob = prob, rho = Rey, seed = seed)
Sigma1
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.