| GenerateData | R Documentation |
GenerateData generates two datasets of mixed types for sparse CCA under the Gaussian copula model.
GenerateData( n, trueidx1, trueidx2, Sigma1, Sigma2, maxcancor, copula1 = "no", copula2 = "no", type1 = "continuous", type2 = "continuous", muZ = NULL, c1 = NULL, c2 = NULL )
n |
Sample size |
trueidx1 |
True canonical direction of length p1 for X1 |
trueidx2 |
True canonical direction of length p2 for X2 |
Sigma1 |
True correlation matrix of latent variable Z1 (p1 by p1) |
Sigma2 |
True correlation matrix of latent variable Z2 (p2 by p2) |
maxcancor |
True canonical correlation between Z1 and Z2 |
copula1 |
Copula type for the first dataset, U1 = f(Z1): one of "no" (the default), "exp", or "cube". |
copula2 |
Copula type for the second dataset, U2 = f(Z2): one of "no" (the default), "exp", or "cube". |
type1 |
Type of the first dataset: "continuous" (the default), "trunc", or "binary". |
type2 |
Type of the second dataset: "continuous" (the default), "trunc", or "binary". |
muZ |
Mean vector of the latent multivariate normal distribution. |
c1 |
Constant threshold for X1, used for the "trunc" and "binary" data types |
c2 |
Constant threshold for X2, used for the "trunc" and "binary" data types |
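The arguments copula1/copula2, type1/type2 and c1/c2 together determine how a latent Gaussian column becomes an observed column. The base R sketch below illustrates one plausible reading and is not taken from the package source. It assumes that "exp" means f(z) = exp(z), that "cube" means f(z) = z^3, that "trunc" sets values at or below the threshold to zero, and that "binary" records whether the value exceeds the threshold. The helper names apply_copula and apply_type are hypothetical.

```r
# Hypothetical helpers: the transforms below are assumptions, not the package's code.
apply_copula <- function(z, copula = c("no", "exp", "cube")) {
  copula <- match.arg(copula)
  switch(copula,
         no   = z,       # leave the latent values unchanged
         exp  = exp(z),  # assumed meaning of "exp"
         cube = z^3)     # assumed meaning of "cube"
}

apply_type <- function(u, type = c("continuous", "trunc", "binary"), threshold = 0) {
  type <- match.arg(type)
  switch(type,
         continuous = u,
         trunc      = ifelse(u > threshold, u, 0),   # zero out values at or below the threshold
         binary     = as.numeric(u > threshold))     # indicator of exceeding the threshold
}

set.seed(1)
z <- rnorm(8)                                  # latent Gaussian values
u <- apply_copula(z, "cube")                   # monotone transform of the latent values
x_trunc <- apply_type(u, "trunc", threshold = 0)
x_bin   <- apply_type(u, "binary", threshold = 0)
cbind(z, u, x_trunc, x_bin)
```

Because both assumed transforms are monotone increasing, they change the marginal distribution of each column but not the ordering of its values.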
GenerateData returns a list containing:
Z1: latent numeric data matrix (n by p1).
Z2: latent numeric data matrix (n by p2).
X1: observed numeric data matrix (n by p1).
X2: observed numeric data matrix (n by p2).
true_w1: normalized true canonical direction of length p1 for X1.
true_w2: normalized true canonical direction of length p2 for X2.
type: a vector containing the types of the two datasets.
maxcancor: true canonical correlation between Z1 and Z2.
c1: constant threshold for X1 for "trunc" and "binary" data type.
c2: constant threshold for X2 for "trunc" and "binary" data type.
Sigma: true latent correlation matrix of Z1 and Z2 ((p1+p2) by (p1+p2)).
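To see how Sigma1, Sigma2, the two directions and maxcancor can combine into a single (p1+p2) by (p1+p2) latent correlation matrix, the sketch below uses a standard rank-one construction. It is an assumption about how such a matrix may be built, not the package's confirmed implementation: each direction is scaled so that w' Sigma w = 1, and the cross block is maxcancor * Sigma1 w1 w2' Sigma2. The small dimensions and the helper inv_sqrt are illustrative only.

```r
# Assumed construction of the joint latent correlation matrix (a sketch only).
p1 <- 4; p2 <- 3
maxcancor <- 0.9

# AR(1)-type correlation matrices built with base R
Sigma1 <- 0.7^abs(outer(1:p1, 1:p1, "-"))
Sigma2 <- 0.5^abs(outer(1:p2, 1:p2, "-"))

# Unnormalized directions: nonzero entries mark the active variables
trueidx1 <- c(1, 1, 0, 0)
trueidx2 <- c(1, 0, 0)

# Normalize so that w' Sigma w = 1
w1 <- trueidx1 / sqrt(drop(t(trueidx1) %*% Sigma1 %*% trueidx1))
w2 <- trueidx2 / sqrt(drop(t(trueidx2) %*% Sigma2 %*% trueidx2))

# Rank-one cross-correlation block between the two latent sets
Sigma12 <- maxcancor * Sigma1 %*% w1 %*% t(w2) %*% Sigma2

# Joint matrix of size (p1 + p2) by (p1 + p2)
Sigma <- rbind(cbind(Sigma1, Sigma12), cbind(t(Sigma12), Sigma2))

# Leading canonical correlation implied by Sigma, via the whitened cross block
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
}
rho <- svd(inv_sqrt(Sigma1) %*% Sigma12 %*% inv_sqrt(Sigma2))$d[1]
rho
```

Under this construction the leading singular value of the whitened cross block equals maxcancor up to floating-point error, so the joint matrix carries exactly the requested canonical correlation.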
### Simple example
library(mixedCCA) # provides GenerateData(), autocor() and blockcor()
# Data setting
n <- 100; p1 <- 15; p2 <- 10 # sample size and dimensions of the two datasets
maxcancor <- 0.9 # true canonical correlation
# Correlation structure within each dataset
set.seed(0)
perm1 <- sample(1:p1, size = p1)
Sigma1 <- autocor(p1, 0.7)[perm1, perm1]
blockind <- sample(1:3, size = p2, replace = TRUE)
Sigma2 <- blockcor(blockind, 0.7)
mu <- rbinom(p1 + p2, 1, 0.5) # latent means, each 0 or 1
# true variable indices for each dataset
trueidx1 <- c(rep(1, 3), rep(0, p1-3))
trueidx2 <- c(rep(1, 2), rep(0, p2-2))
# Data generation
simdata <- GenerateData(n = n, trueidx1 = trueidx1, trueidx2 = trueidx2,
                        maxcancor = maxcancor,
                        Sigma1 = Sigma1, Sigma2 = Sigma2,
                        copula1 = "exp", copula2 = "cube",
                        muZ = mu,
                        type1 = "trunc", type2 = "trunc",
                        c1 = rep(1, p1), c2 = rep(0, p2))
X1 <- simdata$X1
X2 <- simdata$X2
# Check the range of truncation levels (proportion of zeros) across variables
range(colMeans(X1 == 0))
range(colMeans(X2 == 0))
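The truncation level of a variable is the proportion of its entries equal to zero, which is what colMeans(X == 0) reports column by column. As a rough sanity check under simplifying assumptions (no copula transform, unit latent variance, and truncation that zeroes values at or below the threshold), the expected level for a latent mean mu and threshold thr is pnorm(thr - mu). The names below are illustrative and the simulation is independent of GenerateData.

```r
# Sketch: empirical vs. expected truncation level for one truncated variable.
# Assumes no copula transform and unit latent variance.
set.seed(0)
n <- 1e5
mu <- 1    # latent mean
thr <- 1   # truncation threshold

z <- rnorm(n, mean = mu)        # latent Gaussian variable
x <- ifelse(z > thr, z, 0)      # assumed truncation rule

empirical <- mean(x == 0)       # observed proportion of zeros
expected  <- pnorm(thr - mu)    # P(Z <= thr) for Z ~ N(mu, 1)
c(empirical = empirical, expected = expected)
```

With a monotone copula transform applied before thresholding, the same idea would apply after mapping the threshold back to the latent scale.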