GenerateData (R Documentation)
GenerateData is used to generate two sets of data of mixed types for sparse CCA under the Gaussian copula model.
GenerateData(n, trueidx1, trueidx2, Sigma1, Sigma2, maxcancor,
  copula1 = "no", copula2 = "no",
  type1 = "continuous", type2 = "continuous",
  muZ = NULL, c1 = NULL, c2 = NULL)
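n: Sample size.
trueidx1: True canonical direction of length p1 for X1.
trueidx2: True canonical direction of length p2 for X2.
Sigma1: True correlation matrix of the latent variable Z1 (p1 by p1).
Sigma2: True correlation matrix of the latent variable Z2 (p2 by p2).
maxcancor: True canonical correlation between Z1 and Z2.
copula1: Copula type for the first dataset, where U1 = f(Z1); either "exp" or "cube" (the default "no" applies no transformation).
copula2: Copula type for the second dataset, where U2 = f(Z2); either "exp" or "cube" (the default "no" applies no transformation).
type1: Type of the first dataset X1, such as "continuous" (the default), "trunc" or "binary".
type2: Type of the second dataset X2, such as "continuous" (the default), "trunc" or "binary".
muZ: Mean of the latent multivariate normal distribution of (Z1, Z2), a vector of length p1+p2.
c1: Constant threshold for X1, needed for the "trunc" and "binary" data types.
c2: Constant threshold for X2, needed for the "trunc" and "binary" data types.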
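The copula and type arguments describe how latent Gaussian data become observed data: a monotone transform f maps Z to U, and U is then kept as is, truncated at a threshold, or dichotomized. The sketch below illustrates that construction only. The helper names `copula_transform()` and `observe_type()` are hypothetical, this is not the source of GenerateData, and applying the thresholds on the U scale (zero below the cutoff for "trunc", an indicator of exceeding it for "binary") is an assumption.

```r
# Hypothetical sketch of the Gaussian copula construction; NOT the source of
# GenerateData(). Thresholds are assumed to act on the transformed scale U.
copula_transform <- function(Z, copula = c("no", "exp", "cube")) {
  copula <- match.arg(copula)
  switch(copula,
         no   = Z,        # identity: U = Z
         exp  = exp(Z),   # U = exp(Z)
         cube = Z^3)      # U = Z^3
}

observe_type <- function(U, type = c("continuous", "trunc", "binary"),
                         cutoff = NULL) {
  type <- match.arg(type)
  if (type == "continuous") return(U)
  # Logical matrix: is column j of U above cutoff[j]?
  above <- sweep(U, 2, cutoff, FUN = ">")
  # "trunc": keep U where it exceeds the cutoff, 0 otherwise
  # "binary": 1 where U exceeds the cutoff, 0 otherwise
  if (type == "trunc") U * above else above * 1
}

# Usage: latent N(0, 1) data, exp copula, cutoff 1 for every column
set.seed(1)
n <- 500; p <- 4
Z <- matrix(rnorm(n * p), n, p)
U <- copula_transform(Z, "exp")
X_trunc  <- observe_type(U, "trunc",  cutoff = rep(1, p))
X_binary <- observe_type(U, "binary", cutoff = rep(1, p))
colMeans(X_trunc == 0)   # each close to 0.5, since exp(Z) <= 1 exactly when Z <= 0
```

Because f is monotone, truncating U at a cutoff is the same as truncating Z at f^{-1}(cutoff), which is why the copula leaves the latent correlation structure untouched.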
GenerateData returns a list containing
Z1: latent numeric data matrix (n by p1).
Z2: latent numeric data matrix (n by p2).
X1: observed numeric data matrix (n by p1).
X2: observed numeric data matrix (n by p2).
true_w1: normalized true canonical direction of length p1 for X1.
true_w2: normalized true canonical direction of length p2 for X2.
type: a vector containing the types of the two datasets.
maxcancor: true canonical correlation between Z1 and Z2.
c1: constant threshold for X1 for the "trunc" and "binary" data types.
c2: constant threshold for X2 for the "trunc" and "binary" data types.
Sigma: true latent correlation matrix of Z1 and Z2 ((p1+p2) by (p1+p2)).
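One common way to build a joint latent correlation matrix with a prescribed canonical correlation, used in the sparse CCA simulation literature, is to normalize each direction so that w' Sigma_k w = 1 and set the cross block to maxcancor * Sigma1 w1 w2' Sigma2. Whether GenerateData uses exactly this construction and normalization is an assumption here; the helpers `build_latent_sigma()` and `leading_cancor()` are hypothetical. The sketch builds such a matrix and checks that its leading canonical correlation equals maxcancor.

```r
# Hypothetical sketch; assumes the rank-one cross-covariance construction
# Sigma12 = maxcancor * Sigma1 w1 w2' Sigma2 with w' Sigma w = 1.
build_latent_sigma <- function(Sigma1, Sigma2, trueidx1, trueidx2, maxcancor) {
  # Normalize each direction to unit variance of the latent canonical variate
  w1 <- trueidx1 / sqrt(drop(t(trueidx1) %*% Sigma1 %*% trueidx1))
  w2 <- trueidx2 / sqrt(drop(t(trueidx2) %*% Sigma2 %*% trueidx2))
  # Rank-one cross block carrying the whole canonical correlation
  Sigma12 <- maxcancor * Sigma1 %*% w1 %*% t(w2) %*% Sigma2
  Sigma <- rbind(cbind(Sigma1, Sigma12), cbind(t(Sigma12), Sigma2))
  list(Sigma = Sigma, w1 = w1, w2 = w2)
}

# Leading canonical correlation: top singular value of
# Sigma11^(-1/2) Sigma12 Sigma22^(-1/2)
leading_cancor <- function(Sigma, p1) {
  p <- nrow(Sigma)
  S11 <- Sigma[1:p1, 1:p1]
  S22 <- Sigma[(p1 + 1):p, (p1 + 1):p]
  S12 <- Sigma[1:p1, (p1 + 1):p]
  inv_sqrt <- function(S) {
    e <- eigen(S, symmetric = TRUE)
    e$vectors %*% diag(1 / sqrt(e$values), nrow(S)) %*% t(e$vectors)
  }
  svd(inv_sqrt(S11) %*% S12 %*% inv_sqrt(S22))$d[1]
}

# Usage with simple within-set structures (AR(1)-type and equicorrelation)
p1 <- 15; p2 <- 10
Sigma1 <- 0.7^abs(outer(1:p1, 1:p1, "-"))
Sigma2 <- matrix(0.5, p2, p2); diag(Sigma2) <- 1
trueidx1 <- c(rep(1, 3), rep(0, p1 - 3))
trueidx2 <- c(rep(1, 2), rep(0, p2 - 2))
out <- build_latent_sigma(Sigma1, Sigma2, trueidx1, trueidx2, maxcancor = 0.9)
leading_cancor(out$Sigma, p1)   # 0.9 up to floating-point error
```

With this normalization the Schur complement of Sigma1 has eigenvalues in the directions of Sigma2 scaled by 1 and 1 - maxcancor^2, so the joint matrix stays positive definite whenever maxcancor < 1.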
### Simple example
library(mixedCCA) # package providing GenerateData(), autocor() and blockcor()

# Data setting
n <- 100; p1 <- 15; p2 <- 10 # sample size and dimensions for two datasets.
maxcancor <- 0.9 # true canonical correlation

# Correlation structure within each data set
set.seed(0)
perm1 <- sample(1:p1, size = p1);
Sigma1 <- autocor(p1, 0.7)[perm1, perm1]
blockind <- sample(1:3, size = p2, replace = TRUE);
Sigma2 <- blockcor(blockind, 0.7)
mu <- rbinom(p1+p2, 1, 0.5)

# true variable indices for each dataset
trueidx1 <- c(rep(1, 3), rep(0, p1-3))
trueidx2 <- c(rep(1, 2), rep(0, p2-2))

# Data generation
simdata <- GenerateData(n = n, trueidx1 = trueidx1, trueidx2 = trueidx2, maxcancor = maxcancor,
                        Sigma1 = Sigma1, Sigma2 = Sigma2,
                        copula1 = "exp", copula2 = "cube",
                        muZ = mu,
                        type1 = "trunc", type2 = "trunc",
                        c1 = rep(1, p1), c2 = rep(0, p2)
)
X1 <- simdata$X1
X2 <- simdata$X2

# Check the range of truncation levels of variables
range(colMeans(X1 == 0))
range(colMeans(X2 == 0))
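The example relies on autocor() and blockcor() to set up the within-set correlation structures. For experimenting without the package, the hypothetical stand-ins below assume that autocor(p, rho) gives an AR(1)-type matrix with entries rho^|i - j| and that blockcor(blockind, rho) gives correlation rho between variables sharing a block label and 0 otherwise. These definitions are assumptions and may differ from the package's own functions.

```r
# Hypothetical stand-ins for autocor() and blockcor(); their exact
# definitions in the package are assumed, not confirmed.
autocor_sketch <- function(p, rho) {
  # AR(1)-type structure: corr(Z_i, Z_j) = rho^|i - j|
  rho^abs(outer(seq_len(p), seq_len(p), "-"))
}

blockcor_sketch <- function(blockind, rho) {
  # Variables with the same block label are correlated at rho, others at 0
  same_block <- outer(blockind, blockind, "==")
  R <- rho * same_block
  diag(R) <- 1
  R
}

# Usage mirroring the example's setup
set.seed(0)
p1 <- 15; p2 <- 10
perm1 <- sample(1:p1, size = p1)
Sigma1 <- autocor_sketch(p1, 0.7)[perm1, perm1]   # permuted AR(1) structure
blockind <- sample(1:3, size = p2, replace = TRUE)
Sigma2 <- blockcor_sketch(blockind, 0.7)          # block-diagonal structure
```

Permuting rows and columns together, as with perm1, keeps the matrix a valid correlation matrix while scattering the strongly correlated neighbours across the variable indices.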