GenerateData: Mixed type simulation data generator for sparse CCA


View source: R/GenerateData.R

Description

GenerateData generates two data sets of mixed types (continuous, truncated continuous, or binary) for sparse canonical correlation analysis (CCA) under the Gaussian copula model.
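To make the Gaussian copula idea concrete, here is a minimal base-R sketch for a single variable. It assumes, as the argument descriptions below suggest, that a latent normal value Z is mapped through a monotone transformation f (here exp, one of the copula options) to U = f(Z), and that the truncated and binary types are obtained by thresholding U at a constant c. The helper name make_mixed is hypothetical and is not part of mixedCCA.

```r
# Hypothetical helper (not a mixedCCA function): turn latent values into the
# three data types described for type1/type2, assuming the threshold c is
# applied on the transformed scale U = f(Z).
make_mixed <- function(z, f = exp, c = 1) {
  u <- f(z)                              # monotone copula transformation
  list(
    continuous = u,                      # observed as is
    trunc      = ifelse(u > c, u, 0),    # values at or below c are set to 0
    binary     = as.numeric(u > c)       # indicator of exceeding c
  )
}

set.seed(1)
z <- rnorm(1000)                         # latent standard normal draws
x <- make_mixed(z, f = exp, c = 1)

# exp(z) > 1 whenever z > 0, so roughly half of the truncated values are 0,
# and the truncated and binary versions are zero in the same rows.
mean(x$trunc == 0)
all((x$trunc == 0) == (x$binary == 0))
```

Because the transformation is monotone, it changes the marginal distributions but not which observations fall below the threshold on the latent scale; this is what lets the latent correlation structure be shared across data types.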

Usage

GenerateData(
  n,
  trueidx1,
  trueidx2,
  Sigma1,
  Sigma2,
  maxcancor,
  copula1 = "no",
  copula2 = "no",
  type1 = "continuous",
  type2 = "continuous",
  muZ = NULL,
  c1 = NULL,
  c2 = NULL
)

Arguments

n

Sample size.

trueidx1

True canonical direction of length p1 for X1. It is automatically normalized so that w_1^T Σ_1 w_1 = 1, where w_1 is the normalized direction and Σ_1 is Sigma1.

trueidx2

True canonical direction of length p2 for X2. It is automatically normalized so that w_2^T Σ_2 w_2 = 1, where w_2 is the normalized direction and Σ_2 is Sigma2.

Sigma1

True correlation matrix of the latent variable Z1 (p1 by p1).

Sigma2

True correlation matrix of the latent variable Z2 (p2 by p2).

maxcancor

True canonical correlation between Z1 and Z2.

copula1

Copula transformation for the first dataset, U1 = f(Z1). One of "no" (the default, no transformation), "exp", or "cube".

copula2

Copula transformation for the second dataset, U2 = f(Z2). One of "no" (the default, no transformation), "exp", or "cube".

type1

Type of the first dataset X1. Could be "continuous", "trunc" or "binary".

type2

Type of the second dataset X2. Could be "continuous", "trunc" or "binary".

muZ

Mean vector of the latent multivariate normal distribution of (Z1, Z2), of length p1 + p2.

c1

Constant threshold vector for X1, required for the "trunc" and "binary" data types. The default is NULL.

c2

Constant threshold vector for X2, required for the "trunc" and "binary" data types. The default is NULL.
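The normalization of trueidx1 and trueidx2 and the role of maxcancor can be illustrated with a short base-R sketch. It assumes the standard single-pair construction in which the cross-covariance between Z1 and Z2 is Σ12 = maxcancor * Σ1 w1 w2^T Σ2; this is an assumption made for illustration, not a quote of the package source. The helper names normalize_direction and joint_sigma are hypothetical.

```r
# Hypothetical helper: rescale w so that t(w) %*% Sigma %*% w = 1.
normalize_direction <- function(w, Sigma) {
  w / sqrt(as.numeric(t(w) %*% Sigma %*% w))
}

# Hypothetical helper: joint covariance of (Z1, Z2) with a single canonical
# pair (w1, w2) whose canonical correlation is maxcancor (assumed construction).
joint_sigma <- function(Sigma1, Sigma2, w1, w2, maxcancor) {
  Sigma12 <- maxcancor * Sigma1 %*% w1 %*% t(w2) %*% Sigma2
  rbind(cbind(Sigma1, Sigma12), cbind(t(Sigma12), Sigma2))
}

# Small example with AR(1)-type correlation matrices built in base R.
p1 <- 5; p2 <- 4
Sigma1 <- 0.7^abs(outer(1:p1, 1:p1, "-"))
Sigma2 <- 0.7^abs(outer(1:p2, 1:p2, "-"))
w1 <- normalize_direction(c(rep(1, 3), rep(0, p1 - 3)), Sigma1)
w2 <- normalize_direction(c(rep(1, 2), rep(0, p2 - 2)), Sigma2)
J <- joint_sigma(Sigma1, Sigma2, w1, w2, maxcancor = 0.9)

# The largest canonical correlation is the square root of the top eigenvalue
# of solve(Sigma1) %*% Sigma12 %*% solve(Sigma2) %*% t(Sigma12).
Sigma12 <- J[1:p1, (p1 + 1):(p1 + p2)]
M <- solve(Sigma1) %*% Sigma12 %*% solve(Sigma2) %*% t(Sigma12)
sqrt(max(Re(eigen(M)$values)))   # recovers maxcancor up to rounding
```

With this construction the matrix M has rank one and its only nonzero eigenvalue is maxcancor^2, because w1^T Σ1 w1 = w2^T Σ2 w2 = 1; the joint matrix stays positive definite as long as maxcancor < 1.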

Value

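GenerateData returns a list containing the simulated data, including the observed data matrices X1 (n by p1) and X2 (n by p2) used in the example below.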

Examples

### Simple example
library(mixedCCA) # provides GenerateData, autocor and blockcor

# Data setting
n <- 100; p1 <- 15; p2 <- 10 # sample size and dimensions for two datasets.
maxcancor <- 0.9 # true canonical correlation

# Correlation structure within each data set
set.seed(0)
perm1 <- sample(1:p1, size = p1)
Sigma1 <- autocor(p1, 0.7)[perm1, perm1]
blockind <- sample(1:3, size = p2, replace = TRUE)
Sigma2 <- blockcor(blockind, 0.7)
mu <- rbinom(p1 + p2, 1, 0.5) # latent means, each 0 or 1

# True (unnormalized) canonical directions; nonzero entries mark the true variables
trueidx1 <- c(rep(1, 3), rep(0, p1 - 3))
trueidx2 <- c(rep(1, 2), rep(0, p2 - 2))

# Data generation
simdata <- GenerateData(n = n, trueidx1 = trueidx1, trueidx2 = trueidx2,
                        maxcancor = maxcancor,
                        Sigma1 = Sigma1, Sigma2 = Sigma2,
                        copula1 = "exp", copula2 = "cube",
                        muZ = mu,
                        type1 = "trunc", type2 = "trunc",
                        c1 = rep(1, p1), c2 = rep(0, p2))
X1 <- simdata$X1
X2 <- simdata$X2

# Check the range of truncation levels of variables
range(colMeans(X1 == 0))
range(colMeans(X2 == 0))
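The printed ranges can be compared with a back-of-the-envelope calculation. Assuming that each latent Z_j has unit variance (Sigma1 and Sigma2 are correlation matrices), that its mean μ_j is the matching entry of muZ, and that X_j is set to 0 whenever U_j = f(Z_j) <= c_j, the expected proportion of zeros in column j is Φ(f^(-1)(c_j) - μ_j). With c1 = 1 under the "exp" copula and c2 = 0 under the "cube" copula, f^(-1)(c_j) = 0 in both cases, so a column should be zero roughly 50% of the time when μ_j = 0 and roughly 16% of the time when μ_j = 1. The helper below is hypothetical and not part of mixedCCA.

```r
# Hypothetical helper: expected fraction of zeros in a truncated column,
# assuming Z_j ~ N(mu_j, 1) and X_j = 0 whenever f(Z_j) <= c_j.
expected_zero_prop <- function(mu, c, f_inverse) {
  pnorm(f_inverse(c) - mu)
}

# Inverse of the "cube" transformation, valid for negative values too.
cube_root <- function(u) sign(u) * abs(u)^(1 / 3)

mu_values <- c(0, 1)  # muZ in the example is drawn from {0, 1}
expected_zero_prop(mu_values, c = 1, f_inverse = log)        # "exp" copula, c1 = 1
expected_zero_prop(mu_values, c = 0, f_inverse = cube_root)  # "cube" copula, c2 = 0
```

Both calls give about 0.50 and 0.16, so the observed ranges from colMeans should lie near these two values, up to sampling noise with n = 100.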

mixedCCA documentation built on March 21, 2021, 1:07 a.m.