mixedCCA: Sparse CCA for data of mixed types with BIC criterion

View source: R/KendallCCA.R

mixedCCA R Documentation

Sparse CCA for data of mixed types with BIC criterion

Description

Applies sparse canonical correlation analysis (CCA) to high-dimensional data of mixed types (continuous, binary, or truncated continuous). A derived rank-based estimator is used in place of the sample correlation matrix. Two types of BIC criteria are available for variable selection (see BICtype). We found that BIC1 works best for variable selection, whereas BIC2 works best for prediction.
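
To illustrate what a rank-based estimator looks like, here is a minimal sketch in base R for continuous columns only. It assumes the well-known bridge sin(π/2 · τ) between Kendall's τ and the latent correlation for continuous pairs; binary and truncated variables need the other bridge functions given in the reference, which this sketch does not reproduce. The simulated data and the names z, x, tau, R_hat and R_latent are illustrative and not part of mixedCCA.

```r
# Sketch: rank-based latent correlation for continuous columns only.
set.seed(1)
n <- 200
z <- matrix(rnorm(n * 3), n, 3)
z[, 2] <- 0.6 * z[, 1] + sqrt(1 - 0.6^2) * z[, 2]  # latent correlation 0.6 between columns 1 and 2
x <- exp(z)  # strictly monotone transform: ranks are unchanged

tau <- cor(x, method = "kendall")  # pairwise Kendall's tau (base R, package 'stats')
R_hat <- sin(pi / 2 * tau)         # bridge function for continuous pairs

# The same estimate is obtained from the untransformed data,
# because Kendall's tau depends only on ranks.
R_latent <- sin(pi / 2 * cor(z, method = "kendall"))
all.equal(R_hat, R_latent)
```

The sample correlation of x would be distorted by the exponential transform, whereas the rank-based estimate is not affected by it.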

Usage

mixedCCA(
  X1,
  X2,
  type1,
  type2,
  lamseq1 = NULL,
  lamseq2 = NULL,
  nlamseq = 20,
  lam.eps = 0.01,
  w1init = NULL,
  w2init = NULL,
  BICtype,
  KendallR = NULL,
  maxiter = 100,
  tol = 0.01,
  trace = FALSE,
  lassoverbose = FALSE
)

Arguments

X1

A numeric data matrix (n by p1).

X2

A numeric data matrix (n by p2).

type1

The type of data in X1: one of "continuous", "binary", or "trunc".

type2

The type of data in X2: one of "continuous", "binary", or "trunc".

lamseq1

A tuning parameter sequence for X1. The length should be the same as lamseq2.

lamseq2

A tuning parameter sequence for X2. The length should be the same as lamseq1.

nlamseq

The number of values in the tuning parameter sequence lambda. The default is 20.

lam.eps

The ratio of the smallest value of lambda to the largest value of lambda. The default is 0.01.

w1init

An initial vector of length p1 for canonical direction w1.

w2init

An initial vector of length p2 for canonical direction w2.

BICtype

Either 1 or 2. For details on the two options, see the reference.

KendallR

An estimated Kendall's τ matrix. The default is NULL, which means that the matrix is estimated automatically by Kendall's τ estimator unless the user supplies one.

maxiter

The maximum number of iterations allowed.

tol

The desired accuracy (convergence tolerance).

trace

If trace = TRUE, progress at each iteration is printed. The default value is FALSE.

lassoverbose

If lassoverbose = TRUE, all warnings from lassobic optimization regarding convergence will be printed. The default value is lassoverbose = FALSE.
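
As an illustration of how nlamseq and lam.eps describe a tuning parameter sequence, the sketch below builds a decreasing sequence of nlamseq values whose smallest value is lam.eps times its largest. The helper make_lamseq, the log-scale spacing, and the starting value lam_max = 0.5 are assumptions made for this example; the page does not state how mixedCCA constructs its default sequences.

```r
# Hypothetical helper, not part of mixedCCA: a log-spaced lambda sequence.
make_lamseq <- function(lam_max, nlamseq = 20, lam.eps = 0.01) {
  stopifnot(lam_max > 0, nlamseq >= 2, lam.eps > 0, lam.eps < 1)
  # Equally spaced on the log scale, from lam_max down to lam.eps * lam_max
  exp(seq(log(lam_max), log(lam.eps * lam_max), length.out = nlamseq))
}

lamseq <- make_lamseq(lam_max = 0.5, nlamseq = 10, lam.eps = 0.01)
length(lamseq)              # number of lambda values
min(lamseq) / max(lamseq)   # ratio of smallest to largest value
```

User-built sequences of this kind could be supplied as lamseq1 and lamseq2, provided both have the same length.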

Value

mixedCCA returns a list containing

  • KendallR: the estimated Kendall's τ matrix.

  • lambda_seq: the values of lamseq used for sparse CCA.

  • w1: estimated canonical direction w1.

  • w2: estimated canonical direction w2.

  • cancor: estimated canonical correlation.

  • fitresult: more details regarding the progress at each iteration.
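
To show how the returned quantities relate to each other, the sketch below evaluates the standard canonical correlation for a given pair of directions and a joint correlation matrix. The helper cancor_from, the toy matrix, and the assumption that the joint matrix lists the X1 variables first and the X2 variables second are all illustrative; whether mixedCCA normalizes w1 and w2 in exactly this way is not stated on this page.

```r
# Toy joint correlation matrix for p1 = 2 and p2 = 2 (illustrative values only)
p1 <- 2; p2 <- 2
R <- diag(p1 + p2)
R[1, 3] <- R[3, 1] <- 0.5
idx1 <- 1:p1                 # assumed positions of the X1 variables
idx2 <- (p1 + 1):(p1 + p2)   # assumed positions of the X2 variables

# Hypothetical helper, not part of mixedCCA:
# w1' R12 w2 / sqrt((w1' R11 w1) * (w2' R22 w2))
cancor_from <- function(w1, w2, R, idx1, idx2) {
  num <- drop(t(w1) %*% R[idx1, idx2] %*% w2)
  den <- sqrt(drop(t(w1) %*% R[idx1, idx1] %*% w1) *
              drop(t(w2) %*% R[idx2, idx2] %*% w2))
  num / den
}

w1 <- c(1, 0)
w2 <- c(1, 0)
cancor_from(w1, w2, R, idx1, idx2)  # 0.5 for this toy matrix
```

Because the numerator and denominator scale together, rescaling w1 or w2 by a positive constant leaves the value unchanged.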

References

Yoon G., Carroll R.J. and Gaynanova I. (2020) "Sparse semiparametric canonical correlation analysis for data of mixed types" <doi:10.1093/biomet/asaa007>.

Examples

### Simple example

# Data setting
n <- 100; p1 <- 15; p2 <- 10 # sample size and dimensions for two datasets.
maxcancor <- 0.9 # true canonical correlation

# Correlation structure within each data set
set.seed(0)
perm1 <- sample(1:p1, size = p1)
Sigma1 <- autocor(p1, 0.7)[perm1, perm1]
blockind <- sample(1:3, size = p2, replace = TRUE)
Sigma2 <- blockcor(blockind, 0.7)
mu <- rbinom(p1+p2, 1, 0.5)

# true variable indices for each dataset
trueidx1 <- c(rep(1, 3), rep(0, p1-3))
trueidx2 <- c(rep(1, 2), rep(0, p2-2))

# Data generation
simdata <- GenerateData(n = n, trueidx1 = trueidx1, trueidx2 = trueidx2,
                        maxcancor = maxcancor,
                        Sigma1 = Sigma1, Sigma2 = Sigma2,
                        copula1 = "exp", copula2 = "cube",
                        muZ = mu,
                        type1 = "trunc", type2 = "trunc",
                        c1 = rep(1, p1), c2 = rep(0, p2))
X1 <- simdata$X1
X2 <- simdata$X2

# Check the range of truncation levels of variables
range(colMeans(X1 == 0))
range(colMeans(X2 == 0))

# Kendall CCA with BIC1
kendallcca1 <- mixedCCA(X1, X2, type1 = "trunc", type2 = "trunc", BICtype = 1, nlamseq = 10)

# Kendall CCA with BIC2. Estimated correlation matrix is plugged in from the above result.
R <- kendallcca1$KendallR
kendallcca2 <- mixedCCA(X1, X2, type1 = "trunc", type2 = "trunc",
                        KendallR = R, BICtype = 2, nlamseq = 10)
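
Once directions have been estimated, one natural follow-up is to compare the selected variables (nonzero entries) with the true indices used in the simulation. The sketch below is self-contained: support_recovery is a hypothetical helper, and w_toy and trueidx_toy are placeholder values standing in for an estimate such as kendallcca1$w1 and for trueidx1. They are not output from mixedCCA.

```r
# Hypothetical helper, not part of mixedCCA:
# true positive and false positive rates of the selected support.
support_recovery <- function(w, trueidx) {
  selected <- as.vector(w) != 0   # variables with nonzero weight
  truth <- trueidx == 1           # variables that are truly relevant
  c(TPR = sum(selected & truth) / sum(truth),
    FPR = sum(selected & !truth) / sum(!truth))
}

# Placeholder direction: 2 of 3 true variables selected, 1 false selection
w_toy <- c(0.8, 0.6, 0, 0, 0.1, rep(0, 10))
trueidx_toy <- c(rep(1, 3), rep(0, 12))
support_recovery(w_toy, trueidx_toy)
```

With real results, the same call could be applied to the w1 and w2 elements of each fit to compare BICtype = 1 against BICtype = 2.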

mixedCCA documentation built on Sept. 10, 2022, 1:06 a.m.