# estimateR: Estimate latent correlation matrix

In mixedCCA: Sparse Canonical Correlation Analysis for High-Dimensional Mixed Data

## Description

Estimation of the latent correlation matrix from observed data of (possibly) mixed types (continuous/binary/truncated continuous), based on the latent Gaussian copula model.
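For intuition, in the continuous/continuous case the bridge function described by Fan et al. (2017) has a closed-form inverse: the latent correlation equals sin(π/2 · τ), where τ is Kendall's tau. The base-R sketch below illustrates this on simulated data. It does not call mixedCCA and is not the package's implementation; the sample size, the true correlation `rho`, and the two transformations are assumptions chosen for the demonstration. Pairs involving binary or truncated variables use different bridge functions that have to be inverted numerically.

```r
# Minimal base-R sketch of the continuous/continuous case (illustrative only).
set.seed(1)
n   <- 2000
rho <- 0.6                      # true latent correlation (assumed for the demo)

# Latent bivariate Gaussian pair with correlation rho
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# Observed variables are strictly increasing transformations of the latent ones
x1 <- exp(z1)
x2 <- z2^3

# Kendall's tau depends only on ranks, so the transformations do not change it
tau <- cor(x1, x2, method = "kendall")

# Inverse bridge function for two continuous variables
r_hat <- sin(pi / 2 * tau)

# Pearson correlation on the observed scale, for comparison with r_hat
r_pearson <- cor(x1, x2)
```

Because the estimate is rank-based, `r_hat` targets the correlation on the latent Gaussian scale, whereas `r_pearson` measures linear association between the transformed variables.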

## Usage

```
estimateR(
  X,
  type = "trunc",
  method = "approx",
  use.nearPD = TRUE,
  nu = 0.01,
  tol = 0.001,
  verbose = FALSE
)

estimateR_mixed(
  X1,
  X2,
  type1 = "trunc",
  type2 = "continuous",
  method = "approx",
  use.nearPD = TRUE,
  nu = 0.01,
  tol = 0.001,
  verbose = FALSE
)
```

## Arguments

• `X`: A numeric data matrix (n by p), where n is the sample size and p is the number of variables.

• `type`: The type of the variables in `X`; must be one of "continuous", "binary" or "trunc".

• `method`: The method used to calculate the latent correlation, either "original" or "approx". If `method = "approx"`, a multilinear approximation is used, which is much faster than the original method. If `method = "original"`, the bridge function is inverted by numerical optimization. The default is "approx".

• `use.nearPD`: A logical value indicating whether to apply nearPD when the resulting correlation estimator is not positive definite (has at least one negative eigenvalue).

• `nu`: Shrinkage parameter for the correlation matrix; must be between 0 and 1. The default value is 0.01.

• `tol`: Desired accuracy when solving the bridge function.

• `verbose`: If `verbose = FALSE`, the message reporting whether nearPD was used is suppressed. The default value is FALSE.

• `X1`: A numeric data matrix (n by p1).

• `X2`: A numeric data matrix (n by p2).

• `type1`: The type of the variables in `X1`; must be one of "continuous", "binary" or "trunc".

• `type2`: The type of the variables in `X2`; must be one of "continuous", "binary" or "trunc".
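The roles of `use.nearPD` and `nu` can be sketched outside the package. The example below builds a small symmetric matrix with unit diagonal that is not positive definite, projects it with `Matrix::nearPD`, and then shrinks it toward the identity. The convex-combination form `(1 - nu) * R + nu * I` is an assumption about how a shrinkage parameter between 0 and 1 is typically applied, not a statement quoted from the package; `R_raw`, `R_pd` and `R_shrunk` are illustrative names.

```r
# Minimal sketch (not the package's code): nearPD projection, then shrinkage.
# A symmetric matrix with unit diagonal that is NOT positive definite
R_raw <- matrix(c( 1.0,  0.9, -0.9,
                   0.9,  1.0,  0.9,
                  -0.9,  0.9,  1.0), nrow = 3, byrow = TRUE)

# Smallest eigenvalue; a negative value means R_raw is not positive definite
min_eig_raw <- min(eigen(R_raw, symmetric = TRUE, only.values = TRUE)$values)

# Nearest positive definite correlation matrix (Matrix is a recommended R package)
R_pd <- as.matrix(Matrix::nearPD(R_raw, corr = TRUE)$mat)

# Assumed shrinkage form: convex combination with the identity matrix
nu <- 0.01
R_shrunk <- (1 - nu) * R_pd + nu * diag(nrow(R_pd))

min_eig_shrunk <- min(eigen(R_shrunk, symmetric = TRUE, only.values = TRUE)$values)
```

Under this form the diagonal stays at 1, every eigenvalue λ of `R_pd` becomes (1 - nu) · λ + nu, and so the smallest eigenvalue is pushed away from zero, which helps when the estimate is later inverted or decomposed.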

## Value

`estimateR` returns

• type: Type of the data matrix `X`

• R: Estimated p by p latent correlation matrix of `X`

`estimateR_mixed` returns

• type1: Type of the data matrix `X1`

• type2: Type of the data matrix `X2`

• R: Estimated latent correlation matrix of whole `X` = (`X1`, `X2`) (p1+p2 by p1+p2)

• R1: Estimated latent correlation matrix of `X1` (p1 by p1)

• R2: Estimated latent correlation matrix of `X2` (p2 by p2)

• R12: Estimated latent correlation matrix between `X1` and `X2` (p1 by p2)
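The dimensions listed above suggest that `R1`, `R2` and `R12` are the blocks of the full matrix `R`. The base-R sketch below shows that relationship on a stand-in matrix; it does not call mixedCCA. The sizes, the simulated matrix, and the assumption that the variables of `X1` come first in `R` (following the notation `X` = (`X1`, `X2`)) are illustrative.

```r
# Illustrative sizes and a made-up correlation matrix (assumptions for the demo)
p1 <- 3
p2 <- 2
set.seed(2)
Z <- matrix(rnorm(50 * (p1 + p2)), ncol = p1 + p2)
R <- cor(Z)                     # stands in for the R element of the returned list

idx1 <- seq_len(p1)             # positions of the X1 variables
idx2 <- p1 + seq_len(p2)        # positions of the X2 variables

R1  <- R[idx1, idx1]            # p1 by p1 block
R2  <- R[idx2, idx2]            # p2 by p2 block
R12 <- R[idx1, idx2]            # p1 by p2 cross-correlation block

# Reassemble the full matrix from its blocks
R_rebuilt <- rbind(cbind(R1, R12),
                   cbind(t(R12), R2))
```

The cross block `R12` is the part that relates the two data sets to each other, while `R1` and `R2` describe the correlation structure within each data set.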

## References

Fan J., Liu H., Ning Y. and Zou H. (2017) "High dimensional semiparametric latent graphical model for mixed data" <doi:10.1111/rssb.12168>.

Yoon G., Carroll R.J. and Gaynanova I. (2020) "Sparse semiparametric canonical correlation analysis for data of mixed types" <doi:10.1093/biomet/asaa007>.

Yoon G., Müller C.L. and Gaynanova I. (2020) "Fast computation of latent correlations" <arXiv:2006.13875>.

## Examples

```
library(mixedCCA)

### Data setting
n <- 100; p1 <- 15; p2 <- 10 # sample size and dimensions for two datasets.
maxcancor <- 0.9 # true canonical correlation

### Correlation structure within each data set
set.seed(0)
perm1 <- sample(1:p1, size = p1)
Sigma1 <- autocor(p1, 0.7)[perm1, perm1]
blockind <- sample(1:3, size = p2, replace = TRUE)
Sigma2 <- blockcor(blockind, 0.7)
mu <- rbinom(p1 + p2, 1, 0.5)

### true variable indices for each dataset
trueidx1 <- c(rep(1, 3), rep(0, p1 - 3))
trueidx2 <- c(rep(1, 2), rep(0, p2 - 2))

### Data generation
simdata <- GenerateData(n = n, trueidx1 = trueidx1, trueidx2 = trueidx2,
                        maxcancor = maxcancor,
                        Sigma1 = Sigma1, Sigma2 = Sigma2,
                        copula1 = "exp", copula2 = "cube",
                        muZ = mu,
                        type1 = "trunc", type2 = "continuous",
                        c1 = rep(1, p1), c2 = rep(0, p2)
)
X1 <- simdata$X1
X2 <- simdata$X2

### Check the range of truncation levels of variables
range(colMeans(X1 == 0))
range(colMeans(X2 == 0))

### Estimate latent correlation matrix
# with original method
R1_org <- estimateR(X1, type = "trunc", method = "original")$R
# with faster approximation method
R1_approx <- estimateR(X1, type = "trunc", method = "approx")$R
R12_approx <- estimateR_mixed(X1, X2, type1 = "trunc", type2 = "continuous",
                              method = "approx")$R12
```

mixedCCA documentation built on March 21, 2021, 1:07 a.m.