latentcor | R Documentation |

Estimation of latent correlation matrix from observed data of (possibly) mixed types (continuous/binary/truncated/ternary) based on the latent Gaussian copula model. Missing values (NA) are allowed. The estimation is based on pairwise complete observations.

latentcor( X, types = NULL, method = c("approx", "original"), use.nearPD = TRUE, nu = 0.001, tol = 1e-08, ratio = 0.9, showplot = FALSE )

`X` |
A numeric matrix or numeric data frame (n by p), where n is number of samples, and p is number of variables. Missing values (NA) are allowed, in which case the estimation is based on pairwise complete observations. |

`types` |
A vector of length p indicating the type of each of the p variables in |

`method` |
The calculation method for latent correlations. Either |

`use.nearPD` |
Logical indicator. |

`nu` |
Shrinkage parameter for the correlation matrix, must be between 0 and 1. Guarantees that the minimal eigenvalue of returned correlation matrix is greater or equal to |

`tol` |
When |

`ratio` |
When |

`showplot` |
Logical indicator. |

The function estimates the latent correlation by evaluating the inverse of the bridge function (Fan et al., 2017) at the sample Kendall's tau (*\hat τ*). The bridge function F connects Kendall's tau to the latent correlation r so that *F(r) = E(\hat τ)*. The form of F depends on the variable types (continuous/binary/truncated/ternary), but is known exactly in each case. A closed form of the inverse is not available, so the inverse has to be evaluated numerically for each pair of variables, which yields `Rpointwise`.

When `method = "original"`, the inversion is done numerically by solving

*minimize_r (F(r) - \hat τ)^2*

using `optimize`. The parameter `tol` controls the accuracy of the solution.
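The numerical inversion can be illustrated with a minimal base-R sketch for a continuous/continuous pair, where the bridge function has the well-known form F(r) = (2/π) asin(r) and the inverse is available in closed form for comparison. The helper names `bridge_cc` and `invert_bridge` and the search interval are assumptions made for illustration; they are not taken from the package source.

```r
# Bridge function for a continuous/continuous pair under a Gaussian copula
bridge_cc <- function(r) (2 / pi) * asin(r)

# Invert F at tau_hat by minimizing (F(r) - tau_hat)^2 over r.
# The interval (-0.999, 0.999) is an illustrative choice, not the package's.
invert_bridge <- function(tau_hat, tol = 1e-8) {
  objective <- function(r) (bridge_cc(r) - tau_hat)^2
  optimize(objective, lower = -0.999, upper = 0.999, tol = tol)$minimum
}

# Simulate a correlated Gaussian pair (no ties, so tau-a and tau-b coincide)
set.seed(1)
x <- rnorm(500)
y <- 0.6 * x + sqrt(1 - 0.6^2) * rnorm(500)
tau_hat <- cor(x, y, method = "kendall")

r_numeric <- invert_bridge(tau_hat)   # numerical inverse
r_closed <- sin(pi * tau_hat / 2)     # closed-form inverse for this case
```

For the other type combinations the bridge function is more involved and no closed-form inverse exists, which is why the numerical step is needed there.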

When `method = "approx"`, the inversion is done approximately by interpolating previously calculated and stored values of *F^{-1}(\hat τ)*. This is significantly faster than the original method (Yoon et al., 2021) for binary/ternary/truncated cases; however, the approximation errors may be non-negligible in some regions of the space. The parameter `ratio` controls the region where the interpolation is performed, with the default recommended value of 0.9 giving a good balance of accuracy and computational speed. Increasing the value of `ratio` may improve speed (but possibly sacrifice accuracy), whereas decreasing the value of `ratio` may improve accuracy (but possibly sacrifice speed). See Yoon et al. (2021) and the vignette for more details.
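The idea of replacing repeated optimization with a lookup can be sketched in one dimension using base R. This is only an illustration under simplifying assumptions: it uses the continuous/continuous bridge function F(r) = (2/π) asin(r), a hypothetical grid of 200 points, and linear interpolation via `approxfun`. The package's stored grid, its interpolation scheme, and the role of `ratio` are not reproduced here.

```r
# Continuous/continuous bridge function and its numerical inverse
bridge_cc <- function(r) (2 / pi) * asin(r)
invert_bridge <- function(tau_hat, tol = 1e-8) {
  objective <- function(r) (bridge_cc(r) - tau_hat)^2
  optimize(objective, lower = -0.999, upper = 0.999, tol = tol)$minimum
}

# Precompute the inverse once on a grid of tau values (illustrative grid)
tau_grid <- seq(-0.95, 0.95, length.out = 200)
r_grid <- vapply(tau_grid, invert_bridge, numeric(1))

# Later evaluations are cheap linear interpolations of the stored values
inverse_approx <- approxfun(tau_grid, r_grid)

tau_new <- c(-0.5, 0.1, 0.73)
r_interp <- inverse_approx(tau_new)
r_exact <- sin(pi * tau_new / 2)          # closed form, for comparison only
max_err <- max(abs(r_interp - r_exact))   # interpolation error on these points
```

A finer grid reduces the interpolation error at the cost of more stored values, which mirrors the accuracy and speed trade-off described above.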

If the pointwise estimator `Rpointwise` has negative eigenvalues, it is projected onto the space of positive semi-definite matrices using `nearPD`. The parameter `nu` additionally allows shrinkage towards the identity matrix (desirable when the number of variables p is very large) as

*R = (1 - ν) \tilde R + ν I,*

where *\tilde R* is `Rpointwise` after projection by `nearPD`.
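The projection and shrinkage steps can be sketched as follows, using `nearPD` from the Matrix package on a small hand-made matrix. The input matrix and the argument `corr = TRUE` are assumptions chosen for illustration; the exact arguments that `latentcor` passes to `nearPD` are not shown in this page.

```r
library(Matrix)

# A symmetric matrix with unit diagonal that is not positive semi-definite
R_pointwise <- matrix(c(1.0,  0.9,  0.9,
                        0.9,  1.0, -0.9,
                        0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)
min_eig_before <- min(eigen(R_pointwise, symmetric = TRUE)$values)

# Project onto the nearest correlation matrix (corr = TRUE is an assumption)
R_tilde <- as.matrix(nearPD(R_pointwise, corr = TRUE)$mat)

# Shrink towards the identity: R = (1 - nu) * R_tilde + nu * I
nu <- 0.001
R <- (1 - nu) * R_tilde + nu * diag(nrow(R_tilde))
min_eig_after <- min(eigen(R, symmetric = TRUE)$values)
```

Because every eigenvalue of the shrunk matrix equals (1 - ν) times an eigenvalue of the projected matrix plus ν, the smallest eigenvalue of `R` is at least `nu` whenever the projected matrix is positive semi-definite.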

`latentcor` returns:

zratios: A list of length p corresponding to each variable. Returns NA for a continuous variable; the proportion of zeros for binary/truncated variables; the cumulative proportions of zeros and ones (e.g. the first value is the proportion of zeros, the second value is the proportion of zeros and ones) for a ternary variable.

K: (p x p) Kendall Tau (Tau-a) matrix for `X`.

R: (p x p) Estimated latent correlation matrix for `X`.

Rpointwise: (p x p) Point-wise estimates of latent correlations for `X`. This matrix is not guaranteed to be positive semi-definite. This is the original estimated latent correlation matrix without adjustment for positive-definiteness.

plotR: Heatmap plot of latent correlation matrix `R`, NULL if `showplot = FALSE`.

Fan J., Liu H., Ning Y. and Zou H. (2017) "High dimensional semiparametric latent graphical model for mixed data" doi: 10.1111/rssb.12168.

Yoon G., Carroll R.J. and Gaynanova I. (2020) "Sparse semiparametric canonical correlation analysis for data of mixed types" doi: 10.1093/biomet/asaa007.

Yoon G., Müller C.L., Gaynanova I. (2021) "Fast computation of latent correlations" doi: 10.1080/10618600.2021.1882468.

# Example 1 - truncated data type, same type for all variables
# Generate data
X = gen_data(n = 300, types = rep("tru", 5))$X
# Estimate latent correlation matrix with original method and check the timing
start_time = proc.time()
R_org = latentcor(X = X, types = "tru", method = "original")$R
proc.time() - start_time
# Estimate latent correlation matrix with approximation method and check the timing
start_time = proc.time()
R_approx = latentcor(X = X, types = "tru", method = "approx")$R
proc.time() - start_time
# Heatmap for latent correlation matrix.
Heatmap_R_approx = latentcor(X = X, types = "tru", method = "approx", showplot = TRUE)$plotR

# Example 2 - ternary/continuous case
X = gen_data()$X
# Estimate latent correlation matrix with original method
R_nc_org = latentcor(X = X, types = c("ter", "con"), method = "original")$R
# Estimate latent correlation matrix with approximation method
R_nc_approx = latentcor(X = X, types = c("ter", "con"), method = "approx")$R
