null_eigval: Eigenvalue estimation for null Gaussian based testing... In pkimes/sigclust2: sigclust2: Statistical Significance of Clustering

Description

Function to compute the eigenvalues of the null Gaussian distribution for significance of clustering testing procedures which rely on a null Gaussian factor model assumption. When the number of observations is substantially greater than the number of features, the sample covariance matrix should be used.

Usage

```
null_eigval(x, n, p, icovest = 1, bkgd_pca = FALSE)
```

Arguments

• `x`: a matrix of size n by p containing the original data.

• `n`: an integer number of samples.

• `p`: an integer number of features/covariates.

• `icovest`: an integer between 1 and 3 corresponding to the covariance estimation procedure to use. See details for more information on the possible estimation procedures. (default = 1)

• `bkgd_pca`: a logical value specifying whether to use scaled PCA scores over raw data to estimate the background noise. When FALSE, raw estimate is used; when TRUE, minimum of PCA and raw estimates is used. (default = FALSE)
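As a rough illustration of the inputs, the sketch below computes the eigenvalues of the sample covariance matrix of an n by p matrix and a background noise estimate. The helper name `sketch_data_eigs` is hypothetical and is not part of sigclust2. The squared-MAD background estimate taken over the raw entries is an assumption made for this example; the package's own computation may differ.

```r
# Hypothetical helper for illustration only; not the sigclust2 implementation.
sketch_data_eigs <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)

  # Center each feature (column), then use the singular values of the
  # centered data to obtain the sample covariance eigenvalues.
  xc <- sweep(x, 2, colMeans(x))
  d <- svd(xc, nu = 0, nv = 0)$d
  eigs <- d^2 / (n - 1)

  # svd() returns min(n, p) singular values; pad with zeros when p > n.
  eigs <- c(eigs, rep(0, p - length(eigs)))

  # Assumption: background variance taken as the squared MAD of all entries.
  backvar <- mad(as.vector(x))^2

  list(eigval_dat = eigs, backvar = backvar)
}

set.seed(1)
x <- matrix(rnorm(20 * 10), nrow = 20, ncol = 10)
res <- sketch_data_eigs(x)
length(res$eigval_dat)
```

When n > p, the eigenvalues obtained this way should agree with `eigen(cov(x))$values` up to numerical error.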

Details

The following options are available for null covariance estimation, numbered by the corresponding value of `icovest`:

1. soft thresholding: recommended approach described in Huang et al. 2014

2. sample: uses the sample covariance matrix. Equivalent to the soft and hard thresholding options when n > p; when p > n, it produces conservative results, i.e. less significant p-values

3. hard thresholding: approach described in Liu et al. 2008, no longer recommended - retained for historical purposes
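The three options can be contrasted with a simplified sketch operating on a vector of sample eigenvalues and a background variance. The helper `sketch_null_eigs` is hypothetical and is not the package's code. In particular, the soft thresholding branch assumes one simple formulation (shift the eigenvalues down by a common amount, floor them at the background variance, and choose the shift so that the total variance is preserved); the procedure in Huang et al. and in sigclust2 may differ in detail.

```r
# Hypothetical helper for illustration only; not the sigclust2 implementation.
sketch_null_eigs <- function(eigval_dat, backvar, icovest = 1) {
  if (icovest == 2) {
    # sample: use the sample eigenvalues unchanged
    return(eigval_dat)
  }
  if (icovest == 3) {
    # hard thresholding: raise eigenvalues below the background level
    return(pmax(eigval_dat, backvar))
  }

  # soft thresholding (assumed form): eigenvalues are shifted down by tau
  # and floored at backvar, with tau chosen to preserve the total variance.
  p <- length(eigval_dat)
  if (p * backvar >= sum(eigval_dat)) {
    # the floor alone already meets or exceeds the total variance
    return(rep(backvar, p))
  }
  excess <- function(tau) {
    sum(pmax(eigval_dat - tau, backvar)) - sum(eigval_dat)
  }
  if (excess(0) <= 0) {
    # no eigenvalue lies below backvar, so no shift is needed
    return(eigval_dat)
  }
  tau <- uniroot(excess, lower = 0, upper = max(eigval_dat), tol = 1e-10)$root
  pmax(eigval_dat - tau, backvar)
}

eig <- c(10, 5, 1, 0.1, 0)   # made-up eigenvalues for illustration
sketch_null_eigs(eig, backvar = 1, icovest = 1)
sketch_null_eigs(eig, backvar = 1, icovest = 2)
sketch_null_eigs(eig, backvar = 1, icovest = 3)
```

In this sketch, when every sample eigenvalue is at least `backvar`, all three options return the same vector, which mirrors the equivalence noted for the sample option when n > p.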

Value

The function returns a list of estimated parameters for the null Gaussian distribution used in significance of clustering testing. The list includes:

• `eigval_dat`: eigenvalues for sample covariance matrix

• `backvar`: background noise, sigma_b^2

• `eigval_sim`: eigenvalues to be used for simulation
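To illustrate how `eigval_sim` might be used, the sketch below draws observations from a mean-zero Gaussian with a diagonal covariance matrix whose diagonal entries are the supplied eigenvalues. The function name `sketch_simulate_null` and the example eigenvalues are hypothetical; this is a minimal sketch under those assumptions, and the simulation code used by the package may differ.

```r
# Hypothetical helper for illustration only; not the sigclust2 implementation.
sketch_simulate_null <- function(n, eigval_sim) {
  p <- length(eigval_sim)
  # n by p matrix of independent standard normal draws
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # scale column j by sqrt(eigval_sim[j]) so that its variance is eigval_sim[j]
  sweep(z, 2, sqrt(eigval_sim), `*`)
}

set.seed(1)
sim <- sketch_simulate_null(n = 2000, eigval_sim = c(9, 4, 1, 1))
dim(sim)
round(apply(sim, 2, var), 1)
```

The column variances of the simulated matrix should be close to the supplied eigenvalues, with the agreement improving as `n` grows.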

Author

Patrick Kimes

References

• Huang, H., Liu, Y., Yuan, M., and Marron, J. S. (2014). Statistical Significance of Clustering using Soft Thresholding. Journal of Computational and Graphical Statistics, preprint.

• Liu, Y., Hayes, D. N., Nobel, A. B., and Marron, J. S. (2008). Statistical Significance of Clustering for High-Dimension, Low-Sample Size Data. Journal of the American Statistical Association, 103(483):1281-1293.

pkimes/sigclust2 documentation built on May 25, 2019, 8:20 a.m.