chooseMarchenkoPastur: Choosing PCs with the Marchenko-Pastur limit

View source: R/randomMethods.R

chooseMarchenkoPastur R Documentation

Description

Use the Marchenko-Pastur limit to choose the number of top PCs to retain.

Usage

chooseMarchenkoPastur(x, .dim = dim(x), var.explained, noise)

Arguments

x

The data matrix used for the PCA, containing variables in rows and observations in columns. Ignored if .dim is supplied.

.dim

An integer vector containing the dimensions of the data matrix used for PCA. The first element should contain the number of variables and the second element should contain the number of observations.

var.explained

A numeric vector containing the variance explained by successive PCs. This should be sorted in decreasing order. Note that this should be the variance explained, NOT the percentage of variance explained!

noise

Numeric scalar specifying the variance of the random noise.

Details

For a random matrix with i.i.d. entries, the Marchenko-Pastur (MP) limit gives the asymptotic upper bound on the eigenvalues of its sample covariance matrix. Let us assume that x is the sum of some low-rank truth and some i.i.d. random matrix with variance noise. We can use the MP limit to determine the maximum variance that could be explained by a fully random PC; all PCs that explain more variance are thus likely to contain real structure and should be retained.

Of course, this has some obvious caveats such as the unrealistic i.i.d. assumption and the need to estimate noise. Moreover, PCs below the MP limit are not necessarily uninformative or lacking structure; it is just that their variance explained does not match the most extreme case that random noise has to offer.
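
For reference, the textbook form of the limit can be written as below. This is a sketch under stated assumptions rather than a description of the function's internals: p is the number of variables, n is the number of observations, the noise entries are i.i.d. with variance \sigma^2 (the noise argument), and p/n is held fixed as both grow. The symbols p, n and \sigma^2 are introduced here for illustration; whether the function uses exactly this aspect ratio or applies a degrees-of-freedom correction is not stated on this page.

```latex
\lambda_{\max} \;\longrightarrow\; \sigma^{2}\left(1 + \sqrt{p/n}\right)^{2}
```

A PC whose variance explained exceeds this value is larger than what pure noise would be expected to produce in the limit.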

Value

An integer scalar specifying the number of PCs with variance explained beyond the MP limit. The limit itself is returned in the attributes.

Author(s)

Aaron Lun

See Also

chooseGavishDonoho, parallelPCA and findElbowPoint, for other approaches to choosing the number of PCs.

Examples

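library(PCAtools)

# Simulate a low-rank truth: 10 distinct profiles over 100 variables,
# resampled into 1000 observations.
truth <- matrix(rnorm(1000), nrow=100)
truth <- truth[,sample(ncol(truth), 1000, replace=TRUE)]

# Add i.i.d. noise with sd=2, i.e., variance 4.
obs <- truth + rnorm(length(truth), sd=2)

# Note, we need the variance explained, NOT the percentage
# of variance explained!
pcs <- pca(obs)
chooseMarchenkoPastur(obs, var.explained=pcs$sdev^2, noise=4)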
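
The same simulation can be mirrored in base R to make the thresholding step explicit. This is a minimal sketch, not the PCAtools implementation: mp_limit_sketch is a hypothetical helper, the aspect ratio is assumed to be variables over observations (the textbook convention), and the attribute name "limit" is an assumption made for illustration.

```r
# Hypothetical helper, NOT the PCAtools implementation.
# dims = c(number of variables, number of observations).
mp_limit_sketch <- function(dims, var.explained, noise) {
    gamma <- dims[1] / dims[2]              # aspect ratio (assumed convention)
    limit <- noise * (1 + sqrt(gamma))^2    # textbook MP upper edge
    # Count the PCs above the limit and attach the limit as an attribute.
    structure(sum(var.explained > limit), limit = limit)
}

set.seed(100)
truth <- matrix(rnorm(1000), nrow = 100)
truth <- truth[, sample(ncol(truth), 1000, replace = TRUE)]
obs <- truth + rnorm(length(truth), sd = 2)

# prcomp() expects observations in rows, hence the transpose.
pcs <- prcomp(t(obs))
res <- mp_limit_sketch(dim(obs), var.explained = pcs$sdev^2, noise = 4)

as.integer(res)      # number of PCs above the limit
attr(res, "limit")   # the limit itself, stored as an attribute
```

Passing the variances (sdev^2) rather than percentages matters here because the limit is on the same scale as the noise variance.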


kevinblighe/PCAtools documentation built on Oct. 22, 2023, 12:01 p.m.