chooseGavishDonoho: Choosing PCs with the Gavish-Donoho method

View source: R/randomMethods.R


Description

Use the Gavish-Donoho method to determine the optimal number of PCs to retain.

Usage

chooseGavishDonoho(x, .dim = dim(x), var.explained, noise)

Arguments

x

The data matrix used for the PCA, with variables in rows and observations in columns. It is only used to compute the default value of .dim, so it is ignored if .dim is supplied.

.dim

An integer vector of length 2 containing the dimensions of the data matrix used for the PCA: the number of variables first, followed by the number of observations. Defaults to dim(x).

var.explained

A numeric vector containing the variance explained by successive PCs. This should be sorted in decreasing order. Note that this should be the variance explained, NOT the percentage of variance explained!
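As a sketch of what var.explained should contain, the base R snippet below builds a small hypothetical matrix in the variables-by-observations layout used here. It checks that prcomp()'s sdev^2 equals the squared singular values of the centred data divided by (number of observations - 1), and computes the percentage of variance explained only to show that it is a different quantity. The data and variable names are illustrative assumptions.

```r
# Hypothetical example data in the layout this function expects:
# variables in rows, observations in columns.
set.seed(42)
x <- matrix(rnorm(50 * 200), nrow = 50, ncol = 200)

# prcomp() expects observations in rows, hence the transpose.
# It centres each variable by default; sdev^2 is the variance explained per PC.
pcs <- prcomp(t(x))
var.explained <- pcs$sdev^2

# Equivalent computation: squared singular values of the centred matrix,
# divided by (number of observations - 1).
centred <- scale(t(x), center = TRUE, scale = FALSE)
d <- svd(centred)$d
var.from.svd <- d^2 / (ncol(x) - 1)

# The percentage of variance explained is a different quantity
# and should NOT be passed as var.explained.
percent.explained <- 100 * var.explained / sum(var.explained)
```

Here var.explained (or var.from.svd) is the kind of vector the argument expects; percent.explained is not.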

noise

Numeric scalar specifying the variance of the random noise.

Details

Assuming that x is the sum of some low-rank truth and an i.i.d. random noise matrix with variance noise, the Gavish-Donoho method defines a threshold on the singular values that minimizes the (asymptotic) squared error of reconstructing the low-rank truth from the retained PCs. This gives a mathematical definition of the “optimal” number of PCs for a given matrix, though it depends on both the i.i.d. assumption and the supplied estimate of noise.
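For reference, Gavish and Donoho's known-noise threshold can be written as follows, assuming the data matrix is m × n with m ≤ n, aspect ratio β = m/n, singular values d_j, and noise variance σ² (the noise argument). The last equality additionally assumes that each var.explained value is v_j = d_j²/(N - 1) for N observations; that convention is an assumption about how the PCA was computed, not something stated on this page.

```latex
\beta = \frac{m}{n}, \qquad
\lambda^{*}(\beta) = \sqrt{2(\beta + 1) + \frac{8\beta}{(\beta + 1) + \sqrt{\beta^{2} + 14\beta + 1}}}

\tau^{*} = \lambda^{*}(\beta)\,\sqrt{n\,\sigma^{2}}, \qquad
k = \#\{\, j : d_j > \tau^{*} \,\}
  = \#\Big\{\, j : v_j > \frac{(\tau^{*})^{2}}{N - 1} \,\Big\}
```

For a square matrix (β = 1), λ*(1) = 4/√3.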

Value

An integer scalar specifying the number of PCs to retain. The effective limit on the variance explained is stored as an attribute of this scalar.
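To illustrate how a count of PCs and a limit on the variance explained can both arise from the threshold, the base R sketch below applies Gavish and Donoho's known-noise hard threshold. The helpers gd_lambda() and gd_num_pcs() are hypothetical names, not PCAtools functions; the conversion from a singular-value threshold to a variance limit assumes var.explained = d^2/(N - 1), and the actual PCAtools implementation may differ in its details.

```r
# Hypothetical helper (not part of PCAtools): Gavish-Donoho coefficient
# lambda*(beta) for a known noise level; equals 4/sqrt(3) when beta = 1.
gd_lambda <- function(beta) {
  sqrt(2 * (beta + 1) + 8 * beta / (beta + 1 + sqrt(beta^2 + 14 * beta + 1)))
}

# Hypothetical helper (not the PCAtools source): count the PCs whose
# variance explained exceeds the Gavish-Donoho limit.
# Assumes var.explained = d^2 / (N - 1), with N = .dim[2] observations.
gd_num_pcs <- function(var.explained, .dim, noise) {
  m <- min(.dim)
  n <- max(.dim)
  # Singular-value threshold for i.i.d. noise with variance `noise`.
  tau <- gd_lambda(m / n) * sqrt(n * noise)
  # Convert the singular-value threshold into a limit on variance explained.
  limit <- tau^2 / (.dim[2] - 1)
  n.pcs <- sum(var.explained > limit)  # sum() of a logical vector is an integer
  attr(n.pcs, "limit") <- limit
  n.pcs
}

# Usage on simulated data in the same layout as the Examples section
# (variables in rows, observations in columns).
set.seed(1)
truth <- matrix(rnorm(1000), nrow = 100)
truth <- truth[, sample(ncol(truth), 1000, replace = TRUE)]
obs <- truth + rnorm(length(truth), sd = 2)
n.signal <- gd_num_pcs(prcomp(t(obs))$sdev^2, dim(obs), noise = 4)

# Pure noise with the same variance: no PCs are expected to pass the limit.
noise.only <- matrix(rnorm(length(obs), sd = 2), nrow = nrow(obs))
n.noise <- gd_num_pcs(prcomp(t(noise.only))$sdev^2, dim(noise.only), noise = 4)
```

Comparing the variance explained directly against attr(n.signal, "limit") is equivalent to comparing singular values against the threshold, which is why a single limit on the variance explained is enough to describe the choice.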

Author(s)

Aaron Lun

See Also

chooseMarchenkoPastur, parallelPCA and findElbowPoint, for other approaches to choosing the number of PCs.

Examples

truth <- matrix(rnorm(1000), nrow=100)
truth <- truth[,sample(ncol(truth), 1000, replace=TRUE)]
obs <- truth + rnorm(length(truth), sd=2)

# Note, we need the variance explained, NOT the percentage
# of variance explained! 
pcs <- pca(obs)
chooseGavishDonoho(obs, var.explained=pcs$sdev^2, noise=4)


kevinblighe/PCAtools documentation built on Oct. 22, 2023, 12:01 p.m.