chooseGavishDonoho: Choosing PCs with the Gavish-Donoho method
In PCAtools: PCAtools: Everything Principal Components Analysis


Use the Gavish-Donoho method to determine the optimal number of PCs to retain.

chooseGavishDonoho(x, .dim = dim(x), var.explained, noise)

`x`	The data matrix used for the PCA, containing variables in rows and observations in columns. Ignored if `.dim` is supplied.
`.dim`	An integer vector containing the dimensions of the data matrix used for PCA. The first element should contain the number of variables and the second element should contain the number of observations.
`var.explained`	A numeric vector containing the variance explained by successive PCs. This should be sorted in decreasing order. Note that this should be the variance explained, NOT the percentage of variance explained!
`noise`	Numeric scalar specifying the variance of the random noise.
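The distinction between variance explained and percentage of variance explained can be illustrated with a minimal sketch. It uses base R's `prcomp` as a stand-in for whatever PCA routine produced the results; the object names (`mat`, `pc`, `var_explained`, `pct_explained`, `dims`) are illustrative assumptions and not part of PCAtools.

```r
set.seed(42)

# Variables in rows, observations in columns, as described for `x`.
mat <- matrix(rnorm(50 * 200), nrow = 50, ncol = 200)

# prcomp() expects observations in rows, hence the transpose.
pc <- prcomp(t(mat))

# Variance explained by each PC: the form `var.explained` asks for.
var_explained <- pc$sdev^2

# Percentage of variance explained: NOT suitable for `var.explained`.
pct_explained <- 100 * var_explained / sum(var_explained)

# The dimensions alone, in the order expected by `.dim`:
# number of variables first, number of observations second.
dims <- dim(mat)
```

Here `var_explained` is on the scale of the data and sums to the total variance across variables, whereas `pct_explained` always sums to 100 regardless of the scale of the noise.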

Assuming that `x` is the sum of some low-rank truth and an i.i.d. random matrix with variance `noise`, the Gavish-Donoho method defines a threshold on the singular values that minimizes the reconstruction error from the PCs. This provides a mathematical definition of the “optimal” number of PCs for a given matrix, though the result depends on both the i.i.d. assumption and the accuracy of the estimate of `noise`.
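As a rough illustration of the idea, the sketch below applies the published Gavish-Donoho hard-threshold coefficient for a known noise level and converts the resulting singular-value threshold to the variance scale. This is not the PCAtools source: the helper names `gd_lambda` and `gd_choose_sketch` are hypothetical, base R's `prcomp` stands in for the PCA, and the conversion assumes variances are computed across observations with a denominator of one less than the number of observations.

```r
# Hypothetical helpers for illustration only; not the PCAtools implementation.

# Hard-threshold coefficient for aspect ratio beta in (0, 1],
# for the case where the noise level is known.
gd_lambda <- function(beta) {
    sqrt(2 * (beta + 1) + 8 * beta / ((beta + 1) + sqrt(beta^2 + 14 * beta + 1)))
}

gd_choose_sketch <- function(.dim, var.explained, noise) {
    m <- min(.dim)
    n <- max(.dim)

    # Threshold on the singular values; `noise` is a variance, so take its root.
    sv_threshold <- gd_lambda(m / n) * sqrt(n) * sqrt(noise)

    # Convert to the variance scale (assumed denominator: observations - 1).
    var_limit <- sv_threshold^2 / (.dim[2] - 1)

    # Count the PCs whose variance exceeds the limit.
    list(n = sum(var.explained > var_limit), limit = var_limit)
}

set.seed(1)
truth <- matrix(rnorm(1000), nrow = 100)[, sample(10, 1000, replace = TRUE)]
obs <- truth + rnorm(length(truth), sd = 2)

pc <- prcomp(t(obs))  # base R stand-in; observations must be in rows here
res <- gd_choose_sketch(dim(obs), pc$sdev^2, noise = 4)
res$n      # number of PCs above the limit
res$limit  # limit on the variance explained
```

For a square matrix the coefficient reduces to 4/sqrt(3), which gives a quick sanity check on `gd_lambda`. Results from this sketch may differ from `chooseGavishDonoho` in edge cases, for example in how a count of zero is handled.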

An integer scalar specifying the number of PCs to retain. The effective limit on the variance explained is returned in the attributes.

Aaron Lun

chooseMarchenkoPastur, parallelPCA and findElbowPoint, for other approaches to choosing the number of PCs.

library(PCAtools)

# Simulate a low-rank truth (100 variables, 1000 observations) plus noise.
truth <- matrix(rnorm(1000), nrow=100)
truth <- truth[,sample(ncol(truth), 1000, replace=TRUE)]
obs <- truth + rnorm(length(truth), sd=2)

# Note, we need the variance explained, NOT the percentage
# of variance explained! The noise was simulated with sd=2,
# so its variance is 4.
pcs <- pca(obs)
chooseGavishDonoho(obs, var.explained=pcs$sdev^2, noise=4)