pcaElbow: Quickly estimate the 'elbow' of a scree plot (PCA)

View source: R/utility_functions.R

pcaElbow    R Documentation

Description

This function uses a rough heuristic to estimate a sensible 'elbow' for a PCA scree plot of eigenvalues. It starts from an arbitrary 'low' level of variance explained and looks for the first eigenvalue below this threshold. If even the first eigenvalue falls below it (i.e., when the PCs are not very explanatory), 'low' is halved repeatedly until this is no longer the case. From the first component below the threshold onwards, the drop in variance explained between each pair of consecutive PCs is standardized by dividing it by the larger of the pair. The largest relative drop in this series below 'low' is selected as the elbow.
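The steps above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the scMiko source: `elbow_sketch()` is a hypothetical name, the input is assumed to be sorted in decreasing order and rescaled to proportions, the fallback used when no pair of components lies below 'low' is an assumption of this sketch, and the `max.pc` cap is left out.

```r
# Hypothetical sketch of the elbow heuristic described above (not scMiko's code).
# Assumes varpc is sorted in decreasing order.
elbow_sketch <- function(varpc, low = 0.08) {
  prop <- varpc / sum(varpc)  # rescale so the values sum to 1
  n <- length(prop)
  # If even the first PC is below 'low', halve 'low' until it no longer is
  while (prop[1] < low) low <- low / 2
  below <- which(prop < low)  # PCs that explain "little" of the variance
  # Assumed fallback: with no pair of PCs below 'low', put the elbow at PC n - 1
  if (length(below) == 0 || below[1] == n) return(n - 1)
  idx <- below[1]:(n - 1)  # first member of each consecutive pair
  # Drop between consecutive PCs, divided by the larger of the pair
  rel_drop <- abs(prop[idx] - prop[idx + 1]) / pmax(prop[idx], prop[idx + 1])
  # The elbow is the PC just before the largest relative drop
  idx[which.max(rel_drop)]
}
```

For example, with eigenvalues `c(5, 3, 1, 0.5, 0.1, 0.05, 0.04, 0.03)` the first component below 0.08 is PC 4, and the largest relative drop after it is from PC 4 to PC 5 (0.5 to 0.1, i.e. 80%), so the sketch returns 4.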

Usage

pcaElbow(varpc, low = 0.08, max.pc = 0.9)

Arguments

varpc

numeric vector of eigenvalues, or of the percentage of variance explained, for each principal component. If only a partial set of components is available, first pass it to estimate.eig.vpcs() to estimate the missing eigenvalues.
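To illustrate why either kind of input can work, and why missing components matter, here is a small R sketch. It assumes the values are rescaled to proportions before being compared with `low` (which lies between zero and one); the eigenvalues are made-up numbers for illustration, not package output.

```r
# Made-up eigenvalues for five PCs (illustration only)
eig <- c(4, 2, 1, 0.5, 0.5)
pct <- 100 * eig / sum(eig)  # the same spectrum expressed as percentages

# Rescaling either input to proportions gives identical values
prop_eig <- eig / sum(eig)
prop_pct <- pct / sum(pct)
print(all.equal(prop_eig, prop_pct))  # TRUE

# With only the first three eigenvalues, every proportion is inflated,
# which is why missing eigenvalues should be estimated first
partial <- eig[1:3]
print(round(partial / sum(partial), 3))  # 0.571 0.286 0.143 (vs 0.5 0.25 0.125)
```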

low

numeric between zero and one; the threshold below which a principal component is considered not to explain much of the variance.

max.pc

numeric between zero and one; the maximum proportion of the variance to capture before the elbow (the cumulative sum up to PC 'n').
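One plausible reading of `max.pc` is that the elbow is pulled back whenever the PCs up to it would explain more than `max.pc` of the total variance. The hypothetical helper `cap_elbow()` below sketches that rule; how the package actually applies the cap is an assumption here.

```r
# Hypothetical helper: cap an elbow so the kept PCs explain at most max.pc
cap_elbow <- function(varpc, elbow, max.pc = 0.9) {
  cum <- cumsum(varpc / sum(varpc))  # cumulative proportion up to each PC
  if (cum[elbow] > max.pc) {
    # Last PC whose cumulative proportion is still at or below max.pc
    # (0 if the first PC alone already exceeds max.pc)
    elbow <- which(cum > max.pc)[1] - 1
  }
  elbow
}

cap_elbow(c(50, 30, 15, 5), elbow = 3)  # cumulative 0.95 > 0.9, so capped to 2
cap_elbow(c(50, 30, 15, 5), elbow = 2)  # cumulative 0.80, left at 2
```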

Value

The number of the last principal component to keep before the estimated elbow cutoff.
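Because the return value is a component count, it can be used directly to subset PCA output. The helper `keep_pcs()` below is a hypothetical illustration written against `stats::princomp()`, whose fitted objects expose `sdev` and `scores`; it is not part of scMiko.

```r
# Hypothetical helper: keep the first k principal components of a princomp fit,
# where k would be the value returned by pcaElbow()
keep_pcs <- function(fit, k) {
  eig <- fit$sdev^2
  list(
    scores = fit$scores[, seq_len(k), drop = FALSE],  # n x k matrix of PC scores
    var_retained = sum(eig[seq_len(k)]) / sum(eig)    # proportion of variance kept
  )
}

# Usage (assumes scMiko is attached):
# fit <- princomp(mat); k <- pcaElbow(fit$sdev^2); kept <- keep_pcs(fit, k)
```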

Author(s)

Nicholas Cooper

Examples

# correlated data
# (sim.cor(), pca.scree.plot() and generate.test.matrix() are not documented
#  on this page and must already be available in the session)
mat <- sim.cor(100, 50)
result <- princomp(mat)
eig <- result$sdev^2
elb.a <- pcaElbow(eig)
pca.scree.plot(eig, elbow = elb.a, M = mat)
elb.b <- pcaElbow(eig, low = 0.05) # decrease 'low' to select more components
pca.scree.plot(eig, elbow = elb.b, M = mat)
# random (largely independent) data, usually a higher elbow
mat2 <- generate.test.matrix(5, 3)
result2 <- princomp(mat2)
eig2 <- result2$sdev^2
elb2 <- pcaElbow(eig2)
pca.scree.plot(eig2, elbow = elb2, M = mat2)

NMikolajewicz/scMiko documentation built on June 28, 2023, 1:41 p.m.