parallelPCA: Perform Horn's parallel analysis to choose the number of...


View source: R/parallelPCA.R

Description

Perform Horn's parallel analysis to choose the number of principal components to retain.

Usage

parallelPCA(
  mat,
  max.rank = 100,
  ...,
  niters = 50,
  threshold = 0.1,
  transposed = FALSE,
  BSPARAM = ExactParam(),
  BPPARAM = SerialParam()
)

Arguments

mat

A numeric matrix where rows correspond to variables and columns correspond to samples.

max.rank

Integer scalar specifying the maximum number of PCs to retain.

...

Further arguments to pass to pca.

niters

Integer scalar specifying the number of iterations to use for the parallel analysis.

threshold

Numeric scalar representing the “p-value” threshold above which PCs are to be ignored.

transposed

Logical scalar indicating whether mat is transposed, i.e., rows are samples and columns are variables.

BSPARAM

A BiocSingularParam object specifying the algorithm to use for PCA.

BPPARAM

A BiocParallelParam object specifying how the iterations should be parallelized.

Details

Horn's parallel analysis involves shuffling observations within each row of mat to create a permuted matrix. PCA is performed on the permuted matrix to obtain the percentage of variance explained under a random null hypothesis. This is repeated over several iterations to obtain a distribution of curves on the scree plot.
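The shuffling step can be sketched in base R. This is only an illustration of the idea, not the code used inside parallelPCA: it uses prcomp rather than pca, and the toy matrix and the names permuted and var.explained are made up for the example.

```r
set.seed(1)

# Toy matrix: 5 variables (rows) by 8 samples (columns).
mat <- matrix(rnorm(40), nrow = 5)

# Shuffle the observations within each row independently.
# apply() over rows returns each result as a column, so transpose back.
permuted <- t(apply(mat, 1, sample))

# Percentage of variance explained by each PC of the permuted matrix.
# prcomp() expects samples in rows, hence the t().
pcs <- prcomp(t(permuted))
var.explained <- pcs$sdev^2 / sum(pcs$sdev^2) * 100
```

Each row of permuted holds the same values as the matching row of mat, so per-variable variances are unchanged and only the correlations between variables are destroyed. Repeating this gives one null scree curve per iteration.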

For each PC, the “p-value” (for want of a better word) is defined as the proportion of iterations where the variance explained at that PC is greater than that observed with the original matrix. The number of PCs to retain is defined as the last PC where the p-value is below threshold. This aims to retain all PCs that explain “significantly” more variance than expected by chance.
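As a rough sketch of this rule, the following base R code computes the "p-value" for each PC and picks the number to retain. It follows the wording above literally and should not be taken as the package's implementation; the helper pct_var, the toy data with one planted factor, and the restriction to the first 10 PCs are assumptions made for illustration.

```r
set.seed(42)

# Toy data: 200 variables by 30 samples, with one strong planted factor.
nvar <- 200
nsamp <- 30
signal <- 2 * outer(rnorm(nvar), rnorm(nsamp))
mat <- signal + matrix(rnorm(nvar * nsamp), nrow = nvar)

# Hypothetical helper: percentage of variance explained by the leading PCs.
pct_var <- function(m, max.rank = 10) {
    v <- prcomp(t(m))$sdev^2
    head(v / sum(v) * 100, max.rank)
}

observed <- pct_var(mat)

# One column per iteration, one row per PC.
niters <- 50
null <- replicate(niters, pct_var(t(apply(mat, 1, sample))))

# "p-value": proportion of iterations where the permuted variance
# explained exceeds the observed value at that PC.
pval <- rowMeans(null > observed)

# Retain up to the last PC whose p-value is below the threshold.
threshold <- 0.1
below <- which(pval < threshold)
n <- if (length(below) > 0) max(below) else 0L
```

Only the leading PCs are compared here because the trailing PCs of a centered matrix have near-zero variance in both the observed and permuted data, so comparisons there are dominated by numerical noise.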

This function can be sped up by specifying BSPARAM=IrlbaParam() or similar, to use approximate strategies for performing the PCA. Another option is to set BPPARAM to perform the iterations in parallel.
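A possible way to combine both options is shown below. The toy matrix, the lowered max.rank and the choice of two workers are assumptions for the example; IrlbaParam comes from BiocSingular and MulticoreParam from BiocParallel, and both packages are assumed to be installed alongside PCAtools.

```r
library(PCAtools)
library(BiocSingular)
library(BiocParallel)

set.seed(100)

# Toy matrix: 1000 variables (rows) by 50 samples (columns).
mat <- matrix(rnorm(1000 * 50), ncol = 50)

# Approximate PCA via IRLBA, with the permutation iterations spread
# across two workers. max.rank is lowered so that only a few leading
# PCs are requested, which is where approximate methods help most.
# MulticoreParam relies on forking; on Windows, SnowParam(workers = 2)
# would be the usual substitute.
fast <- parallelPCA(mat, max.rank = 10,
    BSPARAM = IrlbaParam(),
    BPPARAM = MulticoreParam(workers = 2))
fast$n
```

Because the approximate algorithm and the parallel random number streams both affect the permutations, the result may differ slightly from a serial run with ExactParam().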

Value

A list is returned, containing:

Author(s)

Aaron Lun

Examples

  # Loading the package and fixing the seed so the example is reproducible.
  library(PCAtools)
  set.seed(100)

  # Mocking up some data.
  ngenes <- 1000
  means <- 2^runif(ngenes, 6, 10)
  dispersions <- 10/means + 0.2
  nsamples <- 50
  counts <- matrix(rnbinom(ngenes*nsamples, mu=means, 
    size=1/dispersions), ncol=nsamples)

  # Choosing the number of PCs
  lcounts <- log2(counts + 1)
  output <- parallelPCA(lcounts)
  output$n

PCAtools documentation built on Nov. 8, 2020, 8:17 p.m.