parallelPCA: Perform Horn's parallel analysis to choose the number of...


parallelPCA    R Documentation

Perform Horn's parallel analysis to choose the number of principal components to retain.

Description

Perform Horn's parallel analysis to choose the number of principal components to retain.

Usage

parallelPCA(
  mat,
  max.rank = 100,
  ...,
  niters = 50,
  threshold = 0.1,
  transposed = FALSE,
  BSPARAM = ExactParam(),
  BPPARAM = SerialParam()
)

Arguments

mat

A numeric matrix where rows correspond to variables and columns correspond to samples.

max.rank

Integer scalar specifying the maximum number of PCs to retain.

...

Further arguments to pass to pca.

niters

Integer scalar specifying the number of iterations to use for the parallel analysis.

threshold

Numeric scalar representing the “p-value” threshold above which PCs are to be ignored.

transposed

Logical scalar indicating whether mat is transposed, i.e., rows are samples and columns are variables.

BSPARAM

A BiocSingularParam object specifying the algorithm to use for PCA.

BPPARAM

A BiocParallelParam object specifying how the iterations should be parallelized.

Details

Horn's parallel analysis involves shuffling observations within each row of mat to create a permuted matrix. PCA is performed on the permuted matrix to obtain the percentage of variance explained by each PC under a random null hypothesis. This is repeated over niters iterations to obtain a distribution of curves on the scree plot.

For each PC, the “p-value” (for want of a better word) is defined as the proportion of iterations where the variance explained at that PC is greater than that observed with the original matrix. The number of PCs to retain is defined as the last PC where the p-value is below threshold. This aims to retain all PCs that explain “significantly” more variance than expected by chance.
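The procedure described above can be sketched in base R. This is a minimal illustration and not the PCAtools implementation: horn_sketch is a hypothetical helper, prcomp stands in for pca, and the toy data are made up. It also assumes that the number of PCs to retain is the length of the leading run of PCs whose p-value does not exceed threshold, which agrees with "the last PC where the p-value is below threshold" whenever the p-values rise along the scree plot.

```r
# Hypothetical helper for illustration; not part of PCAtools.
horn_sketch <- function(mat, niters = 50, threshold = 0.1, max.rank = 10) {
  # Proportion of variance explained by the leading PCs.
  # Rows of 'm' are variables, so transpose for prcomp(), which
  # expects samples in rows.
  var_explained <- function(m) {
    v <- prcomp(t(m))$sdev^2
    head(v / sum(v), max.rank)
  }

  observed <- var_explained(mat)

  # Each iteration shuffles the observations within every row
  # independently, then repeats the PCA on the permuted matrix.
  # Result: one row per PC, one column per iteration.
  permuted <- replicate(niters, var_explained(t(apply(mat, 1, sample))))

  # "p-value" per PC: share of iterations where the permuted matrix
  # explains more variance than the original matrix at that PC.
  pval <- rowMeans(permuted > observed)

  # Assumption: count the leading run of PCs at or below the threshold.
  above <- which(pval > threshold)
  n <- if (length(above) > 0) above[1] - 1L else length(pval)

  list(pval = pval, permuted = permuted, n = n)
}

# Toy data: one strong factor shared across 30 samples, plus noise.
set.seed(42)
signal <- outer(rnorm(200, sd = 5), rnorm(30))
toy <- signal + matrix(rnorm(200 * 30), nrow = 200)
res <- horn_sketch(toy)
res$n
```

Comparing res$pval against the threshold shows where the observed scree curve stops standing out from the permuted curves.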

This function can be sped up by specifying BSPARAM=IrlbaParam() or similar, to use approximate strategies for performing the PCA. Another option is to set BPPARAM to perform the iterations in parallel.
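As a rough sketch of both options, the call below combines an approximate SVD with parallel iterations. It assumes the BiocSingular and BiocParallel packages are installed; the matrix size, the seed, max.rank = 10 and the choice of two SnowParam workers are arbitrary values picked for illustration.

```r
library(PCAtools)
library(BiocSingular)  # provides IrlbaParam()
library(BiocParallel)  # provides SnowParam()

# Made-up input: 1000 variables (rows) by 50 samples (columns).
set.seed(100)
mat <- matrix(rnorm(1000 * 50), ncol = 50)

# Approximate PCA via IRLBA, with the permutation iterations
# spread over two worker processes.
fast <- parallelPCA(mat, max.rank = 10,
  BSPARAM = IrlbaParam(),
  BPPARAM = SnowParam(workers = 2))
fast$n
```

Whether this is faster than the defaults depends on the size of mat and on the overhead of starting the workers, so it is worth timing on the data at hand.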

Value

A list is returned, containing:

  • original, the output from running pca on mat with the specified arguments.

  • permuted, a matrix of variance explained from randomly permuted matrices. Each column corresponds to a single permuted matrix, while each row corresponds to successive principal components.

  • n, the estimated number of principal components to retain.

Author(s)

Aaron Lun

Examples

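  # Load the package that provides parallelPCA().
  library(PCAtools)

  # Mocking up some data: negative binomial counts for 1000 genes
  # in 50 samples.
  set.seed(100)  # arbitrary seed so the simulated counts are reproducible
  ngenes <- 1000
  means <- 2^runif(ngenes, 6, 10)
  dispersions <- 10/means + 0.2
  nsamples <- 50
  counts <- matrix(rnbinom(ngenes*nsamples, mu=means, 
    size=1/dispersions), ncol=nsamples)

  # Choosing the number of PCs from the log-transformed counts.
  lcounts <- log2(counts + 1)
  output <- parallelPCA(lcounts)
  output$n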


kevinblighe/PCAtools documentation built on Oct. 22, 2023, 12:01 p.m.