runColDataPCA: Perform PCA on column metadata

runColDataPCA R Documentation

Perform PCA on column metadata

Description

Perform a principal components analysis (PCA) on cells, based on the column metadata in a SingleCellExperiment object.

Usage

runColDataPCA(
  x,
  ncomponents = 2,
  variables = NULL,
  scale = TRUE,
  outliers = FALSE,
  BSPARAM = ExactParam(),
  BPPARAM = SerialParam(),
  name = "PCA_coldata"
)

Arguments

x

A SingleCellExperiment object.

ncomponents

Numeric scalar indicating the number of principal components to obtain.

variables

List of strings or a character vector indicating which variables in colData(x) to use. If a list, each entry can also be an AsIs vector or a data.frame, as described in ?retrieveCellInfo.

scale

Logical scalar indicating whether the variables should be standardised so that each has unit variance. This also removes variables with standard deviations below 1e-8.

outliers

Logical indicating whether outliers should be detected based on PCA coordinates.

BSPARAM

A BiocSingularParam object specifying which algorithm should be used to perform the PCA.

BPPARAM

A BiocParallelParam object specifying whether the PCA should be parallelized.

name

String specifying the name to be used to store the result in the reducedDims of the output.

Details

This function performs PCA on variables from the column-level metadata instead of the gene expression matrix. Doing so can occasionally be useful when other forms of experimental data are stored in the colData, e.g., protein intensities from FACS or other cell-specific phenotypic information.
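
As a rough illustration of what a PCA on metadata columns involves, the sketch below uses only base R (prcomp) on a made-up data frame. It is not the function's actual implementation: the column names and values are hypothetical, and the filtering step merely mimics the 1e-8 cutoff described for the scale argument.

```r
# Hypothetical per-cell metadata; the column names are illustrative only.
set.seed(42)
meta <- data.frame(
    sum = rpois(50, lambda = 5000),
    detected = rpois(50, lambda = 800),
    constant = rep(1, 50)  # zero variance, cannot be scaled to unit variance
)

# Drop near-constant variables, mimicking the 1e-8 cutoff described for 'scale'.
sds <- vapply(meta, sd, numeric(1))
keep <- sds >= 1e-8

# Centre and scale the remaining variables, then keep the first two PCs.
pca <- prcomp(meta[, keep, drop = FALSE], center = TRUE, scale. = TRUE, rank. = 2)

# One row of coordinates per cell, plus the percentage of variance per PC.
coords <- pca$x
percent_var <- pca$sdev^2 / sum(pca$sdev^2) * 100
```

Scaling matters here because metadata variables such as total counts and percentages are on very different scales, so an unscaled PCA would be dominated by whichever variable has the largest raw variance.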

This function is particularly useful for identifying low-quality cells based on QC metrics with outliers=TRUE. This uses an “outlyingness” measure computed by adjOutlyingness in the robustbase package. Outliers are defined as those cells with outlyingness values more than 5 MADs above the median, using isOutlier.
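
The thresholding rule can be sketched in base R as follows. This is only an approximation of the idea, assuming a simulated vector of outlyingness scores; the function itself obtains the scores from adjOutlyingness and applies the threshold via isOutlier.

```r
# Hypothetical outlyingness scores: 99 typical cells and one extreme cell.
set.seed(100)
outlyingness <- c(abs(rnorm(99)), 50)

# Threshold at 5 MADs above the median (upper tail only).
threshold <- median(outlyingness) + 5 * mad(outlyingness)
is_outlier <- outlyingness > threshold

# Indices of the cells flagged as outliers.
which(is_outlier)
```

Only the upper tail is tested because a low outlyingness score means the cell sits close to the bulk of the data.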

Value

A SingleCellExperiment object containing the first ncomponents principal coordinates for each cell. By default, these are stored in the "PCA_coldata" entry of the reducedDims slot. The proportion of variance explained by each PC is stored as a numeric vector in the "percentVar" attribute of this matrix.

If outliers=TRUE, the output colData will also contain a logical outlier field. This specifies the cells that correspond to the identified outliers.
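
The following sketch shows one way to inspect these outputs. It assumes that the scater package is attached and that robustbase is installed (needed for outliers=TRUE); the object name sce and the chosen QC variables are illustrative, taken from a mock dataset.

```r
library(scater)

# Mock dataset with per-cell QC metrics added to the colData.
set.seed(1000)
sce <- mockSCE()
qc <- perCellQCMetrics(sce, subset = list(Mito = 1:10))
colData(sce) <- cbind(colData(sce), qc)

# Run the PCA on selected QC metrics and flag outlying cells.
sce <- runColDataPCA(sce,
    variables = c("sum", "detected", "subsets_Mito_percent",
        "altexps_Spikes_percent"),
    outliers = TRUE)

# Coordinates are stored under the default name "PCA_coldata".
pcs <- reducedDim(sce, "PCA_coldata")
dim(pcs)

# Variance explained per PC, and the number of cells flagged as outliers.
attr(pcs, "percentVar")
table(sce$outlier)
```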

Author(s)

Aaron Lun, based on code by Davis McCarthy

See Also

runPCA, for the corresponding method operating on expression data.

Examples

library(scater)
example_sce <- mockSCE()
qc.df <- perCellQCMetrics(example_sce, subset=list(Mito=1:10))
colData(example_sce) <- cbind(colData(example_sce), qc.df)

# Can supply names of colData variables to 'variables',
# as well as AsIs-wrapped vectors of interest.
example_sce <- runColDataPCA(example_sce, variables=list(
    "sum", "detected", "subsets_Mito_percent", "altexps_Spikes_percent" 
))
reducedDimNames(example_sce)
head(reducedDim(example_sce))


Alanocallaghan/scater documentation built on July 18, 2024, 10:58 p.m.