runColDataPCA: Perform PCA on column metadata
In davismcc/scater: Single-Cell Analysis Toolkit for Gene Expression Data in R

Description

Perform a principal components analysis (PCA) on cells, based on the column metadata in a SingleCellExperiment object.

Usage

runColDataPCA(
  x,
  ncomponents = 2,
  variables = NULL,
  scale = TRUE,
  outliers = FALSE,
  BSPARAM = ExactParam(),
  BPPARAM = SerialParam(),
  name = "PCA_coldata"
)

Arguments

`x`	A SingleCellExperiment object.
`ncomponents`	Numeric scalar indicating the number of principal components to obtain.
`variables`	List of strings or a character vector indicating which variables in `colData(x)` to use. If a list, each entry can also be an AsIs vector or a data.frame, as described in `?retrieveCellInfo`.
`scale`	Logical scalar, should the variables be standardised so that each variable has unit variance? This will also remove variables with standard deviations below 1e-8.
`outliers`	Logical indicating whether outliers should be detected based on PCA coordinates.
`BSPARAM`	A BiocSingularParam object specifying which algorithm should be used to perform the PCA.
`BPPARAM`	A BiocParallelParam object specifying whether the PCA should be parallelized.
`name`	String specifying the name to be used to store the result in the `reducedDims` of the output.
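The standardisation described for `scale` can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not scater's internal code: `prepare_coldata_matrix` is a hypothetical helper, and the QC values are made up. Cells are rows and colData variables are columns.

```r
# Hypothetical helper (not part of scater): mimics the documented effect of
# scale=TRUE on a cell-by-variable matrix built from colData.
prepare_coldata_matrix <- function(df, scale = TRUE, tol = 1e-8) {
  mat <- as.matrix(df)                        # cells in rows, variables in columns
  if (scale) {
    sds <- apply(mat, 2, sd)                  # per-variable standard deviation
    mat <- mat[, sds >= tol, drop = FALSE]    # drop (near-)constant variables
    mat <- base::scale(mat, center = TRUE, scale = TRUE)  # unit variance per variable
  }
  mat
}

# Made-up QC metrics for four cells; 'constant' has zero variance.
qc <- data.frame(
  sum      = c(1000, 2000, 1500, 3000),
  detected = c(300, 500, 400, 700),
  constant = c(1, 1, 1, 1)
)

prepared <- prepare_coldata_matrix(qc)
colnames(prepared)   # "sum" "detected"
```

Dropping near-constant variables before dividing by the standard deviation avoids division by (almost) zero, which would otherwise produce infinite or `NaN` values.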

Details

This function performs PCA on variables from the column-level metadata instead of the gene expression matrix. This can occasionally be useful when other forms of experimental data are stored in the colData, e.g., protein intensities from FACS or other cell-specific phenotypic information.

With outliers=TRUE, this function is particularly useful for identifying low-quality cells based on QC metrics. Outlier detection uses an “outlyingness” measure computed by adjOutlyingness from the robustbase package. Outliers are defined as those cells with outlyingness values more than 5 median absolute deviations (MADs) above the median, using isOutlier.
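The "more than 5 MADs above the median" rule can be sketched in base R. This is an illustration only: `flag_high_outliers` is a hypothetical helper, and the scores are made-up numbers standing in for the values that adjOutlyingness would return. isOutlier itself has further options that this sketch does not reproduce.

```r
# Hypothetical helper (not scater/scuttle API): flag values more than
# 'nmads' MADs above the median, i.e. a one-sided ("higher") threshold.
flag_high_outliers <- function(scores, nmads = 5) {
  # mad() scales by the constant 1.4826 by default.
  threshold <- median(scores) + nmads * mad(scores)
  scores > threshold
}

# Made-up outlyingness scores; only the last cell is far from the rest.
scores <- c(1.0, 1.1, 0.9, 1.2, 1.0, 0.95, 1.05, 6.0)
flag_high_outliers(scores)
# FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
```

Here the median is 1.025 and the MAD is 1.4826 × 0.075, so the threshold is about 1.58; only the score of 6.0 exceeds it.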

Value

A SingleCellExperiment object containing the first ncomponents principal coordinates for each cell. By default, these are stored in the "PCA_coldata" entry of the reducedDims slot. The proportion of variance explained by each PC is stored as a numeric vector in the "percentVar" attribute.
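The shape of this result can be sketched with base R's prcomp() on standardised, simulated metrics. This is an assumption-laden illustration rather than scater's implementation, which uses a BiocSingularParam algorithm. Expressing "percentVar" as percentages is also an assumption of the sketch, so check the attribute's scale in your scater version.

```r
# Sketch only: base R prcomp() stands in for scater's BiocSingular-based PCA.
set.seed(42)
n_cells <- 50
metrics <- data.frame(                 # simulated per-cell metrics
  sum      = rlnorm(n_cells, meanlog = 8),
  detected = rpois(n_cells, lambda = 400),
  mito_pct = runif(n_cells, 0, 20)
)

ncomponents <- 2
pca <- prcomp(metrics, center = TRUE, scale. = TRUE, rank. = ncomponents)

coords <- pca$x                        # cells x ncomponents matrix of PC coordinates
# Variance explained by each retained PC, relative to the total variance
# across all components (pca$sdev keeps every component even with rank.).
percent_var <- pca$sdev[seq_len(ncomponents)]^2 / sum(pca$sdev^2) * 100
attr(coords, "percentVar") <- percent_var
dim(coords)   # 50 2
```

Because the variables are standardised, the total variance equals the number of variables (3 here). That is why the denominator uses all of `pca$sdev` rather than only the retained components.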

If outliers=TRUE, the output colData will also contain a logical outlier column, which is TRUE for the cells identified as outliers.

Author(s)

Aaron Lun, based on code by Davis McCarthy

Examples

library(scater)

example_sce <- mockSCE()
qc.df <- perCellQCMetrics(example_sce, subsets=list(Mito=1:10))
colData(example_sce) <- cbind(colData(example_sce), qc.df)

# Supply names of colData variables to 'variables';
# AsIs-wrapped vectors of interest are also accepted.
example_sce <- runColDataPCA(example_sce, variables=list(
    "sum", "detected", "subsets_Mito_percent", "altexps_Spikes_percent"
))
reducedDimNames(example_sce)
head(reducedDim(example_sce, "PCA_coldata"))

davismcc/scater documentation built on June 12, 2025, 12:41 a.m.