process_cell_meta: Processing single cell RNA-seq count data

View source: R/process_cell_meta.R

process_cell_meta    R Documentation

Processing single cell RNA-seq count data

Description

A meta function for processing single cell RNA-seq count data, including quality control, normalization, and dimensionality reduction.

Usage

process_cell_meta(
  sce,
  qc.metric = list(threshold = 1),
  qc.filter = list(nmads = 3),
  quick.clus = list(min.size = 100),
  com.sum.fct = list(max.cluster.size = 3000),
  log.norm = list(),
  prop = 0.1,
  min.dim = 13,
  max.dim = 50,
  model.var = list(),
  top.hvg = list(n = 3000),
  de.pca = list(assay.type = "logcounts"),
  pca = FALSE,
  tsne = list(dimred = "PCA", ncomponents = 2),
  umap = list(dimred = "PCA")
)

Arguments

sce

Single cell RNA-seq count data in a SingleCellExperiment object.

qc.metric

Quality control arguments in a named list passed to perCellQCMetrics, such as qc.metric=list(threshold=1).

qc.filter

Quality control filtering arguments in a named list passed to perCellQCFilters, such as qc.filter=list(nmads=3).

quick.clus

Arguments in a named list passed to quickCluster, such as quick.clus=list(min.size = 100).

com.sum.fct

Arguments in a named list passed to computeSumFactors, such as com.sum.fct=list(max.cluster.size = 3000).

log.norm

Arguments in a named list passed to logNormCounts.

prop

Numeric scalar specifying the proportion of genes to report as highly variable genes (HVGs) in getTopHVGs. The default is 0.1.

min.dim, max.dim

Integer scalars specifying the minimum (min.dim) and maximum (max.dim) number of principal components (PCs) to retain in denoisePCA. The defaults are min.dim=13 and max.dim=50, as shown in Usage.

model.var

Additional arguments in a named list passed to modelGeneVar.

top.hvg

Additional arguments in a named list passed to getTopHVGs, such as top.hvg=list(n = 3000).

de.pca

Additional arguments in a named list passed to denoisePCA, such as de.pca=list(assay.type = "logcounts").

pca

Logical. If TRUE, only the data with dimensionality reduced by PCA is returned and no clustering is performed. The default is FALSE, in which case clustering is performed after dimensionality reduction.

tsne

Additional arguments in a named list passed to runTSNE, such as tsne=list(dimred="PCA", ncomponents=2).

umap

Additional arguments in a named list passed to runUMAP, such as umap=list(dimred="PCA").
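Most arguments above are whole named lists, so supplying a custom list replaces the default list instead of merging with it (standard R behavior for default arguments, unless a function merges them internally). This is presumably why the call in Examples repeats threshold=1 alongside subsets. The base R sketch below illustrates the effect. The functions inner and wrapper are hypothetical stand-ins, and forwarding through do.call is an assumption about the implementation that this page does not confirm.

```r
# Hypothetical stand-in for a downstream function that has its own defaults.
inner <- function(x, threshold = 0, subsets = NULL) {
  list(detected = sum(x > threshold), n.subsets = length(subsets))
}

# Hypothetical wrapper that forwards a named list of arguments via do.call.
wrapper <- function(x, qc.metric = list(threshold = 1)) {
  do.call(inner, c(list(x), qc.metric))
}

counts <- c(0, 1, 2, 5)

# Default list is used: threshold = 1.
res.default <- wrapper(counts)

# A custom list without 'threshold' drops the wrapper default,
# so inner() falls back to its own default (threshold = 0).
res.custom <- wrapper(
  counts,
  qc.metric = list(subsets = list(Mt = c(TRUE, FALSE, FALSE, FALSE)))
)

# Repeating 'threshold' in the custom list keeps the intended value.
res.both <- wrapper(
  counts,
  qc.metric = list(subsets = list(Mt = c(TRUE, FALSE, FALSE, FALSE)), threshold = 1)
)
```

In this sketch, res.default and res.both count values above 1, while res.custom counts values above 0.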

Details

In the QC step, frequently used per-cell metrics, such as library size, number of detected features above a threshold, and mitochondrial gene percentage, are calculated to identify problematic cells. These metrics are then used to flag outlier cells based on the median absolute deviation (MAD). Refer to perCellQCMetrics and perCellQCFilters in the scuttle package for more details.

In the normalization step, a quick-clustering method first divides cells into clusters. A scaling normalization method then computes per-cluster size factors, which are decomposed into per-cell size factors by a deconvolution strategy. Finally, all cells are normalized by their per-cell size factors. See quickCluster and computeSumFactors from the scran package, and logNormCounts from the scuttle package, for more details.

In the dimensionality reduction step, the high-dimensional gene expression data are embedded into a low-dimensional space using PCA, tSNE, and UMAP. All three embeddings are stored in a SingleCellExperiment object. See denoisePCA from scran, and runTSNE and runUMAP from scater, for details.
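The three stages described above can be illustrated with a small base R sketch. It is a simplified stand-in and not the implementation used by process_cell_meta: the toy count matrix is made up, only library size is used for QC, plain library-size factors replace the deconvolution size factors, and prcomp replaces denoisePCA. All object names are hypothetical.

```r
set.seed(1)

# Hypothetical toy count matrix: 50 genes (rows) x 20 cells (columns).
counts <- matrix(
  rpois(50 * 20, lambda = 10), nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("cell", 1:20))
)
# Make the last cell deliberately poor quality (library size of 5).
counts[, 20] <- c(rep(1, 5), rep(0, 45))

# 1. QC: flag cells whose log10 library size lies more than
#    'nmads' MADs below the median.
nmads <- 3
log.lib <- log10(colSums(counts))
discard <- log.lib < median(log.lib) - nmads * mad(log.lib)
counts.qc <- counts[, !discard, drop = FALSE]

# 2. Normalization: library-size factors centered to a mean of 1
#    (a stand-in for deconvolution size factors), followed by a
#    log2 transform with a pseudo-count of 1.
lib.qc <- colSums(counts.qc)
size.fct <- lib.qc / mean(lib.qc)
log.cnt <- log2(t(t(counts.qc) / size.fct) + 1)

# 3. Dimensionality reduction: PCA with cells as observations,
#    keeping the first two principal components.
pca <- prcomp(t(log.cnt))
pcs <- pca$x[, 1:2]
```

Here discard marks the artificially small cell for removal, and pcs holds one row of coordinates per retained cell.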

Value

A SingleCellExperiment object.

Author(s)

Jianhai Zhang jzhan067@ucr.edu
Dr. Thomas Girke thomas.girke@ucr.edu

References

Amezquita R, Lun A, Becht E, Carey V, Carpp L, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pages H, Smith M, Huber W, Morgan M, Gottardo R, Hicks S (2020). “Orchestrating single-cell analysis with Bioconductor.” Nature Methods, 17, 137–145. https://www.nature.com/articles/s41592-019-0654-x.

McCarthy DJ, Campbell KR, Lun ATL, Wills QF (2017). “Scater: pre-processing, quality control, normalisation and visualisation of single-cell RNA-seq data in R.” Bioinformatics, 33, 1179–1186. doi: 10.1093/bioinformatics/btw777.

Lun ATL, McCarthy DJ, Marioni JC (2016). “A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor.” F1000Res., 5, 2122. doi: 10.12688/f1000research.9501.2.

Examples

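library(spatialHeatmap); library(scran); library(scuttle); library(SummarizedExperiment)
# Simulate a small SingleCellExperiment for demonstration.
sce <- mockSCE()
# Run QC, normalization, and dimensionality reduction, treating genes annotated as 'mito' as a QC subset.
sce.dimred <- process_cell_meta(sce, qc.metric=list(subsets=list(Mt=rowData(sce)$featureType=='mito'), threshold=1))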

jianhaizhang/spatialHeatmap documentation built on July 31, 2024, 2:59 a.m.