scQC: Performs single-cell data quality control

Description Usage Arguments Value References Examples

View source: R/scQC.R

Description

This function performs quality control filters over the provided input matrix, the function checks for minimum cell library size, mitochondrial ratio, outlier cells, and the fraction of cells where a gene is expressed.

Usage

1
2
3
4
5
6
7
scQC(
  X,
  minLibSize = 1000,
  removeOutlierCells = TRUE,
  minPCT = 0.05,
  maxMTratio = 0.1
)

Arguments

X

Raw counts matrix with cells as columns and genes (symbols) as rows.

minLibSize

An integer value. Defines the minimum library size required for a cell to be included in the analysis.

removeOutlierCells

A boolean value (TRUE/FALSE), if TRUE, the identified cells with library size greater than 1.58 IQR/sqrt(n) computed from the sample, are removed. For further details see: ?boxplot.stats

minPCT

A decimal value between 0 and 1. Defines the minimum fraction of cells where the gene needs to be expressed to be included in the analysis.

maxMTratio

A decimal value between 0 and 1. Defines the maximum ratio of mitochondrial reads (mithocondrial reads / library size) present in a cell to be included in the analysis. It's computed using the symbol genes starting with 'MT-' non-case sensitive.

Value

A dgCMatrix object with the cells and the genes that pass the quality control filters.

References

Ilicic, Tomislav, et al. "Classification of low quality cells from single-cell RNA-seq data." Genome biology 17.1 (2016): 29.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
library(scTenifoldNet)

# Simulating of a dataset following a negative binomial distribution with high sparcity (~67%)
nCells = 2000
nGenes = 100
set.seed(1)
X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98)
X <- round(X)
X <- matrix(X, ncol = nCells)
rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10))

# Performing Single cell quality control
qcOutput <- scQC(
  X = X,
  minLibSize = 30,
  removeOutlierCells = TRUE,
  minPCT = 0.05,
  maxMTratio = 0.1
)

# Visualizing the Differences
oldPar <- par(no.readonly = TRUE)

par(
  mfrow = c(2, 2),
  mar = c(3, 3, 1, 1),
  mgp = c(1.5, 0.5, 0)
)
# Library Size
plot(
  Matrix::colSums(X),
  ylim = c(20, 70),
  ylab = 'Library Size',
 xlab = 'Cell',
  main = 'Library Size - Before QC'
)
abline(h = c(30, 58),
       lty = 2,
       col = 'red')
plot(
  Matrix::colSums(qcOutput),
  ylim = c(20, 70),
  ylab = 'Library Size',
  xlab = 'Cell',
  main = 'Library Size - After QC'
)
abline(h = c(30, 58),
       lty = 2,
       col = 'red')
# Mitochondrial ratio
mtGenes <- grepl('^mt-', rownames(X), ignore.case = TRUE)
plot(
  Matrix::colSums(X[mtGenes,]) / Matrix::colSums(X),
  ylim = c(0, 0.3),
  ylab = 'Mitochondrial Ratio',
  xlab = 'Cell',
  main = 'Mitochondrial Ratio - Before QC'
)
abline(h = c(0.1), lty = 2, col = 'red')
plot(
  Matrix::colSums(qcOutput[mtGenes,]) / Matrix::colSums(qcOutput),
  ylim = c(0, 0.3),
  ylab = 'Mitochondrial Ratio',
  xlab = 'Cell',
  main = 'Mitochondrial Ratio - Before QC'
)
abline(h = c(0.1), lty = 2, col = 'red')

par(oldPar)

scTenifoldNet documentation built on Oct. 29, 2021, 9:08 a.m.