QuickCB2: All-in-one function from raw data to filtered cell matrix


View source: R/QuickCB2.R

Description

All-in-one function for scCB2. Takes 10x raw output data as input and returns either a matrix of real cells identified by CB2 or a Seurat object containing this matrix, which can be used directly in downstream analysis with the Seurat pipeline.

Usage

QuickCB2(
  dir = NULL,
  h5file = NULL,
  FDR_threshold = 0.01,
  MTfilter = 1,
  MTgene = NULL,
  AsSeurat = FALSE,
  Ncores = 2,
  ...
)

Arguments

dir

The directory of 10x output data. For Cell Ranger version <3, the directory should include three files: barcodes.tsv, genes.tsv, matrix.mtx. For Cell Ranger version >=3, the directory should include three files: barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz.
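The two directory layouts above can be checked before calling QuickCB2. The following is a minimal sketch in base R; the helper name detect_10x_layout is hypothetical and not part of scCB2.

```r
# Hypothetical helper (not part of scCB2): report which 10x file layout
# a directory contains, based on the file names listed above.
detect_10x_layout <- function(dir) {
  v2_files <- c("barcodes.tsv", "genes.tsv", "matrix.mtx")
  v3_files <- c("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")
  if (all(file.exists(file.path(dir, v3_files)))) {
    return("Cell Ranger >= 3")
  }
  if (all(file.exists(file.path(dir, v2_files)))) {
    return("Cell Ranger < 3")
  }
  "unknown"
}

# Usage: build an empty Cell Ranger <3 style directory and inspect it.
demo_dir <- file.path(tempdir(), "layout_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
                                c("barcodes.tsv", "genes.tsv", "matrix.mtx"))))
print(detect_10x_layout(demo_dir))
```

A result of "unknown" suggests that the directory is missing at least one of the three expected files.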

h5file

The path to the 10x output HDF5 file (ending in .h5).

FDR_threshold

Numeric between 0 and 1. Default: 0.01. The False Discovery Rate (FDR) to be controlled for multiple testing.
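To illustrate what an FDR threshold does, the sketch below adjusts a vector of made-up p-values and keeps those at or below 0.01. It assumes a Benjamini-Hochberg adjustment via p.adjust from the stats package; this page does not state which adjustment CB2 applies internally, so treat this as an illustration only.

```r
# Made-up p-values, one per candidate barcode (illustrative only).
p_values <- c(0.0001, 0.002, 0.004, 0.03, 0.2, 0.8)
FDR_threshold <- 0.01

# Assumption: Benjamini-Hochberg adjustment; the method used by CB2
# is not stated on this page.
adjusted <- p.adjust(p_values, method = "BH")

# Barcodes whose adjusted p-value is at or below the threshold.
is_real_cell <- adjusted <= FDR_threshold
print(data.frame(p_values, adjusted, is_real_cell))
```

Lowering FDR_threshold makes the selection stricter, so fewer barcodes are called real cells.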

MTfilter

Numeric value between 0 and 1. Default: 1 (no filtering). A barcode is filtered out if its proportion of mitochondrial gene expression exceeds MTfilter. Since no barcode can exceed 100% mitochondrial gene expression, the default of 1 corresponds to no filtering. The proportion of mitochondrial gene expression is a common criterion for evaluating cell quality. It is calculated using the scaled sum of all genes starting with "MT-" (human) or "mt-" (mouse) if row names are gene symbols, or of the custom mitochondrial genes specified by MTgene.
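The filtering rule can be sketched on a toy count matrix in base R. This sketch assumes that "scaled sum" means the mitochondrial counts divided by the total counts of the barcode; the gene names, barcode names, and counts below are invented for illustration and this is not the package's own code.

```r
# Toy count matrix: rows are genes (mouse-style symbols), columns are barcodes.
counts <- matrix(
  c(5,  0, 1,
    3,  2, 0,
    2,  8, 0,
    0, 10, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Actb", "Gapdh", "mt-Nd1", "mt-Co1"),
                  c("bc1", "bc2", "bc3"))
)
MTfilter <- 0.5

# Mitochondrial genes are identified by the "MT-" or "mt-" prefix.
is_mt <- grepl("^(MT-|mt-)", rownames(counts))

# Assumption: proportion = mitochondrial counts / total counts per barcode.
mt_prop <- colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)

# Drop barcodes whose mitochondrial proportion exceeds MTfilter.
filtered <- counts[, mt_prop <= MTfilter, drop = FALSE]
print(mt_prop)
print(colnames(filtered))
```

With MTfilter = 1, the comparison mt_prop <= MTfilter holds for every barcode, which is why the default performs no filtering.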

MTgene

Character vector. Users may specify custom mitochondrial gene IDs to use for filtering. These should be a subset of the row names in the raw data.

AsSeurat

Logical. Default: FALSE. Whether to return a Seurat object instead of a cell matrix. Set to TRUE to apply the Seurat pipeline directly for downstream analyses.

Ncores

Positive integer. Default: 2 (as shown in Usage). Number of cores for parallel computation.

...

Additional arguments to be passed to CB2FindCell.

Details

QuickCB2 is a convenience function for applying CB2 to 10x Cell Ranger raw data. It combines Read10xRaw or Read10xRawH5 (depending on the input), CB2FindCell and GetCellMat into one function with default parameters.
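The combination described above can be sketched as a step-by-step wrapper. The function name manual_cb2 is hypothetical, and the body is an approximation of the workflow under the assumption that each step is called with the argument names shown; it is not the actual source of QuickCB2 and omits the AsSeurat option.

```r
# Hypothetical wrapper (not part of scCB2) that chains the individual steps.
manual_cb2 <- function(dir = NULL, h5file = NULL, FDR_threshold = 0.01,
                       MTfilter = 1, MTgene = NULL, Ncores = 2, ...) {
  if (!requireNamespace("scCB2", quietly = TRUE)) {
    stop("Package 'scCB2' is required for this sketch.")
  }
  if (is.null(dir) && is.null(h5file)) {
    stop("Provide either 'dir' or 'h5file'.")
  }

  # Step 1: read the raw 10x data from a directory or an HDF5 file.
  raw_data <- if (!is.null(dir)) {
    scCB2::Read10xRaw(dir)
  } else {
    scCB2::Read10xRawH5(h5file)
  }

  # Step 2: test barcodes and control FDR at the requested level.
  cb_out <- scCB2::CB2FindCell(raw_data, FDR_threshold = FDR_threshold,
                               Ncores = Ncores, ...)

  # Step 3: extract the real cell matrix, optionally filtering on
  # mitochondrial proportion.
  scCB2::GetCellMat(cb_out, MTfilter = MTfilter, MTgene = MTgene)
}
```

Calling the steps separately is useful when the intermediate output of CB2FindCell needs to be inspected before the cell matrix is extracted.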

Value

Either a sparse matrix of real cells identified by CB2 or a Seurat object containing the real cell matrix.

Examples

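# Load the package that provides QuickCB2 and the mbrainSub example data
library(scCB2)

# Simulate 10x output files from the first 10000 barcodes of mbrainSub
data(mbrainSub)
mbrainSub <- mbrainSub[, 1:10000]
data_dir <- file.path(tempdir(), "CB2example")
dir.create(data_dir)
gene_name <- rownames(mbrainSub)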

# For simplicity, use gene names to generate gene IDs to fit the format.
gene_id <- paste0("ENSG_fake_",gene_name)
barcode_id <- colnames(mbrainSub)
Matrix::writeMM(mbrainSub,file = file.path(data_dir,"matrix.mtx"))
write.table(barcode_id,file = file.path(data_dir,"barcodes.tsv"),
    sep = "\t", quote = FALSE, col.names = FALSE, row.names = FALSE)
write.table(cbind(gene_id,gene_name),file = file.path(data_dir,"genes.tsv"),
    sep = "\t", quote = FALSE, col.names = FALSE, row.names = FALSE)

# Run QuickCB2 on 10x raw data and get cell matrix.
# Control FDR at 1%. Use 2-core parallel computation.

RealCell <- QuickCB2(dir = data_dir, 
                     FDR_threshold = 0.01,
                     Ncores = 2)
str(RealCell)

scCB2 documentation built on Nov. 8, 2020, 5:48 p.m.