preprocess: Pre-processing function for sex classification
In Oshlack/speckle: Statistical methods for analysing single cell RNA-seq data

preprocess

R Documentation

Description

This function processes a single-cell counts matrix into the format required by the classifySex function.

Usage

preprocess(x, genome = genome, qc = qc)

Arguments

`x`	the counts matrix, rows are genes and columns are cells. Row names must be gene symbols.
`genome`	the genome the data arises from. Current options are human: genome = "Hs" or mouse: genome = "Mm".
`qc`	logical, indicates whether to perform additional quality control on the cells. If qc = TRUE, only cells that pass quality control are classified; cells that are filtered out are not classified. If qc = FALSE, every cell is classified except those with zero counts on both XIST/Xist and the sum of the Y chromosome genes. Default is TRUE.
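The requirements on `x` can be checked before calling the function. The sketch below is an illustration only: `check_preprocess_input` is a hypothetical helper, not part of speckle, and the 50% threshold and the Ensembl ID pattern are assumptions chosen for the example.

```r
# Hypothetical helper (not part of speckle): basic sanity checks on the
# inputs described above, intended to be run before preprocess().
check_preprocess_input <- function(x, genome = c("Hs", "Mm")) {
  genome <- match.arg(genome)

  if (is.null(rownames(x))) {
    stop("x must have row names (gene symbols).")
  }

  # Ensembl gene IDs start with ENSG (human) or ENSMUSG (mouse);
  # gene symbols do not. The 50% cut-off is an arbitrary assumption.
  looks_ensembl <- grepl("^ENS(MUS)?G[0-9]+", rownames(x))
  if (mean(looks_ensembl) > 0.5) {
    stop("Row names look like Ensembl IDs; convert them to gene symbols.")
  }

  # XIST (human) / Xist (mouse) is the X-linked gene named in this page.
  xist <- if (genome == "Hs") "XIST" else "Xist"
  if (!xist %in% rownames(x)) {
    warning(xist, " not found in row names.")
  }

  invisible(TRUE)
}

# Toy matrix: genes in rows, cells in columns.
toy <- matrix(rpois(12, 5), nrow = 3,
              dimnames = list(c("XIST", "DDX3Y", "ACTB"),
                              paste0("cell", 1:4)))
check_preprocess_input(toy, genome = "Hs")
```

A check like this fails early with a readable message when row names are still Ensembl IDs, which is the case the Examples section handles by mapping IDs to symbols.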

Details

This function filters out cells that cannot be classified because they have zero counts on *XIST/Xist* and on all of the Y chromosome genes. If qc = TRUE, cells flagged as low quality by the perCellQCMetrics and quickPerCellQC functions from the scuttle package are also removed. The counts for the remaining cells are then log-normalised and scaled.
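The zero-count filter and the normalisation step can be illustrated on a toy matrix in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the Y gene set (`y_genes`) is a two-gene stand-in for the package's own list, the normalisation (library-size scaling to counts per million, then log2 with a pseudo-count of 1) is one common choice that may differ from what preprocess uses, and the qc = TRUE step is omitted.

```r
# Toy counts matrix: genes in rows, cells in columns.
counts <- matrix(
  c(5, 0, 0, 0,   # XIST
    0, 3, 0, 1,   # DDX3Y
    0, 2, 0, 0,   # UTY
    9, 7, 4, 6),  # ACTB
  nrow = 4, byrow = TRUE,
  dimnames = list(c("XIST", "DDX3Y", "UTY", "ACTB"),
                  paste0("cell", 1:4))
)

# Illustrative Y chromosome gene set (assumption, not the package's list).
y_genes <- c("DDX3Y", "UTY")

# Per-cell sum of counts across the Y genes.
super_y <- colSums(counts[rownames(counts) %in% y_genes, , drop = FALSE])

# Cells with zero counts on XIST and on all Y genes cannot be classified.
zero_cells <- colnames(counts)[counts["XIST", ] == 0 & super_y == 0]
zero_cells  # "cell3"

# Keep the remaining cells.
keep <- counts[, !colnames(counts) %in% zero_cells, drop = FALSE]

# Log-normalise: scale each cell by its library size, then log2(CPM + 1).
lib_size <- colSums(keep)
log_norm <- log2(t(t(keep) / lib_size) * 1e6 + 1)

# Transpose so rows are cells, then centre and scale each feature.
scaled <- scale(t(log_norm))
dim(scaled)  # 3 cells by 4 features
```

Transposing before `scale()` matters because `scale()` works column-wise, so each gene is centred and scaled across cells rather than each cell across genes.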

Value

A list with the following components:

`tcm.final`	A transposed count matrix where rows are cells and columns are the features used for classification.
`data.df`	The normalised and scaled `tcm.final` matrix.
`discarded.cells`	Character vector of cell IDs for the cells that are discarded when `qc=TRUE`.
`zero.cells`	Character vector of cell IDs for the cells that cannot be classified as male/female due to zero counts on XIST/Xist and all of the Y chromosome genes.

Examples


library(speckle)
library(SingleCellExperiment)
library(CellBench)
library(org.Hs.eg.db)

# Get data from CellBench library
sc_data <- load_sc_data()
sc_10x <- sc_data$sc_10x

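# Get counts matrix in correct format with gene symbols as rownames
# rather than ENSEMBL IDs.
counts <- counts(sc_10x)
# Call select() via AnnotationDbi:: so it is not masked by another package
ann <- AnnotationDbi::select(org.Hs.eg.db, keys=rownames(sc_10x),
             columns=c("ENSEMBL","SYMBOL"), keytype="ENSEMBL")
# match() keeps the first symbol returned for each ENSEMBL ID
m <- match(rownames(counts), ann$ENSEMBL)
rownames(counts) <- ann$SYMBOL[m]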

# Preprocess data
pro.data <- preprocess(counts, genome="Hs", qc = TRUE)

# Look at counts on XIST and superY
plot(pro.data$tcm.final$XIST, pro.data$tcm.final$superY)

# Cells that are identified as low quality
pro.data$discarded.cells

# Cells with zero counts on XIST and all Y genes
pro.data$zero.cells

Oshlack/speckle documentation built on Oct. 16, 2022, 9:39 a.m.