preprocess: Pre-processing function for sex classification

View source: R/preprocess.R

preprocess    R Documentation

Pre-processing function for sex classification

Description

The purpose of this function is to process a single cell counts matrix into the appropriate format for the classifySex function.

Usage

preprocess(x, genome = genome, qc = qc)

Arguments

x

The counts matrix, with genes as rows and cells as columns. Row names must be gene symbols.

genome

The genome the data comes from. Current options are human (genome = "Hs") or mouse (genome = "Mm").

qc

Logical, indicating whether to perform additional quality control on the cells. With qc = TRUE, only cells that pass quality control are classified; the filtered cells are not. With qc = FALSE, every cell is classified except those with zero counts on both *XIST/Xist* and the sum of the Y genes. Default is TRUE.

Details

This function filters out cells that cannot be classified because they have zero counts on *XIST/Xist* and on all of the Y chromosome genes. If qc = TRUE, additional cells are removed, as identified by the perCellQCMetrics and quickPerCellQC functions from the scuttle package. The resulting counts matrix is then log-normalised and scaled.
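The filtering and normalisation steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used by preprocess: the toy counts, the two-gene Y list (y_genes), the library-size log2 normalisation and the use of scale() are all assumptions made for the example, and the scuttle-based quality control step is omitted.

```r
# Toy counts matrix: rows are genes, columns are cells (hypothetical values).
counts <- matrix(
  c(5, 0, 0, 2,   # XIST
    0, 3, 0, 1,   # DDX3Y
    0, 4, 0, 0,   # UTY
    9, 7, 6, 8),  # ACTB
  nrow = 4, byrow = TRUE,
  dimnames = list(c("XIST", "DDX3Y", "UTY", "ACTB"),
                  paste0("cell", 1:4))
)

# Assumed subset of Y chromosome genes, for illustration only.
y_genes <- c("DDX3Y", "UTY")

# Sum of the Y gene counts per cell.
superY <- colSums(counts[y_genes, , drop = FALSE])

# Cells with zero counts on XIST and on all Y genes cannot be classified.
zero_cells <- colnames(counts)[counts["XIST", ] == 0 & superY == 0]

# Drop those cells, then transpose so that rows are cells.
kept <- counts[, !(colnames(counts) %in% zero_cells), drop = FALSE]
tcm <- t(kept)

# Assumed normalisation: log2 counts per million with a pseudocount of 1,
# followed by centring and scaling of each feature (column).
lib_size <- rowSums(tcm)
log_norm <- log2(tcm / lib_size * 1e6 + 1)
scaled <- scale(log_norm)
```

Here zero_cells plays the role of the zero.cells component described under Value, and scaled is analogous to data.df, although the features and normalisation that preprocess actually uses may differ.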

Value

A list with the following components:

tcm.final

A transposed count matrix where rows are cells and columns are the features used for classification.

data.df

The normalised and scaled tcm.final matrix.

discarded.cells

Character vector of cell IDs for the cells that are discarded when qc=TRUE.

zero.cells

Character vector of cell IDs for the cells that cannot be classified as male/female due to zero counts on *XIST/Xist* and all the Y chromosome genes.

Examples


library(speckle)
library(SingleCellExperiment)
library(CellBench)
library(org.Hs.eg.db)

# Get data from CellBench library
sc_data <- load_sc_data()
sc_10x <- sc_data$sc_10x

# Get counts matrix in correct format with gene symbol as rownames 
# rather than ENSEMBL ID.
counts <- counts(sc_10x)
# Use the namespace-qualified select() so it is not masked by other packages
ann <- AnnotationDbi::select(org.Hs.eg.db, keys=rownames(sc_10x),
                             columns=c("ENSEMBL","SYMBOL"), keytype="ENSEMBL")
m <- match(rownames(counts), ann$ENSEMBL)
rownames(counts) <- ann$SYMBOL[m]

# Preprocess data
pro.data <- preprocess(counts, genome="Hs", qc = TRUE)

# Plot counts on XIST against counts on superY
plot(pro.data$tcm.final$XIST, pro.data$tcm.final$superY)

# Cells that are identified as low quality
pro.data$discarded.cells

# Cells with zero counts on XIST and all Y genes
pro.data$zero.cells


Oshlack/speckle documentation built on Oct. 16, 2022, 9:39 a.m.