classifySex: Predict sex of cells in scRNA-seq data
In Oshlack/speckle: Statistical methods for analysing single cell RNA-seq data

classifySex

R Documentation

Predict sex of cells in scRNA-seq data

Description

This function will predict the sex for each cell in scRNA-seq data. The classifier is based on logistic regression models that have been trained on mouse and human single cell RNA-seq data.

Usage

classifySex(x, genome = NULL, qc = TRUE)

Arguments

`x`	counts matrix, rows correspond to genes and columns correspond to cells. Row names must be gene symbols.
`genome`	the genome the data arises from. Current options are human: genome = "Hs" or mouse: genome = "Mm".
`qc`	logical, indicates whether to perform quality control or not. qc = TRUE will predict cells that pass quality control only and the filtered cells will not be classified. qc = FALSE will predict every cell except the cells with zero counts on XIST/Xist and the sum of the Y genes. Default is TRUE.

Details

For bulk RNA-seq, checking the sex of the samples for mouse and human experiments is trivial as we can simply check the expression of *Xist/XIST*. It is not as simple for single cell RNA-seq data as the number of counts measured per gene and per cell is often quite low. Simply relying on cut-offs on the expression of genes like *Xist* means that many cells are unable to be classified. Hence we have developed a classifier based on a combination of X- and Y-linked genes in order to accurately predict the sex of each cell.

Cells with zero counts on Xist and the sum of the Y chromosome genes will not be classified as there is simply not enough information to accurately classify as Male/Female, and NAs will be returned. In addition, the user has the option to perform quality control on the data first, by specifying qc=TRUE, which will not classify cells that are deemed low-quality.

Value

a dataframe with predicted labels for each cell

Author(s)

Xinyi Jin

Examples


library(speckle)
library(SingleCellExperiment)
library(CellBench)
library(org.Hs.eg.db)

sc_data <- load_sc_data()
sc_10x <- sc_data$sc_10x

counts <- counts(sc_10x)
ann <- select(org.Hs.eg.db, keys=rownames(sc_10x),
             columns=c("ENSEMBL","SYMBOL"), keytype="ENSEMBL")
m <- match(rownames(counts), ann$ENSEMBL)
rownames(counts) <- ann$SYMBOL[m]

sex <- classifySex(counts, genome="Hs")

table(sex$prediction)
boxplot(counts["XIST",]~sex$prediction)

Oshlack/speckle documentation built on Oct. 16, 2022, 9:39 a.m.