| preprocess | R Documentation |
This function processes a single-cell counts matrix into the format
required by the classifySex function.
preprocess(x, genome = genome, qc = qc)
| x | The counts matrix, with genes as rows and cells as columns. Row names must be gene symbols. |
| genome | The genome the data come from. Current options are human (genome = "Hs") and mouse (genome = "Mm"). |
| qc | Logical, indicating whether to perform additional quality control on the cells. With qc = TRUE, only cells that pass quality control are predicted; the filtered cells are not classified. With qc = FALSE, every cell is predicted except cells with zero counts on both *XIST/Xist* and the sum of the Y genes. Default is TRUE. |
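The argument rules above can be checked before any processing. The following is a minimal sketch and not part of speckle: check_preprocess_args is a hypothetical helper that validates x, genome and qc as documented and returns the XIST symbol in each genome's naming convention (upper case for human, title case for mouse).

```r
# Hypothetical helper, NOT part of speckle: validates the documented arguments
check_preprocess_args <- function(x, genome, qc) {
  # Genes are rows and must be identified by gene symbols in the row names
  if (is.null(rownames(x))) stop("x must have gene symbols as row names")
  # Only human ("Hs") and mouse ("Mm") are documented options
  genome <- match.arg(genome, c("Hs", "Mm"))
  # qc must be a single TRUE/FALSE value
  if (!is.logical(qc) || length(qc) != 1L || is.na(qc)) {
    stop("qc must be a single TRUE or FALSE")
  }
  # Human symbols are upper case (XIST); mouse symbols are title case (Xist)
  if (genome == "Hs") "XIST" else "Xist"
}
```

For example, check_preprocess_args(counts, "Hs", TRUE) returns "XIST", while an unsupported genome such as "Dm" stops with an error from match.arg().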
This function filters out cells that cannot be classified because they have
zero counts on *XIST/Xist* and on all of the Y chromosome genes. If
qc = TRUE, additional cells are removed as identified by the
perCellQCMetrics and quickPerCellQC functions from the
scuttle package. The resulting counts matrix is then log-normalised
and scaled.
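The zero-count filter and the normalisation steps can be sketched in base R. This is an illustration under stated assumptions, not speckle's implementation: sketch_preprocess and its y_genes default are hypothetical (speckle's own Y gene list is not given here), log2 counts-per-million with a pseudocount of 1 is assumed as the log-normalisation, and the optional scuttle quality-control step (qc = TRUE) is omitted.

```r
# A minimal sketch, NOT speckle's implementation.
# Assumptions: y_genes is a hypothetical, illustrative Y gene list, and
# log-normalisation is taken to be log2 counts-per-million + 1.
sketch_preprocess <- function(x, xist = "XIST",
                              y_genes = c("RPS4Y1", "DDX3Y", "UTY", "KDM5D", "EIF1AY")) {
  y_genes <- intersect(y_genes, rownames(x))
  xist_counts <- x[xist, ]
  # superY: total counts over the Y genes present, per cell
  superY <- colSums(x[y_genes, , drop = FALSE])
  # Cells with zero counts on XIST and on all Y genes cannot be classified
  zero <- xist_counts == 0 & superY == 0
  keep <- !zero
  # Transposed feature table: rows are cells, columns are features
  tcm <- data.frame(XIST = xist_counts[keep], superY = superY[keep])
  # Divide each cell's features by its library size, then log2(CPM + 1)
  lib.size <- colSums(x)[keep]
  logcpm <- log2(sweep(as.matrix(tcm), 1, lib.size, "/") * 1e6 + 1)
  list(tcm.final = tcm,
       # Centre and scale each feature column
       data.df = as.data.frame(scale(logcpm)),
       zero.cells = colnames(x)[zero])
}
```

Because a cell with zero total counts necessarily has zero counts on XIST and the Y genes, it is removed before the library-size division, so the sketch never divides by zero.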
Returns a list with the following components:
| tcm.final | A transposed count matrix in which rows are cells and columns are the features used for classification. |
| data.df | The normalised and scaled tcm.final matrix. |
| discarded.cells | Character vector of cell IDs for the cells that are discarded when qc = TRUE. |
| zero.cells | Character vector of cell IDs for the cells that cannot be classified as male/female due to zero counts on *XIST/Xist* and all the Y chromosome genes. |
library(speckle)
library(SingleCellExperiment)
library(CellBench)
library(org.Hs.eg.db)
# Get data from CellBench library
sc_data <- load_sc_data()
sc_10x <- sc_data$sc_10x
# Get counts matrix in correct format with gene symbol as rownames
# rather than ENSEMBL ID.
counts <- counts(sc_10x)
# Call AnnotationDbi::select explicitly so dplyr::select cannot mask it
ann <- AnnotationDbi::select(org.Hs.eg.db, keys = rownames(sc_10x),
                             columns = c("ENSEMBL", "SYMBOL"),
                             keytype = "ENSEMBL")
m <- match(rownames(counts), ann$ENSEMBL)
rownames(counts) <- ann$SYMBOL[m]
# Preprocess data
pro.data <- preprocess(counts, genome="Hs", qc = TRUE)
# Look at counts on XIST and superY
plot(pro.data$tcm.final$XIST, pro.data$tcm.final$superY)
# Cells that are identified as low quality
pro.data$discarded.cells
# Cells with zero counts on XIST and all Y genes
pro.data$zero.cells