preprocess | R Documentation |
Processes a single-cell counts matrix into the format required by the
classifySex function.
preprocess(x, genome = genome, qc = qc)
x |
the counts matrix, rows are genes and columns are cells. Row names must be gene symbols. |
genome |
the genome the data arises from. Current options are human: genome = "Hs" or mouse: genome = "Mm". |
qc |
logical, indicates whether to perform additional quality control on the cells. With qc = TRUE, only cells that pass quality control are predicted; the filtered cells are not classified. With qc = FALSE, every cell is predicted except those with zero counts on *XIST/Xist* and on the sum of the Y genes. Default is TRUE. |
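The requirements above can be checked before calling preprocess. The following is a minimal sketch on a toy matrix; the helper name `check_counts_input` is hypothetical and not part of speckle, and using the presence of the XIST/Xist symbol as a proxy for "row names are gene symbols" is an assumption made for illustration only.

```r
# Hypothetical helper (not part of speckle): checks the documented input
# requirements for preprocess() and returns TRUE or FALSE.
check_counts_input <- function(x, genome) {
  # x must be a matrix-like object with genes in rows and cells in columns.
  stopifnot(is.matrix(x) || inherits(x, "Matrix"))
  # Only the two documented genome options are accepted.
  if (!genome %in% c("Hs", "Mm")) {
    stop("genome must be \"Hs\" or \"Mm\"")
  }
  if (is.null(rownames(x))) {
    stop("x must have gene symbols as row names")
  }
  # Assumption: if the XIST (human) or Xist (mouse) symbol is present,
  # the row names are probably gene symbols for that genome.
  marker <- if (genome == "Hs") "XIST" else "Xist"
  marker %in% rownames(x)
}

# Toy counts matrix: 3 genes (rows) by 2 cells (columns).
toy <- matrix(c(5, 0, 2, 1, 0, 3), nrow = 3,
              dimnames = list(c("XIST", "RPS4Y1", "ACTB"),
                              c("cell1", "cell2")))

ok_hs <- check_counts_input(toy, genome = "Hs")  # human symbols present
ok_mm <- check_counts_input(toy, genome = "Mm")  # mouse symbol "Xist" absent
```

A FALSE result here would suggest that the row names are still identifiers such as ENSEMBL IDs, or that the wrong genome was specified.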
This function will filter out cells that are unable to be classified due to
zero counts on *XIST/Xist* and all of the Y chromosome genes. If qc=TRUE,
additional cells are removed as identified by the perCellQCMetrics and
quickPerCellQC functions from the scuttle package. The resulting counts
matrix is then log-normalised and scaled.
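The zero-count filter and the normalisation step described above can be sketched in base R as follows. This is an illustration under stated assumptions, not speckle's implementation: the Y gene list is a two-gene toy subset, the normalisation is assumed to be log2 counts per million with a pseudocount of 1, and the scuttle-based quality control used when qc=TRUE is not reproduced.

```r
# Toy counts: rows are genes, columns are cells (filled column by column).
counts <- matrix(
  c(4, 0, 0, 10,   # cell1: counts on XIST only
    0, 3, 2, 8,    # cell2: counts on Y genes only
    0, 0, 0, 12),  # cell3: zero on XIST and on all Y genes
  nrow = 4,
  dimnames = list(c("XIST", "RPS4Y1", "DDX3Y", "ACTB"),
                  c("cell1", "cell2", "cell3"))
)

# Illustrative subset of Y chromosome genes (assumption, not speckle's list).
y_genes <- c("RPS4Y1", "DDX3Y")

# Cells with zero counts on XIST and on the sum of the Y genes
# cannot be classified and are set aside.
super_y <- colSums(counts[y_genes, , drop = FALSE])
is_zero <- counts["XIST", ] == 0 & super_y == 0
zero_cells <- colnames(counts)[is_zero]
kept <- counts[, !is_zero, drop = FALSE]

# Assumed normalisation: log2 counts per million with a pseudocount of 1,
# followed by centring and scaling each gene across the remaining cells.
lib_size <- colSums(kept)
log_norm <- log2(t(t(kept) / lib_size) * 1e6 + 1)
scaled <- t(scale(t(log_norm)))
```

Here `zero_cells` plays the role of the zero.cells component described in the return value, and `scaled` is a rough analogue of the normalised and scaled data.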
Returns a list object with the following components:
tcm.final |
A transposed count matrix where rows are cells and columns are the features used for classification. |
data.df |
The normalised and scaled tcm.final matrix. |
discarded.cells |
Character vector of cell IDs for the cells that are
discarded when qc = TRUE. |
zero.cells |
Character vector of cell IDs for the cells that cannot be classified as male/female due to zero counts on *XIST/Xist* and all the Y chromosome genes. |
library(speckle)
library(SingleCellExperiment)
library(CellBench)
library(org.Hs.eg.db)

# Get data from CellBench library
sc_data <- load_sc_data()
sc_10x <- sc_data$sc_10x

# Get counts matrix in correct format with gene symbol as rownames
# rather than ENSEMBL ID.
counts <- counts(sc_10x)
ann <- select(org.Hs.eg.db, keys=rownames(sc_10x),
              columns=c("ENSEMBL","SYMBOL"), keytype="ENSEMBL")
m <- match(rownames(counts), ann$ENSEMBL)
rownames(counts) <- ann$SYMBOL[m]

# Preprocess data
pro.data <- preprocess(counts, genome="Hs", qc = TRUE)

# Look at counts on XIST and superY.all
plot(pro.data$tcm.final$XIST, pro.data$tcm.final$superY)

# Cells that are identified as low quality
pro.data$discarded.cells

# Cells with zero counts on XIST and all Y genes
pro.data$zero.cells