labelCells | R Documentation |
Given marker gene sets for cell types, identify cells with
high expression of the marker genes (positive examples), then use these
cells to create a reference transcriptome profile for each cell type and
identify additional cells of each type using SingleR. These
marker genes should be specifically expressed in a single cell type, e.g.
CD3, which is expressed by all T cell subtypes, would not be suitable
for specific T cell subtypes.
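The first step described above, picking the highest-scoring cells for a marker set, can be sketched in base R. This is only an illustration under stated assumptions: the score here is a plain mean of marker expression, whereas the package computes scores with normGenesetExpression, and the names `expr`, `markers`, `fraction_top` and `top_cells` are hypothetical.

```r
# Toy log-expression matrix: 5 genes x 12 cells (hypothetical values)
expr <- matrix(1, nrow = 5, ncol = 12,
               dimnames = list(paste0("g", 1:5), paste0("c", 1:12)))
# Raise the two marker genes in cells 1 to 3
expr[c("g1", "g2"), 1:3] <- expr[c("g1", "g2"), 1:3] + 10

# Named list: cell type -> marker genes (same shape as in the examples)
markers <- list(typeA = c("g1", "g2"))
fraction_top <- 0.25   # keep the top 25% of cells

# Simplified score: mean marker expression per cell
# (assumption: NOT the normalisation used by normGenesetExpression)
scores <- colMeans(expr[markers$typeA, , drop = FALSE])

# Indices of the top-scoring cells
n_top <- ceiling(fraction_top * ncol(expr))
top_cells <- sort(order(scores, decreasing = TRUE)[seq_len(n_top)])
top_cells
```

A marker shared by several cell types would raise the score in all of them, which is why the description asks for markers restricted to a single type.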
labelCells(
  sce,
  markergenes,
  fraction_topscoring = 0.01,
  expr_values = "logcounts",
  normGenesetExpressionParams = list(R = 200),
  aggregateReferenceParams = list(power = 0.5),
  SingleRParams = list(),
  BPPARAM = SerialParam()
)
sce | |
markergenes | Named |
fraction_topscoring | |
expr_values | Integer scalar or string indicating which assay of |
normGenesetExpressionParams | |
aggregateReferenceParams | |
SingleRParams | |
BPPARAM | An optional |
A list of three elements named cells, refs and labels.
cells contains a list with the numerical indices of the top
scoring cells for each cell type.
refs contains the pseudo-bulk transcriptome profiles used as a
reference for label assignment, as returned by aggregateReference.
labels contains a DataFrame with the
annotation statistics for each cell (one cell per row), generated by
SingleR.
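To make the relation between cells and refs concrete, here is a minimal base R sketch that averages the top-scoring cells of each type into one profile. It is a simplification under stated assumptions: aggregateReference from SingleR can return more than one aggregated profile per label, and `expr`, `cells` and `pseudo_bulk` are hypothetical names with made-up values.

```r
# Toy log-expression matrix: 4 genes x 6 cells (hypothetical values,
# filled column by column, i.e. one cell per group of four numbers)
expr <- matrix(c(5, 5, 1, 1,
                 6, 4, 1, 1,
                 1, 1, 7, 3,
                 1, 1, 5, 5,
                 2, 2, 2, 2,
                 3, 3, 3, 3),
               nrow = 4,
               dimnames = list(paste0("g", 1:4), paste0("c", 1:6)))

# Top-scoring cell indices per cell type, shaped like the `cells` element
cells <- list(t1 = c(1L, 2L), t2 = c(3L, 4L))

# One averaged profile per cell type (genes in rows, cell types in columns)
# Assumption: a single mean per type, unlike aggregateReference
pseudo_bulk <- sapply(cells, function(idx) rowMeans(expr[, idx, drop = FALSE]))
pseudo_bulk
```

Cells 5 and 6 are not among the top-scoring cells of either type, so they do not contribute to any profile; in the real workflow such cells are the ones that receive labels afterwards.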
Michael Stadler
normGenesetExpression used to calculate scores for
marker gene sets; aggregateReference used to
create reference profiles; SingleR used to assign
labels to cells.
if (requireNamespace("SingleR", quietly = TRUE) &&
    requireNamespace("SingleCellExperiment", quietly = TRUE) &&
    requireNamespace("scrapper", quietly = TRUE)) {

    # create SingleCellExperiment with cell-type specific genes
    library(SingleCellExperiment)
    n_types <- 3
    n_per_type <- 30
    n_cells <- n_types * n_per_type
    n_genes <- 500
    fraction_specific <- 0.1
    n_specific <- round(n_genes * fraction_specific)
    set.seed(42)
    mu <- ceiling(runif(n = n_genes, min = 0, max = 30))
    u <- do.call(rbind, lapply(mu, function(x) rpois(n_cells, lambda = x)))
    rownames(u) <- paste0("g", seq.int(nrow(u)))
    celltype.labels <- rep(paste0("t", seq.int(n_types)), each = n_per_type)
    celltype.genes <- split(sample(rownames(u), size = n_types * n_specific),
                            rep(paste0("t", seq.int(n_types)), each = n_specific))
    for (i in seq_along(celltype.genes)) {
        j <- celltype.genes[[i]]
        k <- celltype.labels == paste0("t", i)
        u[j, k] <- 2 * u[j, k]
    }
    v <- log2(u + 1)
    sce <- SingleCellExperiment(assays = list(counts = u, logcounts = v))

    # define marker genes (subset of true cell-type-specific genes)
    marker.genes <- lapply(celltype.genes, "[", 1:5)
    marker.genes

    # predict cell types
    res <- labelCells(sce, marker.genes,
                      fraction_topscoring = 0.1,
                      normGenesetExpressionParams = list(R = 50))

    # high-scoring cells used as references for each celltype
    res$cells

    # ... from these, pseudo-bulks were created:
    res$refs

    # ... and used to predict labels for all cells
    res$labels$pruned.labels

    # compare predicted to true cell types
    table(true = celltype.labels, predicted = res$labels$pruned.labels)
}
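The final comparison in the example can be turned into a single agreement figure. The sketch below uses made-up label vectors rather than real labelCells output, and it relies on the fact that SingleR sets pruned.labels to NA for cells whose assignment was pruned; `true_labels`, `pred_labels` and `agreement` are hypothetical names.

```r
# Hypothetical true and predicted labels for 12 cells
true_labels <- rep(c("t1", "t2", "t3"), each = 4)
pred_labels <- c("t1", "t1", "t1", NA,     # one pruned (NA) assignment
                 "t2", "t2", "t3", "t2",   # one cell mislabelled as t3
                 "t3", "t3", "t3", "t3")

# Confusion table that keeps pruned cells visible as an NA column
tab <- table(true = true_labels, predicted = pred_labels, useNA = "ifany")
tab

# Fraction of non-pruned cells whose predicted label matches the true one
keep <- !is.na(pred_labels)
agreement <- mean(true_labels[keep] == pred_labels[keep])
agreement
```

Reporting the number of pruned cells alongside the agreement avoids overstating performance when many cells were left unlabelled.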