labelCells: Assign labels to cells using known marker genes

View source: R/cellLabeling.R

labelCellsR Documentation

Assign labels to cells using known marker genes

Description

Given marker gene sets for cell types, identify cells with high expression of the marker genes (positive examples), then use these cells to create a reference transcriptome profile for each cell type and identify additional cells of each type using SingleR. These marker genes should specifically expressed a single cell type, e.g. CD3 which is expressed by all T cell subtypes would not be suitable for specific T cell subtypes.

Usage

labelCells(
  sce,
  markergenes,
  fraction_topscoring = 0.01,
  expr_values = "logcounts",
  normGenesetExpressionParams = list(R = 200),
  aggregateReferenceParams = list(power = 0.5),
  SingleRParams = list(),
  BPPARAM = SerialParam()
)

Arguments

sce

SingleCellExperiment object.

markergenes

Named list of character vectors with the marker genes for each cell types. The marker genes must be a subset of rownames(sce).

fraction_topscoring

numeric vector of length 1 or the same length as markergenes giving the fraction(s) of top scoring cells for each cell type to pick to create the reference transcriptome profile.

expr_values

Integer scalar or string indicating which assay of sce contains the expression values.

normGenesetExpressionParams

list with additional parameters for normGenesetExpression.

aggregateReferenceParams

list with additional parameters for aggregateReference.

SingleRParams

list with additional parameters for SingleR.

BPPARAM

An optional BiocParallelParam instance determining the parallel back-end to be used during evaluation.

Value

A list of three elements named cells, refs and labels. cells contains a list with the numerical indices of the top scoring cells for each cell type. refs contains the pseudo-bulk transcriptome profiles used as a reference for label assignment, as returned by aggregateReference. labels contains a DataFrame with the annotation statistics for each cell (one cell per row), generated by SingleR.

Author(s)

Michael Stadler

See Also

normGenesetExpression used to calculate scores for marker gene sets; aggregateReference used to create reference profiles; SingleR used to assign labels to cells.

Examples

if (requireNamespace("SingleR", quietly = TRUE) &&
    requireNamespace("SingleCellExperiment", quietly = TRUE) &&
    requireNamespace("scrapper", quietly = TRUE)) {
    
    # create SingleCellExperiment with cell-type specific genes
    library(SingleCellExperiment)
    n_types <- 3
    n_per_type <- 30
    n_cells <- n_types * n_per_type
    n_genes <- 500
    fraction_specific <- 0.1
    n_specific <- round(n_genes * fraction_specific)

    set.seed(42)
    mu <- ceiling(runif(n = n_genes, min = 0, max = 30))
    u <- do.call(rbind, lapply(mu, function(x) rpois(n_cells, lambda = x)))
    rownames(u) <- paste0("g", seq.int(nrow(u)))
    celltype.labels <- rep(paste0("t", seq.int(n_types)), each = n_per_type)
    celltype.genes <- split(sample(rownames(u), size = n_types * n_specific),
                            rep(paste0("t", seq.int(n_types)), each = n_specific))
    for (i in seq_along(celltype.genes)) {
        j <- celltype.genes[[i]]
        k <- celltype.labels == paste0("t", i)
        u[j, k] <- 2 * u[j, k]
    }
    v <- log2(u + 1)
    sce <- SingleCellExperiment(assays=list(counts=u, logcounts=v))

    # define marker genes (subset of true cell-type-specific genes)
    marker.genes <- lapply(celltype.genes, "[", 1:5)
    marker.genes

    # predict cell types
    res <- labelCells(sce, marker.genes,
                      fraction_topscoring = 0.1,
                      normGenesetExpressionParams = list(R = 50))

    # high-scoring cells used as references for each celltype
    res$cells

    # ... from these, pseudo-bulks were created:
    res$refs

    # ... and used to predict labels for all cells
    res$labels$pruned.labels

    # compare predicted to true cell types
    table(true = celltype.labels, predicted = res$labels$pruned.labels)
}
      

fmicompbio/swissknife documentation built on June 11, 2025, 4:17 p.m.