labelCells: Assign labels to cells using known marker genes
In fmicompbio/swissknife: Handy code shared in the FMI CompBio group

labelCells

R Documentation

Assign labels to cells using known marker genes

Description

Given marker gene sets for cell types, identify cells with high expression of the marker genes (positive examples), then use these cells to create a reference transcriptome profile for each cell type and identify additional cells of each type using SingleR. These marker genes should specifically expressed a single cell type, e.g. CD3 which is expressed by all T cell subtypes would not be suitable for specific T cell subtypes.

Usage

labelCells(
  sce,
  markergenes,
  fraction_topscoring = 0.01,
  expr_values = "logcounts",
  normGenesetExpressionParams = list(R = 200),
  aggregateReferenceParams = list(power = 0.5),
  SingleRParams = list(),
  BPPARAM = SerialParam()
)

Arguments

`sce`	`SingleCellExperiment` object.
`markergenes`	Named `list` of `character` vectors with the marker genes for each cell types. The marker genes must be a subset of `rownames(sce)`.
`fraction_topscoring`	`numeric` vector of length 1 or the same length as `markergenes` giving the fraction(s) of top scoring cells for each cell type to pick to create the reference transcriptome profile.
`expr_values`	Integer scalar or string indicating which assay of `sce` contains the expression values.
`normGenesetExpressionParams`	`list` with additional parameters for `normGenesetExpression`.
`aggregateReferenceParams`	`list` with additional parameters for `aggregateReference`.
`SingleRParams`	`list` with additional parameters for `SingleR`.
`BPPARAM`	An optional `BiocParallelParam` instance determining the parallel back-end to be used during evaluation.

Value

A list of three elements named cells, refs and labels. cells contains a list with the numerical indices of the top scoring cells for each cell type. refs contains the pseudo-bulk transcriptome profiles used as a reference for label assignment, as returned by aggregateReference. labels contains a DataFrame with the annotation statistics for each cell (one cell per row), generated by SingleR.

Author(s)

Michael Stadler

Examples

if (requireNamespace("SingleR", quietly = TRUE) &&
    requireNamespace("SingleCellExperiment", quietly = TRUE) &&
    requireNamespace("scrapper", quietly = TRUE)) {
    
    # create SingleCellExperiment with cell-type specific genes
    library(SingleCellExperiment)
    n_types <- 3
    n_per_type <- 30
    n_cells <- n_types * n_per_type
    n_genes <- 500
    fraction_specific <- 0.1
    n_specific <- round(n_genes * fraction_specific)

    set.seed(42)
    mu <- ceiling(runif(n = n_genes, min = 0, max = 30))
    u <- do.call(rbind, lapply(mu, function(x) rpois(n_cells, lambda = x)))
    rownames(u) <- paste0("g", seq.int(nrow(u)))
    celltype.labels <- rep(paste0("t", seq.int(n_types)), each = n_per_type)
    celltype.genes <- split(sample(rownames(u), size = n_types * n_specific),
                            rep(paste0("t", seq.int(n_types)), each = n_specific))
    for (i in seq_along(celltype.genes)) {
        j <- celltype.genes[[i]]
        k <- celltype.labels == paste0("t", i)
        u[j, k] <- 2 * u[j, k]
    }
    v <- log2(u + 1)
    sce <- SingleCellExperiment(assays=list(counts=u, logcounts=v))

    # define marker genes (subset of true cell-type-specific genes)
    marker.genes <- lapply(celltype.genes, "[", 1:5)
    marker.genes

    # predict cell types
    res <- labelCells(sce, marker.genes,
                      fraction_topscoring = 0.1,
                      normGenesetExpressionParams = list(R = 50))

    # high-scoring cells used as references for each celltype
    res$cells

    # ... from these, pseudo-bulks were created:
    res$refs

    # ... and used to predict labels for all cells
    res$labels$pruned.labels

    # compare predicted to true cell types
    table(true = celltype.labels, predicted = res$labels$pruned.labels)
}

fmicompbio/swissknife documentation built on June 11, 2025, 4:17 p.m.