In SlimR: Adaptive Machine Learning-Powered, Context-Matching Tool for Single-Cell and Spatial Transcriptomics Annotation

View source: R/Celltype_Calculate_PerCell.R

Celltype_Calculate_PerCell

R Documentation

Per-cell annotation using marker expression and optional UMAP spatial smoothing

Description

Unlike cluster-based annotation, this function assigns a cell type label to each individual cell based on its marker gene expression profile. Optionally, it uses UMAP coordinates to smooth the predictions by k-nearest-neighbor voting.

Usage

Celltype_Calculate_PerCell(
  seurat_obj,
  gene_list,
  species,
  assay = "RNA",
  method = c("weighted", "mean", "AUCell"),
  min_expression = 0.1,
  use_umap_smoothing = FALSE,
  umap_reduction = "umap",
  k_neighbors = 15,
  smoothing_weight = 0.3,
  min_score = "auto",
  min_confidence = 1.2,
  return_scores = FALSE,
  ncores = 1,
  chunk_size = 5000,
  verbose = TRUE
)

Arguments

`seurat_obj`	Seurat object with normalized expression data.
`gene_list`	A standardized marker list (same format as Celltype_Calculate).
`species`	"Human" or "Mouse" for gene name formatting.
`assay`	Assay to use (default: "RNA").
`method`	Scoring method: "AUCell" (rank-based), "mean" (average expression), or "weighted" (expression weighted by detection). Default: "weighted".
`min_expression`	Minimum expression value at which a gene counts as detected. Default: 0.1.
`use_umap_smoothing`	Logical. If TRUE, apply k-NN smoothing using UMAP coordinates to improve annotation consistency. Default: FALSE.
`umap_reduction`	Name of UMAP reduction in Seurat object. Default: "umap".
`k_neighbors`	Number of neighbors for UMAP smoothing. Default: 15.
`smoothing_weight`	Weight (0 to 1) given to neighbor votes relative to the cell's own score. Higher values give neighbors more influence. Default: 0.3.
`min_score`	Minimum score required to assign a cell type. Cells scoring below this threshold are labeled "Unassigned". Default: "auto", which sets the threshold adaptively from the number of cell types (1.5 / n_celltypes). Set a numeric value (e.g., 0.1) to use a fixed threshold.
`min_confidence`	Minimum confidence threshold. Cells with confidence below this value are labeled "Unassigned". Confidence is the ratio of the highest score to the second-highest score. Default: 1.2 (the highest score must be at least 20% higher than the second-highest). Set to 1.0 to disable confidence filtering.
`return_scores`	If TRUE, return the full score matrix. Default: FALSE.
`ncores`	Number of cores for parallel processing. Default: 1.
`chunk_size`	Number of cells processed per chunk, to limit memory use. Default: 5000.
`verbose`	Print progress messages. Default: TRUE.
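The min_score and min_confidence rules above combine into a single per-cell decision. A minimal base-R sketch of that decision, assuming a cells x cell-types score matrix with non-negative scores; assign_labels() is a hypothetical helper written for illustration and is not part of SlimR.

```r
# Hypothetical helper (not part of SlimR): applies the documented min_score and
# min_confidence rules to a cells x cell-types score matrix.
# ASSUMPTION: scores are non-negative; a zero second-highest score gives Inf confidence.
assign_labels <- function(scores, min_score = "auto", min_confidence = 1.2) {
  # "auto" threshold = 1.5 / number of cell types, as described for min_score
  threshold <- if (identical(min_score, "auto")) 1.5 / ncol(scores) else min_score
  n <- nrow(scores)
  labels <- character(n)
  max_score <- numeric(n)
  confidence <- numeric(n)
  for (i in seq_len(n)) {
    s <- scores[i, ]
    ord <- order(s, decreasing = TRUE)
    best <- s[ord[1]]
    second <- if (length(s) > 1) s[ord[2]] else 0
    max_score[i] <- best
    # confidence = highest score / second-highest score
    confidence[i] <- if (second > 0) best / second else Inf
    # assigned only if the cell clears both thresholds
    keep <- best >= threshold && confidence[i] >= min_confidence
    labels[i] <- if (keep) colnames(scores)[ord[1]] else "Unassigned"
  }
  data.frame(
    Cell_barcode = rownames(scores),
    Predicted_cell_type = labels,
    Max_score = max_score,
    Confidence = confidence,
    stringsAsFactors = FALSE
  )
}
```

With three cell types the "auto" threshold is 1.5 / 3 = 0.5, so a cell whose scores are 0.5, 0.45 and 0.1 clears min_score but fails the default min_confidence of 1.2 (0.5 / 0.45 is about 1.11) and is labeled "Unassigned".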

Details

Scoring Methods

"weighted" (recommended): Combines normalized expression with detection rate. For each cell and cell type, score = mean(expr_i * weight_i) over the cell type's marker genes i, where weight_i is derived from the marker's specificity across the dataset.
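The formula score = mean(expr_i * weight_i) can be sketched in base R as follows. How SlimR derives weight_i from marker specificity is not documented here, so this sketch assumes a simple stand-in: weight_i = 1 / (number of cell types that list gene i as a marker). The function name weighted_scores() is hypothetical.

```r
# Hypothetical sketch of the "weighted" formula: score = mean(expr_i * weight_i)
# over a cell type's marker genes i.
# ASSUMPTION: weight_i = 1 / (number of cell types listing gene i as a marker),
# a simple specificity proxy; SlimR's actual weights may be computed differently.
weighted_scores <- function(expr, markers) {
  # expr: genes x cells matrix of normalized expression (rownames = gene symbols)
  # markers: named list mapping each cell type to a character vector of markers
  marker_freq <- table(unlist(markers, use.names = FALSE))
  sapply(markers, function(genes) {
    genes <- intersect(genes, rownames(expr))
    w <- 1 / as.numeric(marker_freq[genes])
    # each marker row is scaled by its weight (w recycles down the rows),
    # then averaged over markers for every cell
    colMeans(expr[genes, , drop = FALSE] * w)
  })
}
```

The result is a cells x cell-types matrix. Under this assumed weighting, a marker shared by two cell types contributes half as much to each of their scores as a marker unique to one type.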

"mean": Simple average of normalized marker expression. Fast, but less discriminative when marker sets overlap.
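For comparison, "mean" scoring reduces to a column mean over each cell type's marker rows. A minimal base-R sketch, assuming a genes x cells matrix of normalized expression; mean_scores() is a hypothetical name, not a SlimR function.

```r
# Hypothetical sketch of "mean" scoring: per cell, the plain average of the
# normalized expression of each cell type's markers.
# Markers that are absent from expr are skipped.
mean_scores <- function(expr, markers) {
  sapply(markers, function(genes) {
    genes <- intersect(genes, rownames(expr))
    colMeans(expr[genes, , drop = FALSE])
  })
}
```

Because every marker counts equally, a gene shared between two marker lists (NKG7 in the test data below) raises both cell types' scores by the same amount, which is why this method separates overlapping marker sets poorly.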

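"AUCell": Rank-based scoring similar to the AUCell package. For each cell, genes are ranked by expression, and the score is the proportion of the cell type's marker genes that fall in the top X% of expressed genes. Because the score depends only on ranks within each cell, it is robust to technical variation such as differences in sequencing depth between cells.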
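The rank-based rule described above can be sketched in base R. The documentation leaves X unspecified, so top_frac below is a hypothetical parameter standing in for it; rank_scores() is likewise an illustrative name, not part of SlimR.

```r
# Hypothetical sketch of the rank-based rule: for each cell, rank genes by
# expression and score each cell type by the fraction of its markers that fall
# in the cell's top `top_frac` of genes.
# ASSUMPTION: top_frac stands in for the undocumented "X%".
rank_scores <- function(expr, markers, top_frac = 0.05) {
  n_top <- max(1, ceiling(top_frac * nrow(expr)))
  # for each cell (column), the names of its n_top highest-expressed genes
  top_sets <- lapply(seq_len(ncol(expr)), function(j) {
    rownames(expr)[order(expr[, j], decreasing = TRUE)[seq_len(n_top)]]
  })
  # markers missing from expr simply count as "not in the top set"
  scores <- sapply(markers, function(genes) {
    vapply(top_sets, function(top) mean(genes %in% top), numeric(1))
  })
  # sapply returns a plain vector when there is one cell; always return a matrix
  matrix(scores, nrow = ncol(expr),
         dimnames = list(colnames(expr), names(markers)))
}
```

Note that the AUCell package itself scores cells by the area under a gene-recovery curve over the top-ranked genes rather than by a simple proportion; the sketch follows the description given here.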

UMAP Smoothing

When use_umap_smoothing = TRUE, the function:

1. Computes initial per-cell scores.
2. Finds the k nearest neighbors of each cell in UMAP space.
3. Smooths the scores by weighted averaging with those neighbors.
4. Re-assigns cell types based on the smoothed scores.

This reduces noise and makes annotations more consistent among cells that lie close together in the UMAP embedding.
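The four smoothing steps can be sketched in base R as below. The documentation does not give the exact blending formula, so this sketch assumes smoothed = (1 - smoothing_weight) * own score + smoothing_weight * (mean score of the k neighbors); knn_smooth() and relabel() are hypothetical helpers, not SlimR functions.

```r
# Hypothetical sketch of k-NN smoothing in UMAP space.
# ASSUMPTION: smoothed = (1 - smoothing_weight) * own + smoothing_weight * neighbor mean.
knn_smooth <- function(scores, coords, k = 15, smoothing_weight = 0.3) {
  # scores: cells x cell-types matrix; coords: cells x 2 UMAP coordinates
  stopifnot(nrow(scores) == nrow(coords))
  n <- nrow(coords)
  k <- min(k, n - 1)
  # brute-force Euclidean distances (O(n^2) memory; fine only for small data)
  d <- as.matrix(dist(coords))
  diag(d) <- Inf  # a cell is not its own neighbor
  smoothed <- scores
  for (i in seq_len(n)) {
    nn <- order(d[i, ])[seq_len(k)]
    neighbor_mean <- colMeans(scores[nn, , drop = FALSE])
    smoothed[i, ] <- (1 - smoothing_weight) * scores[i, ] +
      smoothing_weight * neighbor_mean
  }
  smoothed
}

# Step 4: re-assign each cell to the cell type with the highest smoothed score
relabel <- function(smoothed) {
  colnames(smoothed)[max.col(smoothed, ties.method = "first")]
}
```

Setting smoothing_weight to 0 leaves the scores unchanged, and values near 1 let the neighborhood dominate. For real datasets a dedicated nearest-neighbor search would replace the dense distance matrix.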

Value

A list containing:

Cell_annotations: Data frame with columns Cell_barcode, Predicted_cell_type, Max_score, and Confidence
Cell_confidence: Numeric vector of confidence scores per cell
Summary: Summary table of cell type counts and percentages
Expression_list: List of mean expression matrices per cell type (for verification)
Proportion_list: List of detection proportion matrices per cell type
Prediction_results: Summary data frame with per-cell-type statistics
Probability_matrix: Full cell × cell_type probability matrix (normalized)
Raw_score_matrix: Full cell × cell_type raw score matrix (before normalization)
Parameters: List of parameters used including adaptive thresholds
Cell_scores: (if return_scores=TRUE) Same as Probability_matrix
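The Summary element reports cell type counts and percentages. A minimal sketch of how such a table could be derived from a vector of per-cell labels such as Predicted_cell_type; summarize_labels() and its column names are hypothetical and may not match the layout of SlimR's own Summary.

```r
# Hypothetical sketch: counts and percentages per predicted label,
# including "Unassigned" cells.
summarize_labels <- function(labels) {
  counts <- table(labels)
  data.frame(
    Cell_type = names(counts),
    Count = as.integer(counts),
    Percentage = round(100 * as.numeric(counts) / length(labels), 2),
    stringsAsFactors = FALSE
  )
}
```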

Examples

## Not run: 
# Basic per-cell annotation
result <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    method = "weighted"
)

# Add annotations to Seurat object (matched by cell barcode)
sce$Cell_type_PerCell <- result$Cell_annotations$Predicted_cell_type[
    match(colnames(sce), result$Cell_annotations$Cell_barcode)
]

# With UMAP smoothing for more consistent annotations
result_smooth <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    use_umap_smoothing = TRUE,
    k_neighbors = 20,
    smoothing_weight = 0.3
)

## End(Not run)

SlimR documentation built on March 13, 2026, 5:08 p.m.