Celltype_Calculate_PerCell: Per-cell annotation using marker expression and optional UMAP...

View source: R/Celltype_Calculate_PerCell.R

Celltype_Calculate_PerCell    R Documentation

Per-cell annotation using marker expression and optional UMAP spatial smoothing

Description

Unlike cluster-based annotation, this function assigns a cell type label to each individual cell based on its marker gene expression profile. It can optionally smooth the predictions by k-nearest-neighbor voting on UMAP coordinates.

Usage

Celltype_Calculate_PerCell(
  seurat_obj,
  gene_list,
  species,
  assay = "RNA",
  method = c("weighted", "mean", "AUCell"),
  min_expression = 0.1,
  use_umap_smoothing = FALSE,
  umap_reduction = "umap",
  k_neighbors = 15,
  smoothing_weight = 0.3,
  min_score = "auto",
  min_confidence = 1.2,
  return_scores = FALSE,
  ncores = 1,
  chunk_size = 5000,
  verbose = TRUE
)

Arguments

seurat_obj

Seurat object with normalized expression data.

gene_list

A standardized marker list (same format as Celltype_Calculate).

species

"Human" or "Mouse" for gene name formatting.

assay

Assay to use (default: "RNA").

method

Scoring method: "weighted" (expression combined with detection-based weights), "mean" (average expression), or "AUCell" (rank-based). Default: "weighted".

min_expression

Minimum expression threshold for detection. Default: 0.1.

use_umap_smoothing

Logical. If TRUE, apply k-NN smoothing using UMAP coordinates to improve annotation consistency. Default: FALSE.

umap_reduction

Name of UMAP reduction in Seurat object. Default: "umap".

k_neighbors

Number of neighbors for UMAP smoothing. Default: 15.

smoothing_weight

Weight for neighbor votes vs cell's own score (0-1). Higher values give more weight to neighbors. Default: 0.3.

min_score

Minimum score threshold to assign a cell type. Cells below this threshold are labeled "Unassigned". Default: "auto" which adaptively sets the threshold based on number of cell types (1.5 / n_celltypes). Set to a numeric value (e.g., 0.1) to use a fixed threshold.

min_confidence

Minimum confidence threshold. Confidence is the ratio of a cell's highest score to its second-highest score, and cells with confidence below this value are labeled "Unassigned". Default: 1.2 (the top score must be at least 20% higher than the runner-up). Set to 1.0 to disable confidence filtering.

return_scores

If TRUE, return full score matrix. Default: FALSE.

ncores

Number of cores for parallel processing. Default: 1.

chunk_size

Number of cells to process per chunk (memory optimization). Default: 5000.

verbose

Print progress messages. Default: TRUE.
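
The interaction between min_score and min_confidence can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: assign_labels() and the toy score matrix are hypothetical, the adaptive threshold follows the 1.5 / n_celltypes rule described above, and confidence is taken as the ratio of the best to the second-best score.

```r
# Hypothetical helper, not part of SlimR: applies the two thresholds
# described above to a cell x cell_type score matrix.
assign_labels <- function(prob, min_score = "auto", min_confidence = 1.2) {
  n_types <- ncol(prob)
  # "auto" follows the documented rule 1.5 / n_celltypes
  threshold <- if (identical(min_score, "auto")) 1.5 / n_types else min_score

  best_idx <- max.col(prob, ties.method = "first")
  best <- prob[cbind(seq_len(nrow(prob)), best_idx)]
  # Second-highest score per cell
  second <- apply(prob, 1, function(x) sort(x, decreasing = TRUE)[2])
  confidence <- best / second  # Inf when the runner-up score is 0

  label <- colnames(prob)[best_idx]
  label[best < threshold | confidence < min_confidence] <- "Unassigned"
  data.frame(Cell_barcode = rownames(prob),
             Predicted_cell_type = label,
             Max_score = best,
             Confidence = confidence)
}

# Toy scores for three cells and three cell types (made-up values)
prob <- matrix(c(0.70, 0.20, 0.10,
                 0.52, 0.46, 0.02,
                 0.45, 0.30, 0.25),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("cell1", "cell2", "cell3"),
                               c("T_cell", "B_cell", "Monocyte")))
res <- assign_labels(prob)
print(res)
```

With three cell types the adaptive threshold is 0.5. In this toy input cell1 keeps its label, cell2 passes the score threshold but fails the confidence filter (0.52 / 0.46 is about 1.13), and cell3 falls below the score threshold.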

Details

Scoring Methods

"weighted" (recommended): Combines normalized expression with detection rate. For each cell and cell type: score = mean(expr_i * weight_i) where weight_i is derived from the marker's specificity across the dataset.
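
A minimal sketch of the formula score = mean(expr_i * weight_i) in base R. The helper weighted_score() is hypothetical, and the weight used here (one minus the fraction of cells in which the gene is detected) is only a placeholder assumption, since the exact specificity weighting used by the package is not stated on this page.

```r
# Hypothetical sketch of score = mean(expr_i * weight_i) for one cell type.
# `expr` is a genes x cells matrix of normalized expression.
weighted_score <- function(expr, markers, weights) {
  markers <- intersect(markers, rownames(expr))
  # Each marker row is multiplied by its weight, then averaged per cell
  colMeans(expr[markers, , drop = FALSE] * weights[markers])
}

# Toy expression matrix (made-up values)
expr <- matrix(c(2.0, 0.0, 1.5,
                 1.0, 1.2, 0.9,
                 0.0, 3.0, 0.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("CD3E", "ACTB", "MS4A1"),
                               c("c1", "c2", "c3")))

# Assumed placeholder weight: genes detected in fewer cells count more.
# 0.1 mirrors the documented default of min_expression.
detection_rate <- rowMeans(expr > 0.1)
weights <- 1 - detection_rate

t_cell_score <- weighted_score(expr, c("CD3E", "ACTB"), weights)
print(t_cell_score)
```

In this toy input ACTB is detected in every cell, so its weight is zero and it does not contribute to the score, whereas CD3E drives the difference between cells.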

"mean": Simple average of normalized marker expression. Fast but less discriminative for overlapping marker sets.

"AUCell": Rank-based scoring similar to AUCell package. For each cell, genes are ranked by expression, and the score is the proportion of marker genes in the top X% of expressed genes. Robust to technical variation.
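
The rank-based idea can be sketched as follows. This is a simplified illustration of the description above, not the AUCell package algorithm and not necessarily what the function does internally; rank_score() and its top_frac argument are hypothetical, and the cutoff value is chosen only for the example.

```r
# Hypothetical sketch: proportion of marker genes that fall within the
# top `top_frac` of genes ranked by expression in each cell.
rank_score <- function(expr, markers, top_frac) {
  markers <- intersect(markers, rownames(expr))
  n_top <- max(1, ceiling(top_frac * nrow(expr)))
  apply(expr, 2, function(x) {
    # Names of the n_top most highly expressed genes in this cell
    top_genes <- names(sort(x, decreasing = TRUE))[seq_len(n_top)]
    mean(markers %in% top_genes)
  })
}

# Toy genes x cells matrix: c1 expresses g1 highest, c2 expresses g8 highest
expr <- matrix(c(8:1, 1:8), nrow = 8,
               dimnames = list(paste0("g", 1:8), c("c1", "c2")))

scores <- rank_score(expr, markers = c("g1", "g2", "g8", "g4"), top_frac = 0.25)
print(scores)
```

Because only ranks within each cell are used, rescaling a cell's expression values by a positive constant leaves its score unchanged, which is the sense in which this style of scoring is robust to technical variation.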

UMAP Smoothing

When use_umap_smoothing = TRUE, the function:

  1. Computes initial per-cell scores

  2. Finds k nearest neighbors in UMAP space for each cell

  3. Smooths scores by weighted averaging with neighbors

  4. Re-assigns cell types based on smoothed scores

This reduces noise and makes annotations more consistent among cells that lie close together in the UMAP embedding.
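
The four steps above can be sketched in base R. This is a small illustration under stated assumptions rather than the package's implementation: smooth_scores() is hypothetical, neighbors are averaged without distance weighting, and the blend is assumed to be linear, (1 - smoothing_weight) * own score + smoothing_weight * neighbor mean. The full distance matrix from dist() is only practical for small inputs.

```r
# Hypothetical sketch of k-NN score smoothing in UMAP space.
# `scores`: cells x cell_types matrix; `umap`: cells x 2 coordinate matrix.
smooth_scores <- function(scores, umap, k = 15, smoothing_weight = 0.3) {
  d <- as.matrix(dist(umap))
  diag(d) <- Inf                 # a cell is not its own neighbor
  k <- min(k, nrow(umap) - 1)
  smoothed <- scores
  for (i in seq_len(nrow(scores))) {
    nn <- order(d[i, ])[seq_len(k)]
    neighbor_mean <- colMeans(scores[nn, , drop = FALSE])
    # Assumed linear blend of own score and neighbor average
    smoothed[i, ] <- (1 - smoothing_weight) * scores[i, ] +
      smoothing_weight * neighbor_mean
  }
  smoothed
}

# Toy data: three cells close together, one far away (made-up values)
umap <- matrix(c(0, 0,
                 0.1, 0,
                 0, 0.1,
                 5, 5), ncol = 2, byrow = TRUE)
scores <- matrix(c(0.9, 0.1,
                   0.4, 0.6,   # noisy cell inside the T cell group
                   0.9, 0.1,
                   0.1, 0.9), ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("cell", 1:4), c("T_cell", "B_cell")))

call_labels <- function(m) colnames(m)[max.col(m, ties.method = "first")]
labels_before <- call_labels(scores)
smoothed <- smooth_scores(scores, umap, k = 2, smoothing_weight = 0.3)
labels_after <- call_labels(smoothed)
print(data.frame(before = labels_before, after = labels_after))
```

In this toy input the noisy cell2 is relabeled from B_cell to T_cell because both of its nearest neighbors score strongly as T cells, while the distant cell4 keeps its B_cell label.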

Value

A list containing:

  • Cell_annotations: Data frame with Cell_barcode, Predicted_cell_type, Max_score, Confidence

  • Cell_confidence: Numeric vector of confidence scores per cell

  • Summary: Summary table of cell type counts and percentages

  • Expression_list: List of mean expression matrices per cell type (for verification)

  • Proportion_list: List of detection proportion matrices per cell type

  • Prediction_results: Summary data frame with per-cell-type statistics

  • Probability_matrix: Full cell × cell_type probability matrix (normalized)

  • Raw_score_matrix: Full cell × cell_type raw score matrix (before normalization)

  • Parameters: List of parameters used including adaptive thresholds

  • Cell_scores: (if return_scores=TRUE) Same as Probability_matrix

See Also

Other Section_3_Automated_Annotation: Celltype_Annotation(), Celltype_Annotation_PerCell(), Celltype_Calculate(), Celltype_Verification(), Celltype_Verification_PerCell(), Parameter_Calculate(), percell_workflow

Examples

## Not run: 
# Basic per-cell annotation
result <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    method = "weighted"
)

# Add annotations to Seurat object
# (assumes the rows of Cell_annotations are in the same order as the cells in sce)
sce$Cell_type_PerCell <- result$Cell_annotations$Predicted_cell_type

# With UMAP smoothing for more consistent annotations
result_smooth <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    use_umap_smoothing = TRUE,
    k_neighbors = 20,
    smoothing_weight = 0.3
)

## End(Not run)


SlimR documentation built on Feb. 5, 2026, 5:08 p.m.