View source: R/Celltype_Calculate_PerCell.R
| Celltype_Calculate_PerCell | R Documentation |
Unlike cluster-based annotation, this function assigns cell type labels to each individual cell based on marker gene expression profiles. Optionally uses UMAP coordinates to smooth predictions via k-nearest neighbor voting.
Celltype_Calculate_PerCell(
  seurat_obj,
  gene_list,
  species,
  assay = "RNA",
  method = c("weighted", "mean", "AUCell"),
  min_expression = 0.1,
  use_umap_smoothing = FALSE,
  umap_reduction = "umap",
  k_neighbors = 15,
  smoothing_weight = 0.3,
  min_score = "auto",
  min_confidence = 1.2,
  return_scores = FALSE,
  ncores = 1,
  chunk_size = 5000,
  verbose = TRUE
)
| seurat_obj | Seurat object with normalized expression data. |
| gene_list | A standardized marker list (same format as Celltype_Calculate). |
| species | "Human" or "Mouse", used for gene name formatting. |
| assay | Assay to use. Default: "RNA". |
| method | Scoring method: "weighted" (expression combined with a detection-based weight), "mean" (average expression), or "AUCell" (rank-based). Default: "weighted". |
| min_expression | Minimum expression value for a gene to count as detected. Default: 0.1. |
| use_umap_smoothing | Logical. If TRUE, apply k-NN smoothing using UMAP coordinates to improve annotation consistency. Default: FALSE. |
| umap_reduction | Name of the UMAP reduction in the Seurat object. Default: "umap". |
| k_neighbors | Number of neighbors used for UMAP smoothing. Default: 15. |
| smoothing_weight | Weight given to neighbor votes relative to the cell's own score, between 0 and 1. Higher values give more weight to neighbors. Default: 0.3. |
| min_score | Minimum score required to assign a cell type. Cells below this threshold are labeled "Unassigned". Default: "auto", which sets the threshold adaptively from the number of cell types (1.5 / n_celltypes). Set to a numeric value (e.g., 0.1) to use a fixed threshold. |
| min_confidence | Minimum confidence threshold. Cells with confidence below this value are labeled "Unassigned". Confidence is the ratio of the highest score to the second-highest score. Default: 1.2 (the highest score must be at least 20% above the second-highest). Set to 1.0 to disable confidence filtering. |
| return_scores | If TRUE, return the full score matrix. Default: FALSE. |
| ncores | Number of cores for parallel processing. Default: 1. |
| chunk_size | Number of cells to process per chunk (memory optimization). Default: 5000. |
| verbose | If TRUE, print progress messages. Default: TRUE. |
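To illustrate how min_score and min_confidence interact, here is a minimal base R sketch. It is not the package's implementation: assign_labels() is a hypothetical helper, and it assumes each row of the score matrix is normalized to sum to 1, which is what makes the documented "auto" rule (1.5 / n_celltypes) a sensible cutoff.

```r
# Hypothetical helper, not part of the package. `scores` is a
# cells x cell_types matrix with row and column names; rows are
# ASSUMED to be normalized so that each sums to 1.
assign_labels <- function(scores, min_score = "auto", min_confidence = 1.2) {
  stopifnot(is.matrix(scores), ncol(scores) >= 2,
            !is.null(rownames(scores)), !is.null(colnames(scores)))

  # "auto" is documented as 1.5 / n_celltypes
  if (identical(min_score, "auto")) min_score <- 1.5 / ncol(scores)

  # Highest and second-highest score per cell
  sorted <- t(apply(scores, 1, sort, decreasing = TRUE))
  top <- sorted[, 1]
  second <- sorted[, 2]

  # Confidence = highest score / second-highest score (Inf if the latter is 0)
  confidence <- ifelse(second > 0, top / second, Inf)

  label <- colnames(scores)[max.col(scores, ties.method = "first")]
  label[top < min_score | confidence < min_confidence] <- "Unassigned"

  data.frame(Cell_barcode = rownames(scores),
             Predicted_cell_type = label,
             Max_score = top,
             Confidence = confidence,
             row.names = NULL)
}

scores <- matrix(
  c(0.70, 0.20, 0.10,
    0.52, 0.45, 0.03,
    0.40, 0.30, 0.30),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cell1", "cell2", "cell3"),
                  c("T_cell", "B_cell", "Myeloid"))
)

# With three cell types, "auto" gives min_score = 0.5.
# cell1 passes both filters; cell2 fails confidence (0.52 / 0.45 < 1.2);
# cell3 fails min_score (0.40 < 0.5).
labels <- assign_labels(scores)
print(labels)
```

Setting min_confidence = 1.0 in this sketch removes the ratio filter, since the highest score is never smaller than the second-highest.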
"weighted" (recommended): Combines normalized expression with detection rate. For each cell and cell type: score = mean(expr_i * weight_i), where expr_i is the normalized expression of marker i in the cell and weight_i is derived from the marker's specificity across the dataset.
"mean": Simple average of normalized marker expression. Fast but less discriminative for overlapping marker sets.
"AUCell": Rank-based scoring similar to the AUCell package. For each cell, genes are ranked by expression, and the score is the proportion of marker genes that fall within the top X% of expressed genes. Robust to technical variation.
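The three scoring methods can be compared on a toy matrix with the base R sketch below. It is an illustration under stated assumptions, not the package's code: score_markers() and its top_frac argument are hypothetical, the weight used for "weighted" (one minus the fraction of cells in which the marker is detected) is only one plausible reading of "specificity across the dataset", and the "AUCell" branch is a simplified top-fraction count rather than the AUCell package's algorithm.

```r
# Hypothetical helper, not part of the package. `expr` is a genes x cells
# matrix of normalized expression; `markers` is a character vector of genes.
score_markers <- function(expr, markers,
                          method = c("weighted", "mean", "AUCell"),
                          min_expression = 0.1, top_frac = 0.05) {
  method <- match.arg(method)
  markers <- intersect(markers, rownames(expr))
  stopifnot(length(markers) > 0)
  m <- expr[markers, , drop = FALSE]

  if (method == "mean") {
    # Simple average of marker expression per cell
    return(colMeans(m))
  }
  if (method == "weighted") {
    # ASSUMPTION: weight_i = 1 - fraction of cells where marker i is detected,
    # so markers detected in fewer cells contribute more.
    weight <- 1 - rowMeans(m > min_expression)
    return(colMeans(m * weight))
  }
  # Rank-based: fraction of markers among each cell's top-ranked genes.
  # top_frac stands in for the unspecified "top X%".
  n_top <- max(1, ceiling(top_frac * nrow(expr)))
  apply(expr, 2, function(cell) {
    top_genes <- names(sort(cell, decreasing = TRUE))[seq_len(n_top)]
    mean(markers %in% top_genes)
  })
}

expr <- matrix(
  c(5, 0, 0,
    4, 0, 1,
    0, 3, 0,
    1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("CD3E", "CD3D", "CD79A", "ACTB"), c("c1", "c2", "c3"))
)
t_markers <- c("CD3E", "CD3D")

s_mean     <- score_markers(expr, t_markers, "mean")
s_weighted <- score_markers(expr, t_markers, "weighted")
s_rank     <- score_markers(expr, t_markers, "AUCell", top_frac = 0.5)
print(rbind(mean = s_mean, weighted = s_weighted, AUCell = s_rank))
```

In this toy data all three methods rank cell c1 highest for the T-cell markers, but only the rank-based score is unaffected by the absolute expression scale.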
When use_umap_smoothing = TRUE, the function:
1. Computes initial per-cell scores
2. Finds k nearest neighbors in UMAP space for each cell
3. Smooths scores by weighted averaging with neighbors
4. Re-assigns cell types based on smoothed scores
This helps reduce noise and makes annotations more consistent among cells that lie close together in UMAP space.
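The smoothing steps above can be sketched in base R as follows. This is a hedged illustration rather than the package's implementation: smooth_scores() is a hypothetical helper, the linear blend of a cell's own score with the mean score of its neighbors is an assumption about what "weighted averaging" means, and the brute-force distance matrix is only practical for small inputs.

```r
# Hypothetical helper, not part of the package. `scores` is a
# cells x cell_types matrix; `coords` holds UMAP coordinates
# (cells x 2) in the same row order.
smooth_scores <- function(scores, coords, k_neighbors = 15,
                          smoothing_weight = 0.3) {
  stopifnot(nrow(scores) == nrow(coords), nrow(scores) > 1)
  k <- min(k_neighbors, nrow(scores) - 1)

  d <- as.matrix(dist(coords))  # all pairwise Euclidean distances
  diag(d) <- Inf                # a cell is not its own neighbor

  smoothed <- scores
  for (i in seq_len(nrow(scores))) {
    nn <- order(d[i, ])[seq_len(k)]  # indices of the k nearest cells
    neighbor_mean <- colMeans(scores[nn, , drop = FALSE])
    # ASSUMPTION: linear blend of own score and mean neighbor score
    smoothed[i, ] <- (1 - smoothing_weight) * scores[i, ] +
      smoothing_weight * neighbor_mean
  }
  smoothed
}

# Two well-separated groups of three cells; the third cell has a noisy score.
coords <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1),
                c(5, 5), c(5.1, 5), c(5, 5.1))
scores <- rbind(c(0.9, 0.1), c(0.8, 0.2), c(0.4, 0.6),
                c(0.1, 0.9), c(0.2, 0.8), c(0.1, 0.9))
colnames(scores) <- c("T_cell", "B_cell")

smoothed <- smooth_scores(scores, coords, k_neighbors = 2,
                          smoothing_weight = 0.3)

# Re-assign cell types from the smoothed scores
before <- colnames(scores)[max.col(scores, ties.method = "first")]
after  <- colnames(smoothed)[max.col(smoothed, ties.method = "first")]
print(data.frame(before, after))
```

Here the third cell switches from B_cell to T_cell because its two nearest neighbors both score strongly as T cells. With smoothing_weight = 0 the scores are returned unchanged.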
A list containing:
Cell_annotations: Data frame with Cell_barcode, Predicted_cell_type, Max_score, Confidence
Cell_confidence: Numeric vector of confidence scores per cell
Summary: Summary table of cell type counts and percentages
Expression_list: List of mean expression matrices per cell type (for verification)
Proportion_list: List of detection proportion matrices per cell type
Prediction_results: Summary data frame with per-cell-type statistics
Probability_matrix: Full cell × cell_type probability matrix (normalized)
Raw_score_matrix: Full cell × cell_type raw score matrix (before normalization)
Parameters: List of parameters used including adaptive thresholds
Cell_scores: (if return_scores=TRUE) Same as Probability_matrix
Other Section_3_Automated_Annotation:
Celltype_Annotation(),
Celltype_Annotation_PerCell(),
Celltype_Calculate(),
Celltype_Verification(),
Celltype_Verification_PerCell(),
Parameter_Calculate(),
percell_workflow
## Not run:
# Basic per-cell annotation
result <- Celltype_Calculate_PerCell(
  seurat_obj = sce,
  gene_list = Markers_list,
  species = "Human",
  method = "weighted"
)
# Add annotations to Seurat object
sce$Cell_type_PerCell <- result$Cell_annotations$Predicted_cell_type
# With UMAP smoothing for more consistent annotations
result_smooth <- Celltype_Calculate_PerCell(
  seurat_obj = sce,
  gene_list = Markers_list,
  species = "Human",
  use_umap_smoothing = TRUE,
  k_neighbors = 20,
  smoothing_weight = 0.3
)
## End(Not run)