classify_cells: Classify cells from trained garnett_classifier

classify_cells    R Documentation

Classify cells from trained garnett_classifier

Description

This function uses a previously trained garnett_classifier (trained using train_cell_classifier) to classify cell types in a CDS object.

Usage

classify_cells(cds, classifier, db, cds_gene_id_type = "ENSEMBL",
  rank_prob_ratio = 1.5, cluster_extend = FALSE, verbose = FALSE,
  cluster_extend_max_frac_unknown = 0.95,
  cluster_extend_max_frac_incorrect = 0.1, return_type_levels = FALSE)

Arguments

cds

Input CDS object.

classifier

Trained garnett_classifier - output from train_cell_classifier.

db

Bioconductor AnnotationDb-class package for converting gene IDs. For example, for humans use org.Hs.eg.db. See available packages at Bioconductor. If your organism does not have an AnnotationDb-class database available, you can specify "none". In that case Garnett will not check or convert gene IDs, so your CDS and marker file must use the same gene ID type.

cds_gene_id_type

The type of gene ID used in the CDS. Should be one of the values in columns(db). Default is "ENSEMBL". Ignored if db = "none".

rank_prob_ratio

Numeric value greater than 1. This is the minimum ratio of the probability of the most likely cell type to the probability of the second most likely cell type required to allow assignment. Default is 1.5. Higher values are more conservative.

cluster_extend

Logical. When TRUE, the classifier provides a secondary cluster-extended classification, which assigns a type to the entire cluster based on the assignments of the cluster members. If the pData table of the input CDS has a column called "garnett_cluster", this will be used for cluster-extended assignments. Otherwise, clusters are calculated using Louvain community detection in PCA space. The cluster-extended assignment is returned as a column in the output CDS pData table. For large datasets, if the "garnett_cluster" column is not provided and cluster_extend = TRUE, the function can be significantly slower the first time it is run. See details for more information.

verbose

Logical. Should progress messages be printed?

cluster_extend_max_frac_unknown

Numeric between 0 and 1. The maximum fraction of a cluster allowed to be classified as 'Unknown' and still extend classifications to the cluster. Only used when cluster_extend = TRUE. Default is 0.95. See details.

cluster_extend_max_frac_incorrect

Numeric between 0 and 1. The maximum fraction of classified cells in a cluster allowed to be incorrectly classified (i.e. assigned to a non-dominant type) and still extend classifications to the cluster. Fraction does not include 'Unknown' cells. Only used when cluster_extend = TRUE. Default is 0.1. See details.

return_type_levels

Logical. When TRUE, the function additionally appends assignments from each hierarchical level in the classifier as columns in the pData table labeled cell_type_li, where "i" indicates the corresponding level index.
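
The rank_prob_ratio threshold described above can be illustrated with a minimal base R sketch. This is not Garnett's internal code: the helper assign_by_ratio and the example probabilities are hypothetical, and comparing the two top probabilities directly with >= is an assumption about how the threshold is applied.

```r
# Hypothetical helper (not part of garnett): assign the top-ranked type only
# when it beats the runner-up by at least `rank_prob_ratio`; otherwise
# return "Unknown".
assign_by_ratio <- function(probs, rank_prob_ratio = 1.5) {
  stopifnot(length(probs) >= 2, !is.null(names(probs)))
  ranked <- sort(probs, decreasing = TRUE)  # names are kept by sort()
  if (ranked[[1]] / ranked[[2]] >= rank_prob_ratio) {
    names(ranked)[1]
  } else {
    "Unknown"
  }
}

# Made-up per-cell type probabilities
confident <- c("T cells" = 0.70, "B cells" = 0.20, "Monocytes" = 0.10)
ambiguous <- c("T cells" = 0.45, "B cells" = 0.40, "Monocytes" = 0.15)

assign_by_ratio(confident)  # top-two ratio is 0.70 / 0.20 = 3.5
assign_by_ratio(ambiguous)  # top-two ratio is 0.45 / 0.40 = 1.125
```

With the default of 1.5, the first cell is assigned and the second is left as "Unknown"; raising rank_prob_ratio leaves more cells unassigned.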

Details

This function applies a previously trained multinomial glmnet classifier at each node of a previously defined garnett_classifier tree. The output is a CDS object with cell type classifications added to the pData table.

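When cluster_extend = TRUE, Louvain communities are calculated in PCA space (unless the pData table already has a "garnett_cluster" column). Any cluster where more than 1 - cluster_extend_max_frac_incorrect (default 90%) of classified cells are of one type and more than 1 - cluster_extend_max_frac_unknown (default 5%) of cells are classified will be assigned that cluster-extended type. Both the cluster-extended type and the originally calculated cell type are reported.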
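
As a rough illustration of how the cluster_extend_max_frac_unknown and cluster_extend_max_frac_incorrect arguments interact, the following base R sketch applies both thresholds to the assignments of a single cluster. The helper extend_cluster_type is hypothetical and is not Garnett's implementation. Returning "Unknown" for clusters that fail a threshold, and treating a fraction exactly equal to a maximum as allowed, are assumptions.

```r
# Hypothetical helper (not part of garnett): decide the cluster-extended type
# for one cluster. `types` holds the per-cell assignments of its members.
extend_cluster_type <- function(types,
                                max_frac_unknown = 0.95,
                                max_frac_incorrect = 0.1) {
  classified <- types[types != "Unknown"]
  frac_unknown <- 1 - length(classified) / length(types)
  # Too few classified cells: do not extend
  if (length(classified) == 0 || frac_unknown > max_frac_unknown) {
    return("Unknown")
  }
  counts <- table(classified)
  dominant <- names(counts)[which.max(counts)]
  # Fraction of classified cells assigned to a non-dominant type
  frac_incorrect <- 1 - max(counts) / length(classified)
  if (frac_incorrect > max_frac_incorrect) "Unknown" else dominant
}

# 10 T cells, 10 Unknown: 50% unknown, 0% non-dominant
pure_cluster <- c(rep("T cells", 10), rep("Unknown", 10))
# 6 T cells, 4 B cells, 10 Unknown: 40% of classified cells are non-dominant
mixed_cluster <- c(rep("T cells", 6), rep("B cells", 4), rep("Unknown", 10))

extend_cluster_type(pure_cluster)
extend_cluster_type(mixed_cluster)
```

Under the defaults, the first cluster is extended to "T cells", while the second is not extended because too many of its classified cells disagree with the dominant type.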

Value

CDS object with classifications in the pData table.

Examples

library(org.Hs.eg.db)
data(test_classifier)
data(test_cds)

# classify cells
test_cds <- classify_cells(test_cds, test_classifier,
                           db = org.Hs.eg.db,
                           rank_prob_ratio = 1.5,
                           cluster_extend = TRUE,
                           cds_gene_id_type = "SYMBOL")


cole-trapnell-lab/garnett documentation built on Jan. 6, 2025, 2:18 p.m.