check_markers: Check marker file
In cole-trapnell-lab/garnett: Automated cell type classification

Description

Check the markers chosen for the marker file and generate a table of useful statistics. The output of this function can be fed into plot_markers to generate a diagnostic plot.

Usage

check_markers(cds, marker_file, db, cds_gene_id_type = "SYMBOL",
  marker_file_gene_id_type = "SYMBOL", propogate_markers = TRUE,
  use_tf_idf = TRUE, classifier_gene_id_type = "ENSEMBL")

Arguments

`cds`	Input CDS object.
`marker_file`	A character path to the marker file to define cell types. See details and documentation for `Parser` by running `?Parser` for more information.
`db`	Bioconductor AnnotationDb-class package for converting gene IDs. For example, for humans use org.Hs.eg.db. See available packages at Bioconductor. If your organism does not have an AnnotationDb-class database available, you can specify "none". In that case Garnett will not check or convert gene IDs, so your CDS and marker file must use the same gene ID type.
`cds_gene_id_type`	The type of gene ID used in the CDS. Should be one of the values in `columns(db)`. Default is "SYMBOL". Ignored if db = "none".
`marker_file_gene_id_type`	The type of gene ID used in the marker file. Should be one of the values in `columns(db)`. Default is "SYMBOL". Ignored if db = "none".
`propogate_markers`	Logical. Should markers from child nodes of a cell type be used in finding representatives of the parent type? Should generally be `TRUE`.
`use_tf_idf`	Logical. Should TF-IDF matrix be calculated during estimation? If `TRUE`, estimates will be more accurate, but calculation is slower with very large datasets.
`classifier_gene_id_type`	The type of gene ID that will be used in the classifier. If possible for your organism, this should be "ENSEMBL", which is the default. Ignored if db = "none".

Details

This function checks the cell type markers in the provided marker file to ensure they are good candidates for use in classification. It works by estimating which cells each marker gene would select and returning statistics for each marker. Note that this function does not take metadata into account when calculating statistics.

The output data.frame has several columns:

marker_gene: Gene name as provided in the marker file
ENSEMBL: The corresponding Ensembl ID derived from db conversion
parent: The parent cell type in the cell type hierarchy - 'root' if top level
cell_type: The cell type the marker belongs to
in_cds: Whether the marker is present in the CDS
nominates: The number of cells the marker is estimated to nominate to the cell type
total_nominated: The total number of cells nominated by all the markers for that cell type
exclusion_dismisses: The number of cells no longer nominated to the cell type if this marker is excluded (i.e. not captured by other markers for the cell type)
inclusion_ambiguates: How many cells become ambiguous (i.e. are nominated to multiple cell types) if this marker is included
most_overlap: The cell type that most often shares this marker (i.e. is the other side of the ambiguity). If inclusion_ambiguates is 0, most_overlap is NA
ambiguity: inclusion_ambiguates/nominates - if high, consider excluding this marker
marker_score: (1/(ambiguity + .01)) * nominates/total_nominated - a general measure of the quality of a marker. Higher is better
summary: A summary column that identifies potential problems with the provided markers
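
The two derived columns, ambiguity and marker_score, follow directly from the formulas listed above. The sketch below recomputes them in base R for a small hand-built data.frame. The gene names and counts are invented for illustration and are not output from Garnett; only the column names and formulas come from this page.

```r
# Hypothetical per-marker counts for one cell type (values invented for illustration).
markers <- data.frame(
  marker_gene          = c("GENE_A", "GENE_B"),
  nominates            = c(80, 40),
  total_nominated      = c(100, 100),
  inclusion_ambiguates = c(4, 20),
  stringsAsFactors     = FALSE
)

# ambiguity: fraction of the cells a marker nominates that become ambiguous.
markers$ambiguity <- markers$inclusion_ambiguates / markers$nominates

# marker_score: (1 / (ambiguity + 0.01)) * nominates / total_nominated.
# The 0.01 term keeps the score finite when ambiguity is zero.
markers$marker_score <- (1 / (markers$ambiguity + 0.01)) *
  markers$nominates / markers$total_nominated

print(markers)
```

In this toy input, GENE_A nominates more cells and creates less ambiguity than GENE_B, so it receives the higher marker_score.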

Value

Data.frame of marker check results.
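
As a rough illustration of how the returned data.frame might be inspected, the sketch below uses a hand-built stand-in containing only a few of the documented columns. The values, the logical type of in_cds, and the 0.5 ambiguity cutoff are all assumptions made for the example, not behavior confirmed by this page.

```r
# Stand-in for check_markers output, limited to a few documented columns.
# All values are invented; in_cds is assumed to be logical.
marker_check <- data.frame(
  marker_gene      = c("GENE_A", "GENE_B", "GENE_C"),
  cell_type        = c("T cells", "T cells", "B cells"),
  in_cds           = c(TRUE, TRUE, FALSE),
  ambiguity        = c(0.05, 0.60, NA),
  marker_score     = c(13.3, 0.5, NA),
  stringsAsFactors = FALSE
)

# Arbitrary cutoff chosen for illustration only.
ambiguity_cutoff <- 0.5

# Markers that are absent from the CDS.
missing_markers <- marker_check$marker_gene[!marker_check$in_cds]

# Markers present in the CDS whose ambiguity exceeds the cutoff.
is_ambiguous <- marker_check$in_cds &
  !is.na(marker_check$ambiguity) &
  marker_check$ambiguity > ambiguity_cutoff
ambiguous_markers <- marker_check$marker_gene[is_ambiguous]

# Rank markers from best to worst score, with unscored markers last.
ranked <- marker_check[order(marker_check$marker_score,
                             decreasing = TRUE, na.last = TRUE), ]

print(missing_markers)
print(ambiguous_markers)
print(ranked$marker_gene)
```

Markers flagged this way are candidates for removal or replacement in the marker file before training a classifier.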

Examples

library(org.Hs.eg.db)
data(test_cds)

# generate size factors for normalization later
test_cds <- estimateSizeFactors(test_cds)
marker_file_path <- system.file("extdata", "pbmc_bad_markers.txt",
                                package = "garnett")
marker_check <- check_markers(test_cds, marker_file_path,
                              db = org.Hs.eg.db,
                              cds_gene_id_type = "SYMBOL",
                              marker_file_gene_id_type = "SYMBOL")

cole-trapnell-lab/garnett documentation built on Jan. 6, 2025, 2:18 p.m.