check_markers | R Documentation |
Check the markers chosen for the marker file and generate a table of useful
statistics. The output of this function can be fed into plot_markers to
generate a diagnostic plot.
check_markers(cds, marker_file, db, cds_gene_id_type = "SYMBOL",
  marker_file_gene_id_type = "SYMBOL", propogate_markers = TRUE,
  use_tf_idf = TRUE, classifier_gene_id_type = "ENSEMBL")
cds | Input CDS object. |
marker_file | A character path to the marker file to define cell types. See details and documentation for |
db | Bioconductor AnnotationDb-class package for converting gene IDs. For example, for humans use org.Hs.eg.db. See available packages at Bioconductor. If your organism does not have an AnnotationDb-class database available, you can specify "none"; Garnett will then not check or convert gene IDs, so your CDS and marker file must use the same gene ID type. |
cds_gene_id_type | The type of gene ID used in the CDS. Should be one of the values in |
marker_file_gene_id_type | The type of gene ID used in the marker file. Should be one of the values in |
propogate_markers | Logical. Should markers from child nodes of a cell type be used in finding representatives of the parent type? Should generally be |
use_tf_idf | Logical. Should TF-IDF matrix be calculated during estimation? If |
classifier_gene_id_type | The type of gene ID that will be used in the classifier. If possible for your organism, this should be "ENSEMBL", which is the default. Ignored if db = "none". |
This function checks the cell type markers chosen in the provided marker file to ensure they are good candidates for use in classification. It works by estimating which cells each marker gene would select and returning statistics for each marker. Note that this function does not take metadata into account when calculating statistics.
The output data.frame has several columns:
Gene name as provided in the marker file
The corresponding Ensembl ID derived from db conversion
The parent cell type in the cell type hierarchy - 'root' if top level
The cell type the marker belongs to
Whether the marker is present in the CDS
The number of cells the marker is estimated to nominate to the cell type
The total number of cells nominated by all the markers for that cell type
The number of cells no longer nominated to the cell type if this marker is excluded (i.e. not captured by other markers for the cell type)
How many cells become ambiguous (i.e. are nominated to multiple cell types) if this marker is included
The cell type that most often shares this marker (i.e. is the other side of the ambiguity). If inclusion_ambiguates is 0, most_overlap is NA
inclusion_ambiguates/nominates - if high, consider excluding this marker
(1/(ambiguity + .01)) * nominates/total_nominated - a general measure of the quality of a marker. Higher is better
A summary column that identifies potential problems with the provided markers
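As a worked illustration of the two formulas above, the following sketch computes them in base R for made-up counts. The numbers and gene names are hypothetical, `score` is an illustrative variable name, and the mapping of `nominates`, `total_nominated` and `inclusion_ambiguates` to the column descriptions is inferred from the formulas rather than taken from a real run.

```r
# Hypothetical per-marker counts (not from a real check_markers() run).
markers <- data.frame(
  gene = c("geneA", "geneB"),
  nominates = c(120, 50),             # cells nominated by this marker
  total_nominated = c(400, 400),      # cells nominated by all markers of the type
  inclusion_ambiguates = c(6, 30)     # cells made ambiguous by this marker
)

# ambiguity = inclusion_ambiguates / nominates
markers$ambiguity <- markers$inclusion_ambiguates / markers$nominates

# score = (1 / (ambiguity + .01)) * nominates / total_nominated
markers$score <- (1 / (markers$ambiguity + 0.01)) *
  markers$nominates / markers$total_nominated

print(markers)
```

With these numbers geneA has an ambiguity of 0.05 and a score of 5, while geneB, with an ambiguity of 0.6, scores about 0.2, which is the kind of marker the text suggests considering for exclusion.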
Data.frame of marker check results.
library(garnett)
library(org.Hs.eg.db)
data(test_cds)

# generate size factors for normalization later
test_cds <- estimateSizeFactors(test_cds)

# path to the example marker file shipped with the package
marker_file_path <- system.file("extdata", "pbmc_bad_markers.txt",
                                package = "garnett")

# check the markers; both the CDS and the marker file use gene symbols
marker_check <- check_markers(test_cds, marker_file_path,
                              db = org.Hs.eg.db,
                              cds_gene_id_type = "SYMBOL",
                              marker_file_gene_id_type = "SYMBOL")
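Since the description says the result can be passed to plot_markers, a possible follow-up is sketched here. It assumes garnett and org.Hs.eg.db are installed and is skipped otherwise; `have_pkgs` and `p` are illustrative names, and plot_markers is called with its default settings only.

```r
# Run only when the required packages are available (an assumption of this sketch).
have_pkgs <- requireNamespace("garnett", quietly = TRUE) &&
  requireNamespace("org.Hs.eg.db", quietly = TRUE)

if (have_pkgs) {
  library(garnett)
  library(org.Hs.eg.db)
  data(test_cds)
  test_cds <- estimateSizeFactors(test_cds)

  marker_file_path <- system.file("extdata", "pbmc_bad_markers.txt",
                                  package = "garnett")
  marker_check <- check_markers(test_cds, marker_file_path,
                                db = org.Hs.eg.db,
                                cds_gene_id_type = "SYMBOL",
                                marker_file_gene_id_type = "SYMBOL")

  # Feed the marker check table into plot_markers for the diagnostic plot.
  p <- plot_markers(marker_check)
  print(p)
}
```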