CHETAH classifies an input dataset by comparing it to a reference dataset in a stepwise, top-to-bottom fashion. See 'Details' for a full explanation. NOTE: We recommend using the default parameters.
CHETAHclassifier(input, ref_cells = NULL, ref_profiles = NULL,
  ref_ct = "celltypes", input_c = NA, ref_c = NA, thresh = 0.1,
  gs_method = c("fc", "wilcox"), cor_method = c("spearman", "kendall",
  "pearson", "cosine"), clust_method = c("average", "single", "complete",
  "ward.D2", "ward.D", "mcquitty", "median", "centroid"),
  clust_dist = bioDist::spearman.dist, n_genes = 200,
  pc_thresh = 0.2, p_thresh = 0.05, fc_thresh = 1.5,
  subsample = FALSE, fix_ngenes = TRUE, plot.tree = FALSE,
  only_pos = FALSE, print_steps = FALSE)
input |
required: an input SingleCellExperiment.
(see: Bioconductor,
and the vignette |
ref_cells |
required: A reference SingleCellExperiment, with
the cell types in the "celltypes" colData (or otherwise defined in |
ref_profiles |
optional: in case of bulk RNA-seq or microarray data, an expression matrix with one (average) reference expression profile per cell type in the columns ('ref_cells' must be left empty). |
ref_ct |
the colData of |
input_c |
the name of the assay of the input to use.
|
ref_c |
same as |
thresh |
the initial confidence threshold, which can be changed after running
by |
gs_method |
method for gene selection. In every node of the tree:
"fc" = quick method: either a fixed number ( |
cor_method |
the correlation measure: one of: "spearman" (default), "kendall", "pearson", "cosine" |
clust_method |
the method used for clustering the reference profiles.
One of the methods from |
clust_dist |
a distance measure, default: |
n_genes |
The number of genes used in every step. Only used if
|
pc_thresh |
when: gs_method = "wilcox", only genes are selected
for which more than a |
p_thresh |
when: gs_method = "wilcox" , only genes are selected
that have a p-value < |
fc_thresh |
when: gs_method = "wilcox" or gs_method = "fc"
AND fix_ngenes = FALSE,
only genes are selected that have a log2 fold-change > |
subsample |
to prevent reference types with many cells from dominating
the gene selection, subsample types with more than |
fix_ngenes |
when: gs_method = "fc" use a fixed number of genes
for all correlations. when: gs_method = "wilcox"
use a maximum of genes per step.
When |
plot.tree |
Plot the classification tree. |
only_pos |
not recommended: for each reference type, only use genes that are more highly expressed in that type than in the other types in that node. |
print_steps |
whether the number of genes (positive and negative) per step per ref_cell_type should be printed |
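The interplay of n_genes, fc_thresh and fix_ngenes for gs_method = "fc" can be illustrated with a small base-R sketch. This is not CHETAH's internal code: the helper select_genes_fc, the toy profiles, and the use of the absolute log2 fold-change between two profiles are all assumptions made for illustration.

```r
# Hypothetical helper (not part of CHETAH): pick genes that separate two
# reference profiles, either a fixed number or all above a threshold.
select_genes_fc <- function(profile_a, profile_b, n_genes = 200,
                            fc_thresh = 1.5, fix_ngenes = TRUE) {
  # Assumption: both profiles are named, log2-scale mean expression vectors,
  # so their difference is a log2 fold-change per gene.
  lfc <- profile_a - profile_b
  ranked <- names(sort(abs(lfc), decreasing = TRUE))
  if (fix_ngenes) {
    head(ranked, n_genes)                 # fixed number of top genes
  } else {
    ranked[abs(lfc[ranked]) > fc_thresh]  # every gene above the threshold
  }
}

# Toy data: 50 genes, two made-up reference profiles
set.seed(1)
genes <- paste0("gene", 1:50)
profile_a <- setNames(rnorm(50, mean = 5), genes)
profile_b <- setNames(rnorm(50, mean = 5), genes)

fixed <- select_genes_fc(profile_a, profile_b, n_genes = 10)
by_thresh <- select_genes_fc(profile_a, profile_b, fix_ngenes = FALSE)
```

With fix_ngenes = TRUE every comparison uses the same number of genes, which keeps correlations comparable between nodes; with fix_ngenes = FALSE the number of genes varies with how different the two profiles are.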
CHETAH will hierarchically cluster reference data
to produce a classification tree (ct).
In each node of the ct, CHETAH will
assign each input cell to one of the two branches, based on gene selections,
correlations and calculation of profile and confidence scores.
The assignment is only performed if the confidence score for
such an assignment is higher than the confidence threshold.
If this is not the case, classification for the cell stops in the current node.
Some input cells will reach the leaf nodes of the ct (the pre-defined cell types);
these classifications are called final types.
For other cells, assignment will stop in a node. These classifications
are called intermediate types.
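The two ideas described here, building a tree from reference profiles and stopping when confidence is too low, can be sketched in base R. This is a toy illustration under stated assumptions, not CHETAH's implementation: the reference profiles are simulated, 1 - Spearman correlation stands in for the clust_dist distance, and the "confidence" is simply the difference in mean correlation to the two branches rather than CHETAH's profile and confidence scores. The function classify_top_node is hypothetical.

```r
set.seed(42)
n_genes <- 100
genes <- paste0("gene", seq_len(n_genes))

# Simulated reference: four cell types, one mean profile per column.
# typeA/typeB share one base profile, typeC/typeD another.
base_ab <- rnorm(n_genes)
base_cd <- rnorm(n_genes)
ref_profiles <- cbind(
  typeA = base_ab + rnorm(n_genes, sd = 0.3),
  typeB = base_ab + rnorm(n_genes, sd = 0.3),
  typeC = base_cd + rnorm(n_genes, sd = 0.3),
  typeD = base_cd + rnorm(n_genes, sd = 0.3)
)
rownames(ref_profiles) <- genes

# Step 1: hierarchically cluster the reference profiles into a tree.
# Assumption: 1 - Spearman correlation as a stand-in distance.
d <- as.dist(1 - cor(ref_profiles, method = "spearman"))
tree <- hclust(d, method = "average")

# Step 2: the top node splits the reference types into two branches.
branches <- cutree(tree, k = 2)
left <- names(branches)[branches == 1]
right <- names(branches)[branches == 2]

# Hypothetical one-node classifier: assign a cell to a branch only if the
# toy confidence exceeds the threshold; otherwise stop in this node.
classify_top_node <- function(cell, thresh = 0.1) {
  cors <- cor(cell, ref_profiles, method = "spearman")[1, ]
  conf <- mean(cors[left]) - mean(cors[right])
  if (abs(conf) <= thresh) return("intermediate (top node)")
  if (conf > 0) paste(left, collapse = "/") else paste(right, collapse = "/")
}

# A cell that closely resembles typeC moves down the typeC/typeD branch.
clear_cell <- ref_profiles[, "typeC"] + rnorm(n_genes, sd = 0.1)
clear_call <- classify_top_node(clear_cell)

# With an unreachable threshold the same cell stays in the top node.
strict_call <- classify_top_node(clear_cell, thresh = 2)
```

In the real classifier this decision is repeated at every node below the top one, so a cell ends up either at a leaf (a final type) or at the node where its confidence first fell below the threshold (an intermediate type).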
A SingleCellExperiment with added:
- input$celltype_CHETAH: a named character vector that can directly be used in any other workflow/method.
- "hidden" 'int_colData' and 'int_metadata', not meant for direct interaction, but which can all be viewed and interacted with using 'PlotCHETAH' and 'CHETAHshiny'.
A list containing the following objects is added to input$int_metadata$CHETAH:
classification a named vector: the classified types with the corresponding names of the input cells
tree the hclust object of the classification tree
nodetypes A list with the cell types under each node
nodecoor the coordinates of the nodes of the classification tree
genes A list per node, containing a list per reference type with the genes used for the profile scores of that type
parameters The parameters used
A nested DataFrame is added to input$int_colData$CHETAH. It holds three top-level DataFrames:
prof_scores A list with the profile scores
conf_scores A list with the confidence scores
correlations A list with the correlations of the input cells to the reference profiles
## Melanoma data from Tirosh et al. (2016) Science
input_mel
## Head-Neck data from Puram et al. (2017) Cancer Cell
headneck_ref
input_mel <- CHETAHclassifier(input = input_mel, ref_cells = headneck_ref)