findMarkersTree: Generate marker decision tree from single-cell clustering...
In celda: CEllular Latent Dirichlet Allocation


Create a decision tree that identifies gene markers for given cell populations. The algorithm uses a decision tree procedure to generate a set of rules for each cell cluster defined by single-cell clustering. Each split is determined by one of two metrics: a one-off metric, which finds rules that identify a cluster using a single feature, and a balanced metric, which finds rules that identify sets of similar clusters.

findMarkersTree(x, ...)

## S4 method for signature 'SingleCellExperiment'
findMarkersTree(
  x,
  useAssay = "counts",
  altExpName = "featureSubset",
  class,
  oneoffMetric = c("modified F1", "pairwise AUC"),
  metaclusters,
  featureLabels,
  counts,
  seurat,
  threshold = 0.9,
  reuseFeatures = FALSE,
  altSplit = TRUE,
  consecutiveOneoff = FALSE,
  autoMetaclusters = TRUE,
  seed = 12345
)

## S4 method for signature 'matrix'
findMarkersTree(
  x,
  class,
  oneoffMetric = c("modified F1", "pairwise AUC"),
  metaclusters,
  featureLabels,
  counts,
  celda,
  seurat,
  threshold = 0.9,
  reuseFeatures = FALSE,
  altSplit = TRUE,
  consecutiveOneoff = FALSE,
  autoMetaclusters = TRUE,
  seed = 12345
)

`x`	A numeric matrix of counts or a SingleCellExperiment with the matrix located in the assay slot under `useAssay`. Rows represent features and columns represent cells.
`...`	Ignored. Placeholder to prevent check warning.
`useAssay`	A string specifying which assay slot to use if `x` is a SingleCellExperiment object. Default "counts".
`altExpName`	The name for the altExp slot to use. Default "featureSubset".
`class`	Vector of cell cluster labels.
`oneoffMetric`	A character string. Which one-off metric to run, either "modified F1" or "pairwise AUC". Default is "modified F1".
`metaclusters`	List where each element is a metacluster (e.g. known cell type) and all the clusters within that metacluster (e.g. subtypes).
`featureLabels`	Vector of feature assignments, e.g. which cluster does each gene belong to? Useful when using clusters of features (e.g. gene modules or Seurat PCs) and user wishes to expand tree results to individual features (e.g. score individual genes within marker gene modules).
`counts`	Numeric counts matrix. Useful when using clusters of features (e.g. gene modules) and user wishes to expand tree results to individual features (e.g. score individual genes within marker gene modules). Row names should be individual feature names. Ignored if `x` is a SingleCellExperiment object.
`seurat`	A Seurat object. Note that the Seurat functions RunPCA and FindClusters must have been run on the object.
`threshold`	Numeric between 0 and 1. The threshold for the one-off metric. Smaller values will result in more one-off splits. Default is 0.9.
`reuseFeatures`	Logical. Whether or not a feature can be used more than once on the same cluster. Default is FALSE.
`altSplit`	Logical. Whether or not to force a marker for clusters that are solely defined by the absence of markers. Default is TRUE.
`consecutiveOneoff`	Logical. Whether or not to allow one-off splits at consecutive branches. Default is FALSE.
`autoMetaclusters`	Logical. Whether to identify metaclusters prior to creating the tree based on the distance between clusters in a UMAP dimensionality reduction projection. A metacluster is simply a large cluster that includes several clusters within it. Default is TRUE.
`seed`	Numeric. Seed used to enable reproducible UMAP results for identifying metaclusters. Default is 12345.
`celda`	A celda_CG or celda_C object. Counts matrix has to be provided as well.
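
As an illustration of the `metaclusters` argument, the base R sketch below builds a list in which each element is a metacluster holding the cluster labels it contains. The cell type names, the cluster labels, and the requirement that every cluster appear in exactly one metacluster are assumptions made for this example; they are not taken from the celda source.

```r
# Hypothetical cluster labels for six cells (the `class` argument)
class <- c("1", "2", "3", "4", "4", "2")

# Hypothetical metaclusters: each element is named after a known cell type
# and holds the cluster labels (subtypes) that belong to it
metaclusters <- list(
  "T cells" = c("1", "2"),
  "B cells" = c("3", "4")
)

# Sanity checks (an assumption of this sketch, not a documented rule):
# no cluster is listed twice, and every cluster in `class` is covered
assigned <- unlist(metaclusters, use.names = FALSE)
stopifnot(!anyDuplicated(assigned), all(unique(class) %in% assigned))

# Lookup table from cluster label to metacluster name
clusterToMeta <- setNames(
  rep(names(metaclusters), lengths(metaclusters)),
  assigned
)

# Metacluster of each cell
metaPerCell <- unname(clusterToMeta[class])
```

A list of this shape would then be passed as `metaclusters = metaclusters` alongside `class`.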

A named list with six elements:

rules - A named list with one data frame for every label. Each data frame gives the set of rules for distinguishing that label and has five columns, plus a sixth (metacluster) when metaclusters are used.
- feature - Marker feature, e.g. marker gene name.
- direction - Relationship to feature value. -1 if cluster is down-regulated for this feature, 1 if cluster is up-regulated.
- stat - The performance value returned by the splitting metric for this split.
- statUsed - Which performance metric was used. "Split" if information gain and "One-off" if one-off.
- level - The level of the tree at which the rule was defined. 1 is the level of the first split of the tree.
- metacluster - Optional. If metaclusters were used, the metacluster this rule is applied to.
dendro - A dendrogram object of the decision tree output. Plot with plotMarkerDendro().
classLabels - A vector of the class labels used in the model, i.e. cell cluster labels.
metaclusterLabels - A vector of the metacluster labels used in the model.
prediction - A character vector of predicted labels for the training data using the final model. "MISSING" if label prediction was ambiguous.
performance - A named list denoting the training performance of the model:
- accuracy - (number correct/number of samples) for the whole set of samples.
- balAcc - mean sensitivity across all clusters
- meanPrecision - mean precision across all clusters
- correct - the number of correct predictions of each cluster
- sizes - the actual number of samples in each cluster
- sensitivity - the sensitivity of the prediction of each cluster
- precision - the precision of the prediction of each cluster
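
The performance entries listed above can be reproduced by hand from a vector of true labels and a vector of predictions. The sketch below uses made-up labels and base R only. It follows the definitions given in the list, but it is not celda's implementation; in particular, counting a "MISSING" prediction as simply incorrect is an assumption of this example.

```r
# Toy data: true cluster labels and hypothetical predictions.
# "MISSING" stands for an ambiguous prediction.
truth <- c("1", "1", "1", "2", "2", "3", "3", "3")
pred  <- c("1", "1", "2", "2", "2", "3", "MISSING", "3")

clusters <- sort(unique(truth))

# Per-cluster counts: correct predictions, actual size, predicted size
correct <- vapply(clusters, function(k) sum(pred == k & truth == k), numeric(1))
sizes <- vapply(clusters, function(k) sum(truth == k), numeric(1))
predicted <- vapply(clusters, function(k) sum(pred == k), numeric(1))

# Per-cluster sensitivity and precision
sensitivity <- correct / sizes
precision <- correct / predicted

performance <- list(
  accuracy = sum(correct) / length(truth),  # number correct / number of samples
  balAcc = mean(sensitivity),               # mean sensitivity across clusters
  meanPrecision = mean(precision),          # mean precision across clusters
  correct = correct,
  sizes = sizes,
  sensitivity = sensitivity,
  precision = precision
)
```

With these toy labels, each cluster has two correct predictions, so accuracy is 6/8 = 0.75 while the balanced accuracy is 7/9, which shows how the two measures diverge when cluster sizes differ.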

## Not run: 
# Generate simulated single-cell dataset using celda
sim_counts <- simulateCells("celda_CG", K = 4, L = 10, G = 100)

# Celda clustering into 5 clusters & 10 modules
cm <- celda_CG(sim_counts, K = 5, L = 10, verbose = FALSE)

# Get features matrix and cluster assignments
factorized <- factorizeMatrix(cm)
features <- factorized$proportions$cell
class <- celdaClusters(cm)

# Generate Decision Tree
DecTree <- findMarkersTree(features, class)

# Plot dendrogram
plotMarkerDendro(DecTree)

## End(Not run)