findMarkersTree: Generate marker decision tree from single-cell clustering...

Description

Create a decision tree that identifies gene markers for given cell populations. The algorithm uses a decision tree procedure to generate a set of rules for each cell cluster defined by single-cell clustering. Each split is determined by one of two metrics: a one-off metric, which finds rules that identify a single cluster by a single feature, and a balanced metric, which finds rules that identify sets of similar clusters.
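The idea of a one-off rule can be illustrated with a small sketch in base R. It scores the rule "feature above a cutoff implies the target cluster" with an ordinary F1 score. This is not celda's 'modified F1', whose exact definition is not given here; the helper name f1ForRule, the toy data, and the comparison against 0.9 are assumptions made purely for illustration.

```r
# Toy data: one feature measured in 8 cells from clusters "A" and "B".
feature <- c(0.9, 0.8, 0.7, 0.85, 0.1, 0.2, 0.15, 0.6)
cluster <- c("A", "A", "A", "A", "B", "B", "B", "B")

# Hypothetical helper (not part of celda): ordinary F1 score of the
# rule "feature > cutoff implies the cell belongs to `target`".
f1ForRule <- function(feature, cluster, target, cutoff) {
  predicted <- feature > cutoff
  actual <- cluster == target
  tp <- sum(predicted & actual)
  precision <- if (sum(predicted) > 0) tp / sum(predicted) else 0
  recall <- tp / sum(actual)
  if (precision + recall == 0) {
    return(0)
  }
  2 * precision * recall / (precision + recall)
}

# A cutoff that separates the clusters cleanly scores above 0.9 ...
scoreClean <- f1ForRule(feature, cluster, target = "A", cutoff = 0.65)

# ... while a cutoff that admits one "B" cell scores below it.
scoreLeaky <- f1ForRule(feature, cluster, target = "A", cutoff = 0.5)
```

Under this simplified reading, a rule is accepted as a one-off split only when its score reaches the threshold, which is why a smaller threshold would admit more one-off splits.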

Usage

findMarkersTree(x, ...)

## S4 method for signature 'SingleCellExperiment'
findMarkersTree(
  x,
  useAssay = "counts",
  altExpName = "featureSubset",
  class,
  oneoffMetric = c("modified F1", "pairwise AUC"),
  metaclusters,
  featureLabels,
  counts,
  seurat,
  threshold = 0.9,
  reuseFeatures = FALSE,
  altSplit = TRUE,
  consecutiveOneoff = FALSE,
  autoMetaclusters = TRUE,
  seed = 12345
)

## S4 method for signature 'matrix'
findMarkersTree(
  x,
  class,
  oneoffMetric = c("modified F1", "pairwise AUC"),
  metaclusters,
  featureLabels,
  counts,
  celda,
  seurat,
  threshold = 0.9,
  reuseFeatures = FALSE,
  altSplit = TRUE,
  consecutiveOneoff = FALSE,
  autoMetaclusters = TRUE,
  seed = 12345
)

Arguments

x

A numeric matrix of counts or a SingleCellExperiment with the matrix located in the assay slot under useAssay. Rows represent features and columns represent cells.

...

Ignored. Placeholder to prevent check warning.

useAssay

A string specifying which assay slot to use if x is a SingleCellExperiment object. Default "counts".

altExpName

The name for the altExp slot to use. Default "featureSubset".

class

Vector of cell cluster labels.

oneoffMetric

A character string specifying which one-off metric to use, either 'modified F1' or 'pairwise AUC'. Default is 'modified F1'.

metaclusters

List where each element is a metacluster (e.g. known cell type) and all the clusters within that metacluster (e.g. subtypes).

featureLabels

Vector of feature assignments (e.g. the cluster to which each gene belongs). Useful when x contains clusters of features (e.g. gene modules or Seurat PCs) and the user wishes to expand tree results to individual features (e.g. score individual genes within marker gene modules).

counts

Numeric counts matrix. Useful when using clusters of features (e.g. gene modules) and user wishes to expand tree results to individual features (e.g. score individual genes within marker gene modules). Row names should be individual feature names. Ignored if x is a SingleCellExperiment object.

seurat

A Seurat object. Note that the Seurat functions RunPCA and FindClusters must have been run on the object.

threshold

Numeric between 0 and 1. The threshold for the one-off metric. Smaller values will result in more one-off splits. Default is 0.90.

reuseFeatures

Logical. Whether or not a feature can be used more than once on the same cluster. Default is FALSE.

altSplit

Logical. Whether or not to force a marker for clusters that are solely defined by the absence of markers. Default is TRUE.

consecutiveOneoff

Logical. Whether or not to allow one-off splits at consecutive branches. Default is FALSE.

autoMetaclusters

Logical. Whether to identify metaclusters prior to creating the tree based on the distance between clusters in a UMAP dimensionality reduction projection. A metacluster is simply a large cluster that includes several clusters within it. Default is TRUE.

seed

Numeric. Seed used to enable reproducible UMAP results for identifying metaclusters. Default is 12345.

celda

A celda_CG or celda_C object. Counts matrix has to be provided as well.
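The shapes of the optional inputs described above can be sketched in base R. All values below are toy data invented for illustration, and rowsum is used only as a stand-in for producing a module-level matrix; celda's own example derives that matrix with factorizeMatrix instead.

```r
# Toy counts: 6 genes (rows) by 4 cells (columns).
counts <- matrix(
  c(5, 0, 1, 0,
    4, 1, 0, 0,
    0, 6, 0, 1,
    0, 5, 1, 0,
    1, 0, 7, 6,
    0, 1, 8, 5),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:4))
)

# featureLabels: one module assignment per row of counts.
featureLabels <- c("L1", "L1", "L2", "L2", "L3", "L3")

# Module-level matrix (modules x cells): the kind of matrix that
# would be passed as x when features are clusters of genes.
moduleCounts <- rowsum(counts, group = featureLabels)

# class: one cluster label per cell (column of x).
class <- c("1", "2", "3", "3")

# metaclusters: named list mapping each metacluster to the
# cluster labels it contains (names here are hypothetical).
metaclusters <- list(typeA = c("1", "2"), typeB = "3")
```

With inputs shaped like these, moduleCounts and class would play the roles of x and class, while counts and featureLabels let results for modules be expanded back to individual genes.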

Value

A named list with six elements:

Examples

## Not run: 
# Generate simulated single-cell dataset using celda
sim_counts <- simulateCells("celda_CG", K = 4, L = 10, G = 100)

# Celda clustering into 5 clusters & 10 modules
cm <- celda_CG(sim_counts, K = 5, L = 10, verbose = FALSE)

# Get features matrix and cluster assignments
factorized <- factorizeMatrix(cm)
features <- factorized$proportions$cell
class <- celdaClusters(cm)

# Generate Decision Tree
DecTree <- findMarkersTree(features, class)

# Plot dendrogram
plotMarkerDendro(DecTree)

## End(Not run)

celda documentation built on Nov. 8, 2020, 8:24 p.m.