newCellTypeHierarchy: Classify cells according to a set of markers

Description Usage Arguments Details Value Functions Examples

View source: R/methods-CellTypeHierarchy.R

Description

Creates a CellTypeHierarchy object which can store cell types with the addCellType() function. When classifyCells is used with a CellDataSet and a CellTypeHierarchy cells in the CellDataSet can be classified as cell types found in the CellTypeHierarchy

classifyCells accepts a cellDataSet and and a cellTypeHierarchy. Each cell in the cellDataSet is checked against the functions in the cellTypeHierarchy to determine each cell's type

Usage

1
2
3
4
5
6
7
newCellTypeHierarchy()

classifyCells(cds, cth, frequency_thresh = NULL, enrichment_thresh = NULL,
  ...)

calculateMarkerSpecificity(cds, cth, remove_ambig = TRUE,
  remove_unknown = TRUE)

Arguments

cds

The CelllDataSet you want to classify

cth

CellTypeHierarchy

frequency_thresh

If at least this fraction of group of cells meet a cell types marker criteria, impute them all to be of that type.

enrichment_thresh

fraction to be multipled by each cell type percentage. Only used if frequency_thresh is NULL, both cannot be NULL

...

character strings that you wish to pass to dplyr's group_by_ routine

remove_ambig

a boolean that determines if ambiguous cells should be removed

remove_unknown

a boolean that determines whether unknown cells should be removed

Details

CellTypeHierarchy objects are Monocle's mechanism for classifying cells into types based on known markers. To classify the cells in a CellDataSet object according to known markers, first construct a CellTypeHierachy with newCellTypeHierarchy() and addCellType() and then provide both the CellDataSet and the CellTypeHierachy to classifyCells(). Each call to addCellType() registers a classification function that accepts the expression data from a CellDataSet object as input, and returns a boolean vector indicating whether each cell is of the given type. When you call classifyCells(), each cell will be checked against the classification functions in the CellTypeHierachy. If you wish to make a cell type a subtype of another that's already been registered with a CellTypeHierarchy object, make that one the "parent" type with the cell_type_name argument. If you want two types to be mutually exclusive, make them "siblings" by giving them the same parent. The classifcation functions in a CellTypeHierarchy must take a single argument, a matrix of expression values, as input. Note that this matrix could either be a sparseMatrix or a dense matrix. Explicitly casting the input to a dense matrix inside a classification function is likely to drastically slow down classifyCells and other routines that use CellTypeHierarhcy objects. Successive calls to addCellType build up a tree of classification functions inside a CellTypeHierarchy. When two functions are siblings in the tree, classifyCells expects that a cell will meet the classification criteria for at most one of them. For example, you might place classification functions for T cells and B cells as siblings, because a cell cannot be both of these at the same time. When a cell meets the criteria for more than one function, it will be tagged as "Ambiguous". If classifyCells reports a large number of ambiguous cells, consider adjusting your classification functions. For example, some cells are defined by very high expression of a key gene that is expressed at lower levels in other cell types. Raising the threshold for this gene in a classification could resolve the ambiguities. A classification function can also have child functions. You can use this to specify subtypes of cells. For example, T cells express the gene CD3, and there are many subtypes. You can encode each subset by first adding a general T cell classification function that recognizes CD3, and then adding an additional function that recognizes CD4 (for CD4+ helper T cells), one for CD8 (to identify CD8+ cytotoxic T cells), and so on. classifyCells will aim to assign each cell to its most specific subtype in the "CellType" column. By default, classifyCells applies the classification functions to individual cells, but you can also apply it to cells in a "grouped" mode to impute the type of cells that are missing expression of your known markers. You can specify additional (quoted) grouping variables to classifyCells. The function will group the cells according to these factors, and then classify the cells. It will compute the frequency of each cell type in each group, and if a cell type is present at the frquency specified in frequency_thresh, all the cells in the group are classified as that type. If group contains more one cell type at this frequency, all the cells are marked "Ambiguous". This allows you to impute cell type based on unsupervised clustering results (e.g. with clusterCells()) or some other grouping criteria.

Value

newCellTypeHierarchy and addCellType both return an updated CellTypeHierarchy object. classifyCells returns an updated CellDataSet with a new column, "CellType", in the pData table.

For a CellDataset with N genes, and a CellTypeHierarchy with k types, returns a dataframe with N x k rows. Each row contains a gene and a specifity score for one of the types.

Functions

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
## Not run: 
# Initialize a new CellTypeHierachy

# Register a set of classification functions. There are multiple types of T cells
# A cell cannot be both a B cell and a T cell, a T cell and a Monocyte, or
# a B cell and a Monocyte.
cth <- newCellTypeHierarchy()

cth <- addCellType(cth, "T cell", 
                   classify_func=function(x) {x["CD3D",] > 0})
                   
cth <- addCellType(cth, "CD4+ T cell", 
                   classify_func=function(x) {x["CD4",] > 0}, 
                   parent_cell_type_name = "T cell")
                   
cth <- addCellType(cth, "CD8+ T cell", 
                   classify_func=function(x) {
                     x["CD8A",] > 0 | x["CD8B",] > 0
                   }, 
                   parent_cell_type_name = "T cell")
                   
cth <- addCellType(cth, "B cell", 
                   classify_func=function(x) {x["MS4A1",] > 0})
                   
cth <- addCellType(cth, "Monocyte", 
                   classify_func=function(x) {x["CD14",] > 0})

# Classify each cell in the CellDataSet "mix" according to these types
mix <- classifyCells(mix, cth)

# Group the cells by the pData table column "Cluster". Apply the classification
functions to the cells groupwise. If a group is at least 5% of a type, make
them all that type. If the group is 5% one type, and 5% a different, mutually
exclusive type, mark the whole cluster "Ambiguous"
mix <- classifyCells(mix, Cluster, 0.05)

## End(Not run)

cole-trapnell-lab/monocle-release documentation built on May 13, 2019, 8:50 p.m.