Description Usage Arguments Details Value Functions Examples
View source: R/methods-CellTypeHierarchy.R
Creates a CellTypeHierarchy object which can store cell types with the addCellType() function. When classifyCells is used with a CellDataSet and a CellTypeHierarchy cells in the CellDataSet can be classified as cell types found in the CellTypeHierarchy
classifyCells accepts a cellDataSet and and a cellTypeHierarchy. Each cell in the cellDataSet is checked against the functions in the cellTypeHierarchy to determine each cell's type
1 2 3 4 5 6 7 | newCellTypeHierarchy()
classifyCells(cds, cth, frequency_thresh = NULL, enrichment_thresh = NULL,
...)
calculateMarkerSpecificity(cds, cth, remove_ambig = TRUE,
remove_unknown = TRUE)
|
cds |
The CelllDataSet you want to classify |
cth |
CellTypeHierarchy |
frequency_thresh |
If at least this fraction of group of cells meet a cell types marker criteria, impute them all to be of that type. |
enrichment_thresh |
fraction to be multipled by each cell type percentage. Only used if frequency_thresh is NULL, both cannot be NULL |
... |
character strings that you wish to pass to dplyr's group_by_ routine |
remove_ambig |
a boolean that determines if ambiguous cells should be removed |
remove_unknown |
a boolean that determines whether unknown cells should be removed |
CellTypeHierarchy objects are Monocle's mechanism for
classifying cells into types based on known markers. To classify the cells
in a CellDataSet object according to known markers, first construct a
CellTypeHierachy with newCellTypeHierarchy()
and
addCellType()
and then provide both the CellDataSet
and the CellTypeHierachy
to classifyCells()
. Each
call to addCellType()
registers a classification function
that accepts the expression data from a CellDataSet object as input, and
returns a boolean vector indicating whether each cell is of the given type.
When you call classifyCells()
, each cell will be checked against the classification functions in the
CellTypeHierachy
. If you wish to make a cell type a subtype of
another that's already been registered with a CellTypeHierarchy object,
make that one the "parent" type with the cell_type_name
argument. If
you want two types to be mutually exclusive, make them "siblings" by giving
them the same parent. The classifcation functions in a CellTypeHierarchy must take a single argument, a matrix of
expression values, as input. Note that this matrix could either be a
sparseMatrix
or a dense matrix. Explicitly casting the input to a dense
matrix inside a classification function is likely to drastically slow down
classifyCells and other routines that use CellTypeHierarhcy objects.
Successive calls to addCellType
build up a tree of classification
functions inside a CellTypeHierarchy. When two functions are siblings in
the tree, classifyCells expects that a cell will meet the classification
criteria for at most one of them. For example, you might place
classification functions for T cells and B cells as siblings, because
a cell cannot be both of these at the same time. When a cell meets the
criteria for more than one function, it will be tagged as "Ambiguous". If
classifyCells
reports a large number of ambiguous cells, consider
adjusting your classification functions. For example, some cells are
defined by very high expression of a key gene that is expressed at lower
levels in other cell types. Raising the threshold for this gene in a
classification could resolve the ambiguities. A classification function
can also have child functions. You can use this to specify subtypes of
cells. For example, T cells express the gene CD3, and there are many
subtypes. You can encode each subset by first adding a general T cell
classification function that recognizes CD3, and then adding an additional
function that recognizes CD4 (for CD4+ helper T cells), one for CD8 (to
identify CD8+ cytotoxic T cells), and so on. classifyCells
will
aim to assign each cell to its most specific subtype in the "CellType"
column. By default, classifyCells
applies the classification functions to
individual cells, but you can also apply it to cells in a "grouped" mode to
impute the type of cells that are missing expression of your known markers.
You can specify additional (quoted) grouping variables to classifyCells
.
The function will group the cells according to these factors, and then
classify the cells. It will compute the frequency of each cell type in each
group, and if a cell type is present at the frquency specified in
frequency_thresh
, all the cells in the group are classified as that
type. If group contains more one cell type at this frequency, all the cells
are marked "Ambiguous". This allows you to impute cell type based on
unsupervised clustering results (e.g. with clusterCells()
) or
some other grouping criteria.
newCellTypeHierarchy
and addCellType
both return an
updated CellTypeHierarchy object. classifyCells
returns an updated
CellDataSet
with a new column, "CellType", in the pData table.
For a CellDataset with N genes, and a CellTypeHierarchy with k types, returns a dataframe with N x k rows. Each row contains a gene and a specifity score for one of the types.
classifyCells
: Add a cell type to a CellTypeHierarchy
calculateMarkerSpecificity
: Calculate each gene's specificity for each cell type
Computes the Jensen-Shannon distance between the distribution of a gene's expression across cells and a hypothetical gene that is perfectly restricted to each cell type. The Jensen-Shannon distance is an information theoretic metric between two probability distributions. It is a widely accepted measure of cell-type specificity. For a complete description see Cabili et. al, Genes & Development (2011).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | ## Not run:
# Initialize a new CellTypeHierachy
# Register a set of classification functions. There are multiple types of T cells
# A cell cannot be both a B cell and a T cell, a T cell and a Monocyte, or
# a B cell and a Monocyte.
cth <- newCellTypeHierarchy()
cth <- addCellType(cth, "T cell",
classify_func=function(x) {x["CD3D",] > 0})
cth <- addCellType(cth, "CD4+ T cell",
classify_func=function(x) {x["CD4",] > 0},
parent_cell_type_name = "T cell")
cth <- addCellType(cth, "CD8+ T cell",
classify_func=function(x) {
x["CD8A",] > 0 | x["CD8B",] > 0
},
parent_cell_type_name = "T cell")
cth <- addCellType(cth, "B cell",
classify_func=function(x) {x["MS4A1",] > 0})
cth <- addCellType(cth, "Monocyte",
classify_func=function(x) {x["CD14",] > 0})
# Classify each cell in the CellDataSet "mix" according to these types
mix <- classifyCells(mix, cth)
# Group the cells by the pData table column "Cluster". Apply the classification
functions to the cells groupwise. If a group is at least 5% of a type, make
them all that type. If the group is 5% one type, and 5% a different, mutually
exclusive type, mark the whole cluster "Ambiguous"
mix <- classifyCells(mix, Cluster, 0.05)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.