View source: R/human_functions_rename.r
renameAndOrderClusters | R Documentation |
This function uses information from both the data (gene expression) and meta-data (annotation) objects to build a useful automated cluster name.
renameAndOrderClusters(
sampleInfo,
classNameColumn = "cluster_type_label",
classGenes = c("GAD1", "SLC17A7", "SLC1A3"),
classLevels = c("inh", "exc", "glia"),
layerNameColumn = "layer_label",
regionNameColumn = "Region_label",
matchNameColumn = "cellmap_label",
newColorNameColumn = "cellmap_color",
otherColumns = NULL,
propLayer = 0.3,
dend = NULL,
orderbyColumns = c("layer", "region", "topMatch"),
includeClusterCounts = FALSE,
includeBroadGenes = FALSE,
broadGenes = NULL,
includeSpecificGenes = FALSE,
propExpr = NULL,
medianExpr = NULL,
propDiff = 0,
propMin = 0.5,
medianFC = 1,
excludeGenes = NULL,
sortByMedian = TRUE,
sep = "_"
)
sampleInfo |
Sample information with rows as samples and columns for annotations. All samples in sampleInfo are used for renaming (so subset prior to running this function if desired). Columns must include "cluster_id", "cluster_label", and "cluster_color". |
classNameColumn |
Column name where class information is stored (e.g., inh/exc/glia),
or NULL if you'd like it to be defined based on |
classGenes |
Set of genes for defining classes (which is ignored in this context
if |
classLevels |
A vector of the levels for classes of the same length (and in the same
order) as classGenes. Either include all relevant levels or set to NA for none if using
|
layerNameColumn |
Column name where the (numeric) layer info is stored (NA if none) |
regionNameColumn |
Column name where the (character) region info is stored (NA if none) |
matchNameColumn |
Column name where the (character) comparison info is stored (e.g., closest mapping cell type for each cell pre-calculated against a previous taxonomy; NA if none) |
newColorNameColumn |
Column name where the new cluster colors are found (e.g., color
column corresponding to |
otherColumns |
Other columns to transfer to the output variable. Note that the value from a random sample in the cluster is returned, so this usually should be left as default (NULL). |
propLayer |
Proportion of cells (relative to max) must be higher than this for a cluster to be considered as expressed in a particular layer (default is 0.3). |
dend |
Dendrogram object, only used for ordering of clusters (NULL as default) |
orderbyColumns |
Column names indicating the output cluster order (not used unless dend=NULL). Must be some combination of "layer", "region", and "topMatch" in any order (or NULL). Default is first by "layer", then "region", then "topMatch". |
includeClusterCounts |
Should the number of cells in each cluster be included in name? |
includeBroadGenes |
Should broad genes be included in the name (if so, |
broadGenes |
List of broad genes, where the top median CPM in cluster is included in name |
includeSpecificGenes |
Should specific genes be included in the name? If TRUE, the next
seven parameters are used to call |
propExpr |
Matrix of proportions of cells expressing a gene in each cluster (genes=rows, clusters=columns) |
medianExpr |
Matrix of median expression per cluster (genes=rows, clusters=columns) |
propDiff |
Must have difference in proportion higher than this value in "on" cluster compared with each other cluster |
propMin |
Must have a proportion higher than this value in the "on" cluster |
medianFC |
Must have median fold change greater than this value in "on" group vs. each other cluster |
excludeGenes |
Genes excluded from marker consideration (NULL by default) |
sortByMedian |
Should genes passing all filters be prioritized by median fold change (TRUE, default) or by difference in proportion between clusters (FALSE) |
sep |
Separation character for renaming (default is "_") |
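The three marker thresholds above (propDiff, propMin, medianFC) can be illustrated with a minimal base R sketch. This is not the package's implementation: findMarkersSketch is a hypothetical helper, the toy matrices are made up, and the sketch assumes that fold change is computed as a difference of medians (i.e., that medianExpr is on a log2 scale), which the documentation does not state.

```r
# Toy matrices: genes in rows, clusters in columns (values are made up).
propExpr <- matrix(
  c(0.90, 0.10, 0.20,
    0.60, 0.55, 0.50,
    0.05, 0.80, 0.10),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cl1", "cl2", "cl3"))
)
medianExpr <- matrix(
  c(8.0, 1.0, 2.0,
    3.0, 3.0, 2.5,
    0.0, 6.0, 0.5),
  nrow = 3, byrow = TRUE,
  dimnames = dimnames(propExpr)
)

# Hypothetical helper (not part of the package) applying the three filters.
findMarkersSketch <- function(onCluster, propExpr, medianExpr,
                              propDiff = 0, propMin = 0.5, medianFC = 1,
                              excludeGenes = NULL, sortByMedian = TRUE) {
  others <- setdiff(colnames(propExpr), onCluster)
  # Largest value across the other clusters: exceeding it means
  # exceeding each other cluster individually.
  maxOtherProp <- apply(propExpr[, others, drop = FALSE], 1, max)
  maxOtherMed  <- apply(medianExpr[, others, drop = FALSE], 1, max)
  diffProp <- propExpr[, onCluster] - maxOtherProp
  # ASSUMPTION: fold change is a difference of medians (log2-scale input).
  fc <- medianExpr[, onCluster] - maxOtherMed
  keep <- diffProp > propDiff & propExpr[, onCluster] > propMin & fc > medianFC
  keep[names(keep) %in% excludeGenes] <- FALSE
  # Rank the passing genes by fold change or by difference in proportion.
  score <- if (sortByMedian) fc else diffProp
  names(sort(score[keep], decreasing = TRUE))
}

findMarkersSketch("cl1", propExpr, medianExpr)
findMarkersSketch("cl2", propExpr, medianExpr)
```

In this toy data geneB is expressed in more than half of the cells in every cluster, so it passes propMin for cl1 but fails the fold-change filter and is not reported as a marker.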
When all options are selected, the output format is as follows: [cell class]_[layer
range]_[broad marker gene]_[specific marker gene]_[brain region with most cells (and
scaled fraction of cells)]_[best matched type from previous taxonomy]_[number of cells
in cluster]. The output is a data frame with information about each cluster, including
the new cluster names. updateSampDat needs to be run after renameAndOrderClusters
to apply the new cluster names to each sample. If a dendrogram has already been
created, the dendrogram labels will also need to be changed separately.
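As a rough illustration of how such a name could be composed, the base R sketch below derives a layer range from the propLayer rule and joins the components with sep. It is a sketch under stated assumptions, not the function's actual code: the "L<min>-<max>" layer format, the placeholder component values, and the omission of the scaled region fraction are all illustrative choices.

```r
# Toy per-cell layer labels for a single cluster (made-up values).
cellLayers <- c(2, 2, 3, 3, 3, 3, 4, 5, 3, 2)
propLayer  <- 0.3
sep        <- "_"

# Scale the per-layer cell counts so the most populated layer equals 1.
layerCounts <- table(cellLayers)
scaled <- setNames(as.vector(layerCounts) / max(layerCounts),
                   names(layerCounts))

# Keep only layers whose scaled proportion exceeds propLayer.
keptLayers <- as.numeric(names(scaled)[scaled > propLayer])

# ASSUMPTION: the layer range is written as "L<min>-<max>".
layerRange <- if (min(keptLayers) == max(keptLayers)) {
  paste0("L", keptLayers[1])
} else {
  paste0("L", min(keptLayers), "-", max(keptLayers))
}

# Placeholder components in the documented order: class, layer range,
# broad gene, specific gene, region, best previous match, cell count.
parts <- c("exc", layerRange, "BROADGENE", "SPECGENE",
           "REGION", "PREVTYPE", length(cellLayers))
newName <- paste(parts, collapse = sep)
newName
```

Layers 4 and 5 each hold one cell out of a maximum of five (scaled proportion 0.2), so they fall below propLayer and are left out of the range.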
A data frame of cluster information, which includes the new and old names, the
requested variables from sampleInfo, and all the specific components of the new name.
This is the required input for updateSampDat in the appropriate format.
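Conceptually, applying cluster-level names to samples is a lookup from old to new names. The base R sketch below shows that kind of lookup only; it is not the updateSampDat implementation, and the column names old_cluster_label and new_cluster_label are hypothetical, since the documentation does not list the output column names.

```r
# Hypothetical cluster-level table; these column names are assumptions.
clusterInfo <- data.frame(
  old_cluster_label = c("cl1", "cl2"),
  new_cluster_label = c("inh_L1-2_GENEA", "exc_L2-3_GENEB"),
  stringsAsFactors = FALSE
)

# Toy per-sample annotations using the old cluster names.
sampleInfo <- data.frame(
  sample_name   = c("s1", "s2", "s3"),
  cluster_label = c("cl2", "cl1", "cl2"),
  stringsAsFactors = FALSE
)

# Find each sample's row in the cluster table, then swap in the new name.
idx <- match(sampleInfo$cluster_label, clusterInfo$old_cluster_label)
sampleInfo$cluster_label <- clusterInfo$new_cluster_label[idx]
sampleInfo
```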