renameAndOrderClusters: Rename clusters using genes and metadata

View source: R/human_functions_rename.r

Description

This function combines information from the data (gene expression) and the metadata (annotations) to build informative, automatically generated cluster names.

Usage

renameAndOrderClusters(
  sampleInfo,
  classNameColumn = "cluster_type_label",
  classGenes = c("GAD1", "SLC17A7", "SLC1A3"),
  classLevels = c("inh", "exc", "glia"),
  layerNameColumn = "layer_label",
  regionNameColumn = "Region_label",
  matchNameColumn = "cellmap_label",
  newColorNameColumn = "cellmap_color",
  otherColumns = NULL,
  propLayer = 0.3,
  dend = NULL,
  orderbyColumns = c("layer", "region", "topMatch"),
  includeClusterCounts = FALSE,
  includeBroadGenes = FALSE,
  broadGenes = NULL,
  includeSpecificGenes = FALSE,
  propExpr = NULL,
  medianExpr = NULL,
  propDiff = 0,
  propMin = 0.5,
  medianFC = 1,
  excludeGenes = NULL,
  sortByMedian = TRUE,
  sep = "_"
)

Arguments

sampleInfo

Sample information with rows as samples and columns for annotations. All samples in sampleInfo are used for renaming (so subset prior to running this function if desired). Columns must include "cluster_id", "cluster_label", and "cluster_color".
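Because every row of sampleInfo contributes to the new names, any subsetting has to happen before the call. Below is a minimal base-R sketch of that preparation step. The toy data, the filter on Region_label, and the column check are illustrative assumptions, not code from scrattch.hicat.

```r
# Toy sample table; all values are invented for illustration.
sampleInfo <- data.frame(
  cluster_id    = c(1, 1, 2, 2, 3),
  cluster_label = c("c1", "c1", "c2", "c2", "c3"),
  cluster_color = c("#E41A1C", "#E41A1C", "#377EB8", "#377EB8", "#4DAF4A"),
  Region_label  = c("MTG", "MTG", "V1", "MTG", "V1"),
  stringsAsFactors = FALSE
)

# Subset first: only the remaining rows will be used for renaming.
sampleInfo <- sampleInfo[sampleInfo$Region_label == "MTG", ]

# Check for the columns the documentation lists as required.
required <- c("cluster_id", "cluster_label", "cluster_color")
missingCols <- setdiff(required, colnames(sampleInfo))
if (length(missingCols) > 0) {
  stop("sampleInfo is missing required columns: ",
       paste(missingCols, collapse = ", "))
}
```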

classNameColumn

Column name where class information is stored (e.g., inh/exc/glia), or NULL to define classes based on classGenes.

classGenes

Set of genes used to define classes (ignored for this purpose if classNameColumn is not NULL). Also used if the broad class gene is not expressed.

classLevels

A vector of class levels with the same length as classGenes and in the same order. If using classNameColumn, either include all relevant levels or set to NA for none.
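classGenes and classLevels are matched by position, so the i-th gene stands for the i-th class. The sketch below pairs them and then gives each cluster the class of its most highly expressed class gene. That assignment rule and the toy medianExpr values are assumptions made for illustration; the function's own class logic may differ.

```r
classGenes  <- c("GAD1", "SLC17A7", "SLC1A3")
classLevels <- c("inh", "exc", "glia")

# Same length, same order: the i-th gene stands for the i-th class.
stopifnot(length(classGenes) == length(classLevels))
geneToClass <- setNames(classLevels, classGenes)

# Toy median expression (genes = rows, clusters = columns).
medianExpr <- matrix(
  c(50,  1,  0,
     2, 80,  1,
     0,  3, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(classGenes, c("c1", "c2", "c3"))
)

# Assumed rule: each cluster takes the class of its top class gene.
topGene <- rownames(medianExpr)[apply(medianExpr, 2, which.max)]
clusterClass <- setNames(unname(geneToClass[topGene]), colnames(medianExpr))
print(clusterClass)
```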

layerNameColumn

Column name where the (numeric) layer info is stored (NA if none)

regionNameColumn

Column name where the (character) region info is stored (NA if none)

matchNameColumn

Column name where the (character) comparison info is stored (e.g., the closest-mapping cell type for each cell, pre-calculated against a previous taxonomy; NA if none)

newColorNameColumn

Column name where the new cluster colors are found (e.g., color column corresponding to matchNameColumn). NA keeps the current colors.

otherColumns

Other columns to transfer to the output variable. Note that the value from a random sample in the cluster is returned, so this usually should be left as default (NULL).

propLayer

A cluster is considered present in a layer only if the proportion of its cells in that layer, relative to its maximum proportion in any layer, is higher than this value (default is 0.3).
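To make the propLayer threshold concrete, here is a base-R sketch that, for one cluster, finds the layers passing the threshold and reports them as a range. The helper layerRange(), the assumption of six numeric layers, and the "L2-3" output format are hypothetical and not necessarily what renameAndOrderClusters produces.

```r
# Hypothetical helper: keep layers whose cell count is more than
# `propLayer` times the cluster's peak layer count, then report a range.
layerRange <- function(layers, propLayer = 0.3, allLayers = 1:6) {
  counts <- as.numeric(table(factor(layers, levels = allLayers)))
  if (sum(counts) == 0) return(NA_character_)  # no layer information
  rel <- counts / max(counts)                   # proportion relative to max
  kept <- allLayers[rel > propLayer]
  if (length(kept) == 0) return(NA_character_)
  if (min(kept) == max(kept)) return(paste0("L", kept))
  paste0("L", min(kept), "-", max(kept))
}

# Layer of each cell in one toy cluster.
layer_range <- layerRange(c(2, 2, 2, 3, 3, 3, 3, 4, 5, 6))
print(layer_range)  # "L2-3"
```

Here layers 4 to 6 each hold a quarter of the peak count (layer 3), which is below 0.3, so only layers 2 and 3 are kept. Note that this sketch reports the span from the lowest to the highest passing layer, even if a layer in between does not pass.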

dend

Dendrogram object, only used for ordering of clusters (NULL as default)

orderbyColumns

Column names indicating the output cluster order (not used unless dend = NULL). Must be some combination of "layer", "region", and "topMatch" in any order (or NULL). The default orders first by "layer", then by "region", then by "topMatch".
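When dend is NULL, clusters are sorted by the orderbyColumns in sequence, with later columns breaking ties in earlier ones. That kind of multi-key ordering can be sketched with base R's order(). The toy cluster table is invented, and treating each cluster's layer as a single number is a simplification.

```r
# Toy per-cluster summary; values are invented.
clusters <- data.frame(
  cluster  = c("a", "b", "c", "d"),
  layer    = c(5, 2, 2, 4),
  region   = c("V1", "MTG", "V1", "MTG"),
  topMatch = c("Exc L5", "Inh L2", "Exc L2", "Exc L4"),
  stringsAsFactors = FALSE
)

orderbyColumns <- c("layer", "region", "topMatch")

# Sort by the first column, breaking ties with the next, and so on.
ord <- do.call(order, unname(as.list(clusters[orderbyColumns])))
clustersSorted <- clusters[ord, ]
print(clustersSorted$cluster)  # "b" "c" "d" "a"
```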

includeClusterCounts

Should the number of cells in each cluster be included in name?

includeBroadGenes

Should broad genes be included in the name (if so, broadGenes must be provided)?

broadGenes

List of broad genes; the gene with the highest median CPM in a cluster is included in that cluster's name.
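The sketch below picks the broad gene for each cluster from a median-expression matrix laid out like medianExpr (genes as rows, clusters as columns). The gene names and values are made up, and the fallback to classGenes when the broad gene is not expressed is not modeled.

```r
broadGenes <- c("LAMP5", "VIP", "SST", "PVALB")

# Toy median CPM (genes = rows, clusters = columns).
medianExpr <- matrix(
  c(10,  0,  2,
     1, 35,  0,
     0,  4, 60,
     3,  0,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(broadGenes, c("c1", "c2", "c3"))
)

# For each cluster (column), the broad gene with the highest median CPM.
topBroad <- apply(medianExpr[broadGenes, , drop = FALSE], 2,
                  function(x) names(x)[which.max(x)])
print(topBroad)
```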

includeSpecificGenes

Should specific genes be included in the name? If TRUE, the next seven parameters are used to call getTopMarkersByPropNew.

propExpr

Matrix of proportions of cells expressing a gene in each cluster (genes=rows, clusters=columns)

medianExpr

Matrix of median expression per cluster (genes=rows, clusters=columns)

propDiff

The difference in proportion between the "on" cluster and each other cluster must be higher than this value.

propMin

The proportion of cells expressing the gene in the "on" cluster must be higher than this value.

medianFC

The median fold change between the "on" cluster and each other cluster must be greater than this value.

excludeGenes

Genes excluded from marker consideration (NULL by default)

sortByMedian

Should genes passing all filters be prioritized by median fold change (TRUE, default) or by difference in proportion between clusters (FALSE)
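The specific-marker parameters act as filters applied against every other cluster, followed by a sort. The function below, topMarkers(), is a hypothetical approximation written for illustration; it is not getTopMarkersByPropNew. It assumes that fold change is computed as (median_on + 1) / (median_other + 1), with the +1 pseudocount being an assumption, and it treats every threshold as strict (>).

```r
# Hypothetical helper, NOT getTopMarkersByPropNew: returns the genes that
# pass every filter for cluster `on`, best first.
topMarkers <- function(propExpr, medianExpr, on, propDiff = 0, propMin = 0.5,
                       medianFC = 1, excludeGenes = NULL, sortByMedian = TRUE) {
  others <- setdiff(colnames(propExpr), on)
  # Smallest proportion difference between `on` and any other cluster.
  minDiff <- apply(propExpr[, on] - propExpr[, others, drop = FALSE], 1, min)
  # Smallest fold change versus any other cluster (+1 pseudocount assumed).
  fc <- (medianExpr[, on] + 1) / (medianExpr[, others, drop = FALSE] + 1)
  minFC <- apply(fc, 1, min)
  keep <- propExpr[, on] > propMin & minDiff > propDiff & minFC > medianFC &
    !(rownames(propExpr) %in% excludeGenes)
  score <- if (sortByMedian) minFC else minDiff
  passing <- rownames(propExpr)[keep]
  passing[order(score[keep], decreasing = TRUE)]
}

# Toy data (genes = rows, clusters = columns).
genes <- c("g1", "g2", "g3", "g4")
clus  <- c("on", "x", "y")
propExpr <- matrix(c(0.90, 0.10, 0.20,
                     0.80, 0.70, 0.10,
                     0.85, 0.00, 0.10,
                     0.30, 0.00, 0.00),
                   nrow = 4, byrow = TRUE, dimnames = list(genes, clus))
medianExpr <- matrix(c(20,  1, 2,
                       30, 25, 1,
                        8,  0, 1,
                        5,  0, 0),
                     nrow = 4, byrow = TRUE, dimnames = list(genes, clus))

topMarkers(propExpr, medianExpr, "on")                        # "g1" "g3" "g2"
topMarkers(propExpr, medianExpr, "on", sortByMedian = FALSE)  # "g3" "g1" "g2"
```

g4 fails propMin. Among the remaining genes, sorting by median fold change ranks g1 first, while sorting by proportion difference ranks g3 first, because g3 has the larger worst-case proportion gap but the smaller worst-case fold change.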

sep

Separation character for renaming (default is "_")

Details

When all options are selected, each new cluster name has the following format: [cell class]_[layer range]_[broad marker gene]_[specific marker gene]_[brain region with most cells (and scaled fraction of cells)]_[best matched type from previous taxonomy]_[number of cells in cluster]. The output is a data frame with information about each cluster, including the new cluster names. updateSampDat needs to be run after renameAndOrderClusters to apply the new cluster names to each sample. If a dendrogram has already been created, its labels also need to be changed separately.
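The name is built by joining the available components with sep, in the order given above. Here is a minimal sketch of that joining step. buildName() and the example component values are hypothetical, and the exact formatting of each component in the real output (for example the region fraction or the cell count) may differ.

```r
# Hypothetical helper: drop missing components, then join with `sep`.
buildName <- function(parts, sep = "_") {
  isMissing <- vapply(parts,
                      function(p) is.null(p) || is.na(p) || identical(p, ""),
                      logical(1))
  paste(unlist(parts[!isMissing]), collapse = sep)
}

parts <- list(
  class    = "inh",
  layer    = "L2-3",
  broad    = "LAMP5",
  specific = "PAX6",
  region   = "MTG",
  topMatch = NA,     # e.g. no matchNameColumn supplied
  count    = 120
)
newName <- buildName(parts)
print(newName)  # "inh_L2-3_LAMP5_PAX6_MTG_120"
```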

Value

A data frame of cluster information, which includes the new and old names, the requested variables from sampleInfo, and all the specific components of the new name. This is the required input for updateSampDat in the appropriate format.


AllenInstitute/scrattch.hicat documentation built on Oct. 20, 2023, 6:55 a.m.