merge_clusters: Merge redundant clusters by expression profile similarity

View source: R/clustering.R

merge_clusters    R Documentation

Merge redundant clusters by expression profile similarity

Description

Join clusters that represent the same expression pattern across cell types (redundant clusters). This function uses a metaclustering approach (see Details) together with user-defined similarity thresholds that control the stringency of the merge process.

Usage

merge_clusters(
  data,
  isoform_col = NULL,
  id_table,
  cluster_list,
  percentile_no = 10,
  dynamic = FALSE,
  method = c("percentile", "pearson", "spearman", "rho", "zi_kendall"),
  height_cutoff = 0.2,
  cutree_no = NULL,
  ...
)

Arguments

data

A data.frame or tibble object including isoforms as rows and cells as columns. Isoform IDs can be included as row names (data.frame) or as an additional column (tibble).

isoform_col

When a tibble is provided in data, a character value indicating the name of the column in which isoform IDs are specified.

id_table

A data frame including two columns named cell and cell_type, in which correspondence between cell ID and cell type should be provided. The number of rows should be equal to the total number of cell columns in data, and the order of the cell column should match column (i.e. cell) order in data.
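As an illustration of the layout described above, the following minimal sketch builds a toy data object and a matching id_table. All isoform, cell and cell type names are hypothetical, and the checks at the end only restate the requirements given in this argument description; they are not part of acorde.

```r
# Toy expression table: 3 isoforms (rows) x 4 cells (columns).
# Isoform IDs are stored as row names, as allowed for a data.frame.
data <- data.frame(
  cell_1 = c(5, 0, 2),
  cell_2 = c(4, 1, 3),
  cell_3 = c(0, 7, 2),
  cell_4 = c(1, 6, 3),
  row.names = c("iso_A", "iso_B", "iso_C")
)

# One row per cell column, listed in the same order as the columns of `data`.
id_table <- data.frame(
  cell = colnames(data),
  cell_type = c("neuron", "neuron", "glia", "glia"),
  stringsAsFactors = FALSE
)

# Checks that mirror the documented requirements.
stopifnot(
  nrow(id_table) == ncol(data),
  identical(id_table$cell, colnames(data))
)
```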

cluster_list

A list of character vectors, each containing the identifiers of the isoforms in a cluster.

percentile_no

Integer indicating the number of percentiles that will be used to summarize cell type expression via percentile_expr. Should always be higher than 4 (quartiles) and lower than 100 (percentiles). Defaults to 10.
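To give an intuition for what percentile summarization means, the sketch below summarizes the expression of one isoform across the cells of a single cell type using base R's quantile function. This is only an assumption about the general idea: the exact probabilities and output length used by percentile_expr are not described here and may differ.

```r
# Toy expression values of one isoform across 8 cells of one cell type.
expr <- c(0, 0, 1, 2, 3, 5, 8, 13)

# Number of percentile bins (assumed to behave like percentile_no).
percentile_no <- 10

# Evenly spaced probabilities from 0 to 1; this spacing is an assumption.
probs <- seq(0, 1, length.out = percentile_no + 1)

# Summarize the expression distribution at those probabilities.
percentile_summary <- quantile(expr, probs = probs)

print(percentile_summary)
```

A larger percentile_no gives a finer description of the expression distribution within each cell type, at the cost of a longer summary vector per isoform.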

dynamic

A logical. If TRUE, merge will be performed via dynamic hierarchical clustering. Defaults to FALSE.

method

Character indicating a co-expression method to use for merging similar clusters. Should be one of percentile, pearson, spearman, zi_kendall, rho (see details). Percentile correlation is used by default.

height_cutoff

When dynamic = FALSE, a numeric value between 0 and 1 to be supplied to cutree via the h argument. Indicates the height at which the dendrogram should be cut to generate groups of merged clusters. Defaults to 0.2.

cutree_no

An integer indicating the desired number of groups to merge clusters into. Supplied to cutree via the k argument. Only required when dynamic = FALSE and height_cutoff = NULL.

...

Additional arguments passed to cutreeHybrid (only when dynamic = TRUE).

Details

During the isoform clustering process, it is generally useful to prioritize the reduction of within-cluster variability. This, however, can lead to obtaining a large number of small, redundant clusters. To mitigate this effect, acorde includes a step where clusters with high profile similarity can be merged using the correlation between their metatranscripts. A cluster's metatranscript is calculated as the mean of the percentile-summarized expression of all of the isoforms in that cluster. Then, co-expression values between metatranscripts are calculated and used to generate a distance matrix to group cluster profiles by similarity, a process that can be referred to as metaclustering.
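The metatranscript idea described above can be sketched in base R as follows. The toy values, isoform names and cluster names are hypothetical, Pearson correlation is used for simplicity, and the conversion of correlation to distance as 1 - correlation is an assumption rather than a statement of what acorde does internally.

```r
# Toy percentile-summarized expression: 5 isoforms x 5 summary points.
summarized <- rbind(
  iso_1 = c(1, 2, 3, 4, 5),
  iso_2 = c(2, 3, 4, 5, 6),
  iso_3 = c(1, 2, 4, 4, 6),
  iso_4 = c(6, 5, 3, 2, 1),
  iso_5 = c(5, 4, 3, 2, 1)
)

# Clusters given as character vectors of isoform IDs (same shape as cluster_list).
cluster_list <- list(
  c1 = c("iso_1", "iso_2"),
  c2 = "iso_3",
  c3 = c("iso_4", "iso_5")
)

# Metatranscript: mean summarized profile of the isoforms in each cluster.
metatranscripts <- t(sapply(cluster_list, function(ids) {
  colMeans(summarized[ids, , drop = FALSE])
}))

# Co-expression between metatranscripts; cor() works on columns, hence t().
meta_cor <- cor(t(metatranscripts), method = "pearson")

# Assumed conversion from similarity to distance.
meta_dist <- as.dist(1 - meta_cor)

print(round(meta_cor, 3))
```

In this toy example, c1 and c2 have nearly identical increasing profiles and would be candidates for merging, whereas c3 has the opposite trend.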

By default, the metaclustering process is done using traditional hierarchical clustering via hclust, which requires the definition of either a height cutoff (height_cutoff parameter) or a number of clusters to obtain (cutree_no parameter).
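The two ways of cutting the tree can be illustrated with base R alone. The correlation matrix below is hypothetical, the 1 - correlation distance is an assumption, and hclust is run with its default linkage, which may not match the settings used inside merge_clusters.

```r
# Hypothetical correlations between the metatranscripts of four clusters.
meta_cor <- matrix(
  c(1.00, 0.95, 0.10, 0.05,
    0.95, 1.00, 0.15, 0.10,
    0.10, 0.15, 1.00, 0.90,
    0.05, 0.10, 0.90, 1.00),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("c", 1:4), paste0("c", 1:4))
)

# Hierarchical clustering on an assumed 1 - correlation distance.
tree <- hclust(as.dist(1 - meta_cor))

# Option 1: cut at a fixed height (analogous to height_cutoff).
groups_by_height <- cutree(tree, h = 0.2)

# Option 2: request a fixed number of groups (analogous to cutree_no).
groups_by_number <- cutree(tree, k = 2)

print(groups_by_height)
print(groups_by_number)
```

Here both options place c1 with c2 and c3 with c4. A lower height makes merging stricter, since only clusters with very similar metatranscripts fall below the cut.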

Available co-expression metrics (selected via the method argument) include:

  1. percentile: percentile correlations computed using percentile_cor.

  2. pearson: Pearson correlation computed using cor.

  3. spearman: Spearman correlation computed using cor.

  4. zi_kendall: zero-inflated Kendall correlation computed using the dismay function.

  5. rho: rho proportionality metric computed using the dismay function.
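As a brief illustration of how the choice of metric can change the similarity between two profiles, the sketch below compares Pearson and Spearman correlation using cor from base R. The two toy profiles are hypothetical; the percentile, zi_kendall and rho metrics are not shown because they rely on functions outside base R.

```r
# Two toy metatranscript profiles: same ranking, non-linear relationship.
profile_a <- c(1, 2, 3, 4, 5)
profile_b <- c(1, 4, 9, 16, 25)

# Pearson measures linear association.
pearson_value <- cor(profile_a, profile_b, method = "pearson")

# Spearman measures monotonic association (correlation of ranks).
spearman_value <- cor(profile_a, profile_b, method = "spearman")

print(c(pearson = pearson_value, spearman = spearman_value))
```

Because the profiles are monotonically related but not linearly related, Spearman correlation is exactly 1 while Pearson correlation is slightly lower, so the same cutoff can lead to different merge decisions depending on the metric.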

Alternatively, users may choose to perform metatranscript clustering dynamically using the dynamicTreeCut package by setting dynamic = TRUE. In this case, any additional parameters can be supplied to the cutreeHybrid function via the ... argument. Note that minClusterSize = 1 is set internally to allow clusters to remain unmerged if no redundancies with the profiles of other clusters are found.

Value

A named list containing two elements:

  1. merged_groups: a list detailing merge decisions, in which each element contains the identifiers of the clusters that were merged together.

  2. clusters: a list of character vectors, containing the identifiers of isoforms included in each of the resulting clusters.
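The relationship between the two returned elements can be sketched with a mock result built in base R. The group membership, element names and cluster names below are hypothetical, and the naming of elements in the actual output of merge_clusters may differ.

```r
# Input clusters, as character vectors of isoform IDs.
cluster_list <- list(
  c1 = c("iso_1", "iso_2"),
  c2 = "iso_3",
  c3 = c("iso_4", "iso_5")
)

# Hypothetical group membership per cluster, e.g. as produced by cutree().
membership <- c(c1 = 1L, c2 = 1L, c3 = 2L)

# merged_groups: which input clusters ended up in each merged group.
merged_groups <- split(names(membership), membership)

# clusters: isoform IDs pooled across the input clusters of each group.
clusters <- lapply(merged_groups, function(ids) {
  unlist(cluster_list[ids], use.names = FALSE)
})

# Mock object with the same two-element structure described above.
result <- list(merged_groups = merged_groups, clusters = clusters)

str(result)
```

In this mock example, c1 and c2 are merged into one cluster containing three isoforms, while c3 remains unmerged.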

References

\insertRef{Langfelder2008}{acorde}

\insertRef{Venables2002}{acorde}

\insertRef{Skinnider2019}{acorde}


ConesaLab/acorde documentation built on Feb. 25, 2024, 4:16 a.m.