merge_clusters: Merge redundant clusters by expression profile similarity
In ConesaLab/acorde: Isoform co-usage networks from single-cell RNA-seq data

merge_clusters

R Documentation


Description

Merges clusters that represent the same expression pattern across cell types (redundant clusters). The function uses a metaclustering system (see Details) together with user-defined similarity thresholds that control the stringency of the merge process.

Usage

merge_clusters(
  data,
  isoform_col = NULL,
  id_table,
  cluster_list,
  percentile_no = 10,
  dynamic = FALSE,
  method = c("percentile", "pearson", "spearman", "rho", "zi_kendall"),
  height_cutoff = 0.2,
  cutree_no = NULL,
  ...
)

Arguments

`data`	A data.frame or tibble object including isoforms as rows and cells as columns. Isoform IDs can be included as row names (data.frame) or as an additional column (tibble).
`isoform_col`	When a tibble is provided in `data`, a character value indicating the name of the column in which isoform IDs are specified.
`id_table`	A data frame with two columns, `cell` and `cell_type`, mapping each cell ID to its cell type. The number of rows must equal the total number of cell columns in `data`, and the order of the `cell` column must match the column (i.e. cell) order in `data`.
`cluster_list`	A list of character vectors, each containing the identifiers of the isoforms in a cluster.
`percentile_no`	Integer indicating the number of percentiles used to summarize cell type expression via `percentile_expr`. Should always be higher than 4 (quartiles) and lower than 100 (percentiles). Defaults to 10.
`dynamic`	A logical. If `TRUE`, merge will be performed via dynamic hierarchical clustering. Defaults to `FALSE`.
`method`	Character indicating the co-expression method used to merge similar clusters. Should be one of `"percentile"`, `"pearson"`, `"spearman"`, `"zi_kendall"` or `"rho"` (see Details). Percentile correlation is used by default.
`height_cutoff`	When `dynamic = FALSE`, a numeric value between 0 and 1 to be supplied to `cutree` via the `h` argument. Indicates the height at which the resulting dendrogram should be cut to generate groups of merged clusters.
`cutree_no`	An integer indicating the desired number of groups to merge clusters into. Supplied to `cutree` via the `k` argument. Only required when `dynamic = FALSE` and `height_cutoff = NULL`.
`...`	Additional arguments passed to `cutreeHybrid` (only when `dynamic = TRUE`).
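As a sketch of the expected input shapes, the base R snippet below builds a small made-up expression table (isoform IDs as row names), a matching `id_table` and a `cluster_list`, then checks the ordering requirement stated for `id_table`. All object names and values are illustrative assumptions; the commented call at the end uses only the arguments listed above and requires acorde to be installed.

```r
# Toy inputs with the shapes described under Arguments (values are made up).
set.seed(1)

# 6 isoforms (rows) x 8 cells (columns); isoform IDs stored as row names
cells <- paste0("cell", 1:8)
expr <- as.data.frame(matrix(rpois(6 * 8, lambda = 3), nrow = 6,
                             dimnames = list(paste0("iso", 1:6), cells)))

# One row per cell column of `expr`, listed in the same order
id_table <- data.frame(cell = cells,
                       cell_type = rep(c("A", "B"), each = 4),
                       stringsAsFactors = FALSE)

# Three clusters, each a character vector of isoform IDs
cluster_list <- list(c("iso1", "iso2"), c("iso3", "iso4"), c("iso5", "iso6"))

# Consistency checks implied by the id_table and cluster_list descriptions
stopifnot(nrow(id_table) == ncol(expr),
          identical(id_table$cell, colnames(expr)),
          all(unlist(cluster_list) %in% rownames(expr)))

# With acorde installed, the call could then look like:
# merge_clusters(data = expr, id_table = id_table,
#                cluster_list = cluster_list, height_cutoff = 0.2)
```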

Details

During the isoform clustering process, it is generally useful to prioritize reducing within-cluster variability. However, this can produce a large number of small, redundant clusters. To mitigate this, acorde includes a step in which clusters with highly similar profiles are merged based on the correlation between their metatranscripts. A cluster's metatranscript is computed as the mean of the percentile-summarized expression of all isoforms in that cluster. Co-expression values between metatranscripts are then calculated and converted into a distance matrix used to group cluster profiles by similarity, a process referred to as metaclustering.
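The metatranscript computation can be sketched in base R as follows. The percentile summary here is an assumption: each isoform's expression in each cell type is reduced to its quantiles at `percentile_no` evenly spaced probabilities, which may not match acorde's percentile_expr exactly. The helper names `summarize_percentiles()` and `metatranscripts()` are hypothetical.

```r
# ASSUMPTION: percentile summary = quantiles at percentile_no evenly spaced
# probabilities per cell type; acorde's percentile_expr() may differ.
summarize_percentiles <- function(expr, cell_type, percentile_no = 10) {
  probs <- seq_len(percentile_no) / percentile_no
  # For each isoform (row), concatenate the per-cell-type quantile vectors
  t(apply(as.matrix(expr), 1, function(x) {
    unlist(lapply(split(x, cell_type), quantile, probs = probs, names = FALSE))
  }))
}

# Metatranscript = mean percentile profile over the isoforms of a cluster
metatranscripts <- function(pct, cluster_list) {
  t(sapply(cluster_list, function(ids) colMeans(pct[ids, , drop = FALSE])))
}

# Toy data: 6 isoforms x 8 cells, two cell types of 4 cells each
set.seed(1)
expr <- matrix(rpois(6 * 8, lambda = 3), nrow = 6,
               dimnames = list(paste0("iso", 1:6), paste0("cell", 1:8)))
cell_type <- rep(c("A", "B"), each = 4)

pct <- summarize_percentiles(expr, cell_type, percentile_no = 10)
meta <- metatranscripts(pct, list(c("iso1", "iso2"),
                                  c("iso3", "iso4"),
                                  c("iso5", "iso6")))
dim(meta)  # 3 clusters x 20 features (2 cell types x 10 percentiles)
```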

By default, metaclustering is performed with traditional hierarchical clustering via hclust, which requires either a height cutoff (height_cutoff parameter) or a number of groups to obtain (cutree_no).
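A minimal sketch of this default path with base R's hclust and cutree, assuming that the distance between metatranscripts is 1 minus their Pearson correlation and that hclust's default complete linkage is used (both are assumptions, not necessarily acorde's internals). The toy matrix holds two pairs of nearly identical profiles, so cutting by height or by number of groups yields the same two merged groups.

```r
# Toy metatranscripts: 4 clusters x 6 percentile features (made-up values)
meta <- rbind(
  c1 = c(1, 2, 3, 4, 5, 6),
  c2 = c(1, 2, 3, 4, 5, 7),  # nearly the same pattern as c1
  c3 = c(6, 5, 4, 3, 2, 1),
  c4 = c(7, 5, 4, 3, 2, 1)   # nearly the same pattern as c3
)

# ASSUMPTION: distance = 1 - Pearson correlation between metatranscripts
d <- as.dist(1 - cor(t(meta)))
tree <- hclust(d)  # default complete linkage, assumed for illustration

by_height <- cutree(tree, h = 0.2)  # analogous to height_cutoff
by_number <- cutree(tree, k = 2)    # analogous to cutree_no
by_height
```

Each resulting group label identifies a set of clusters that would be merged into one.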

Available co-expression metrics (selected via the method argument) include:

percentile: percentile correlations computed using percentile_cor.
pearson: Pearson correlation computed using cor.
spearman: Spearman correlation computed using cor.
zi_kendall: zero-inflated Kendall correlation computed using the dismay function.
rho: rho proportionality metric computed using the dismay function.
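For the two metrics available in base R, the choice matters when profiles share a ranking but not a linear relationship. The hypothetical helper `coexpr_matrix()` below wraps cor() on metatranscripts stored as rows; the percentile, zi_kendall and rho metrics come from percentile_cor and dismay and are not reproduced here.

```r
# Hypothetical helper: pairwise similarity between metatranscripts (rows).
coexpr_matrix <- function(meta, method = c("pearson", "spearman")) {
  method <- match.arg(method)
  cor(t(meta), method = method)
}

# Two profiles with the same ranking but a non-linear relationship
meta <- rbind(c1 = 1:6, c2 = exp(1:6))

coexpr_matrix(meta, "pearson")["c1", "c2"]   # below 1
coexpr_matrix(meta, "spearman")["c1", "c2"]  # 1: identical ranks
```

A rank-based metric would therefore treat these two clusters as fully redundant, while Pearson correlation would not.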

Alternatively, users may perform metatranscript clustering dynamically with the dynamicTreeCut package by setting dynamic = TRUE. In this case, additional parameters need to be supplied to the cutreeHybrid function via the ... argument. Note that minClusterSize = 1 is set internally so that clusters can remain unmerged when their profiles are not redundant with those of any other cluster.
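The dynamic alternative can be sketched with dynamicTreeCut::cutreeHybrid on a toy tree. The 1 - correlation distance and the deepSplit value are assumptions chosen for illustration; minClusterSize = 1 mirrors the internal setting described above. The call runs only if dynamicTreeCut is installed.

```r
# Toy metatranscripts: two pairs of nearly identical profiles (made up)
meta <- rbind(
  c1 = c(1, 2, 3, 4, 5, 6),
  c2 = c(1, 2, 3, 4, 5, 7),
  c3 = c(6, 5, 4, 3, 2, 1),
  c4 = c(7, 5, 4, 3, 2, 1)
)

# ASSUMPTION: 1 - correlation distance, as a full matrix for cutreeHybrid
dist_mat <- 1 - cor(t(meta))
diag(dist_mat) <- 0
tree <- hclust(as.dist(dist_mat))

if (requireNamespace("dynamicTreeCut", quietly = TRUE)) {
  res <- dynamicTreeCut::cutreeHybrid(
    dendro = tree,
    distM = dist_mat,
    minClusterSize = 1,  # mirrors the value set internally by merge_clusters
    deepSplit = 2,       # example of an extra argument passed through `...`
    verbose = 0
  )
  merged <- res$labels   # one group label per original cluster
  print(merged)
}
```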

Value

A named list containing two elements:

merged_groups: a list detailing merge decisions, in which each element contains the identifiers of the clusters that were merged together.
clusters: a list of character vectors, containing the identifiers of isoforms included in each of the resulting clusters.
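To illustrate how the two returned elements relate, the snippet below pools isoforms according to a set of made-up merge decisions. It assumes, hypothetically, that each element of merged_groups holds indices into cluster_list; the identifiers actually returned by merge_clusters may be cluster names instead.

```r
# Original clusters of isoform IDs (made up)
cluster_list <- list(c("iso1", "iso2"), c("iso3"), c("iso4", "iso5"), c("iso6"))

# Hypothetical merge decisions: clusters 1 and 3 were merged, 2 and 4 kept
merged_groups <- list(c(1, 3), 2, 4)

# Pool the isoforms of every merged group into one resulting cluster
clusters <- lapply(merged_groups, function(ids) unlist(cluster_list[ids]))
str(clusters)
```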