clusterFeatures: Feature clustering

View source: R/clusterFeatures.R

clusterFeatures R Documentation

Feature clustering

Description

Clusters LC-MS features stored in a SummarizedExperiment according to their retention times and intensity correlations across samples.

Usage

clusterFeatures(
  x,
  i,
  rtime_var = "rtime",
  rt_cut = 10,
  cor_cut = 0.7,
  rt_grouping = c("hclust", "closest", "consecutive"),
  cor_grouping = c("louvain", "SimilarityMatrix", "connected", "none"),
  cor_use = c("everything", "all.obs", "complete.obs", "na.or.complete",
    "pairwise.complete.obs"),
  cor_method = c("pearson", "kendall", "spearman"),
  log2 = FALSE,
  hclust_linkage = "complete"
)

Arguments

x

A SummarizedExperiment object.

i

A string or integer value specifying which assay values to use.

rtime_var

A string specifying the name of the variable in rowData(x) that contains a numeric vector of retention times.

rt_cut

A numeric value specifying a cut-off for the retention-time based feature grouping.

cor_cut

A numeric value specifying a cut-off for the correlation-based feature grouping.

rt_grouping

A string specifying which method to use for the retention-time based feature grouping.

cor_grouping

A string specifying which method to use for the correlation-based feature grouping.

cor_use

A string specifying which method to use for computing correlations in the presence of missing values. Refer to ?cor for details.

cor_method

A string specifying which correlation coefficient is to be computed. See ?cor for details.

log2

A logical specifying whether feature intensities need to be log2-transformed before calculating a correlation matrix.

hclust_linkage

A string specifying the linkage method to be used when rt_grouping is "hclust".
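
As a rough illustration of what log2, cor_use, and cor_method control, the sketch below computes a feature-by-feature correlation matrix with stats::cor. It assumes features are stored in rows and samples in columns, and it uses a made-up intensity matrix m; it is not taken from the function's source.

```r
# Hypothetical intensity matrix: 3 features (rows) x 5 samples (columns).
m <- matrix(
  c(100, 200, 400, 800, 1600,
     50, 100, 200, 400,  800,
    900,  NA, 300, 200,  100),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("f", 1:3), paste0("s", 1:5))
)

# Optional log2 transformation (analogous to log2 = TRUE).
m_log <- log2(m)

# stats::cor() correlates columns, so transpose to correlate features.
cmat <- cor(t(m_log), use = "pairwise.complete.obs", method = "pearson")
print(round(cmat, 2))

# With use = "everything", a missing value propagates to NA correlations
# for every pair that involves the affected feature.
cmat_na <- cor(t(m_log), use = "everything", method = "pearson")
print(round(cmat_na, 2))
```

In this toy example the pairwise option still yields a correlation for f3, whereas "everything" returns NA for every entry involving f3. This is worth keeping in mind when the assay contains missing values.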

Details

For soft ionization methods (e.g., LC/ESI-MS) commonly used in metabolomics, one or more ions can be generated from an individual compound upon ionization. This redundancy in the feature data needs to be addressed because we are typically interested in compounds rather than individual ion species. This function attempts to identify groups of features originating from the same compound with the following steps:

  1. Features are grouped by their retention times to identify co-eluting compounds.

  2. For each retention time-based group, features are further clustered by patterns of the intensity correlations across samples to identify a subset of features from the same compound.

The retention time-based grouping is performed using either hierarchical clustering via hclust or the methods available in the MsFeatures package via MsFeatures::groupClosest and MsFeatures::groupConsecutive. For rt_grouping = "hclust", complete-linkage clustering is conducted by default using the Manhattan distance (i.e., the absolute difference in retention times), where the distance between two clusters is defined as the difference in retention times between the farthest pair of elements in the two clusters. Group memberships are assigned by cutting the resulting tree at a specified height on this distance scale. Other linkage methods can be specified with hclust_linkage; please refer to ?hclust for details. For "closest" and "consecutive", please refer to ?MsFeatures::groupClosest and ?MsFeatures::groupConsecutive for the details of the algorithms.
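
The "hclust" option described above can be sketched with base R alone. The retention times below are invented for illustration, and using rt_cut directly as the cut height is an assumption about how the cut-off is applied, not a statement about the package internals.

```r
# Hypothetical retention times (in seconds) for six features.
rt <- c(f1 = 100.2, f2 = 101.5, f3 = 104.9, f4 = 130.0, f5 = 131.2, f6 = 200.4)

# Pairwise absolute differences in retention time (Manhattan distance in 1-D).
d <- dist(rt, method = "manhattan")

# Complete linkage: the distance between two clusters is the largest
# pairwise retention-time difference between their members.
hc <- hclust(d, method = "complete")

# Cut the tree at the retention-time cut-off (assumed to act as cut height;
# 10 mirrors the default shown for rt_cut).
rt_cut <- 10
rtime_group <- cutree(hc, h = rt_cut)
print(rtime_group)
```

With complete linkage, every pair of features within a resulting group differs by no more than the cut height. This is one reason it is a natural choice for defining co-eluting groups.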

For the correlation-based grouping, cor_grouping = "connected" creates an undirected graph using feature correlations as an adjacency matrix (i.e., correlations serve as edge weights). Edges whose weights are below the cut-off specified by cor_cut are removed from the graph, separating features into several disconnected subgroups. Features in the same subgroup are assigned to the same feature cluster. For "louvain", the function further applies the Louvain algorithm to the graph via igraph::cluster_louvain in order to identify densely connected features. For "SimilarityMatrix", MsFeatures::groupSimilarityMatrix is used for feature grouping. Please refer to ?MsFeatures::groupSimilarityMatrix for the details of the algorithm.
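
The "connected" idea can be illustrated with a small base-R sketch. The correlation matrix and the helper connected_groups() are hypothetical and written only for this illustration; the function itself may build and traverse the graph differently (for example via igraph).

```r
# Hypothetical correlation matrix for five co-eluting features.
feat <- paste0("f", 1:5)
cmat <- matrix(
  c(1.0, 0.9, 0.8, 0.1, 0.2,
    0.9, 1.0, 0.6, 0.0, 0.1,
    0.8, 0.6, 1.0, 0.2, 0.3,
    0.1, 0.0, 0.2, 1.0, 0.95,
    0.2, 0.1, 0.3, 0.95, 1.0),
  nrow = 5, byrow = TRUE, dimnames = list(feat, feat)
)

cor_cut <- 0.7

# Remove edges below the cut-off and ignore self-correlations.
adj <- cmat >= cor_cut
diag(adj) <- FALSE

# Hypothetical helper: label connected components by breadth-first search.
connected_groups <- function(adj) {
  n <- nrow(adj)
  group <- integer(n)
  current <- 0L
  for (start in seq_len(n)) {
    if (group[start] != 0L) next
    current <- current + 1L
    group[start] <- current
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      # Unvisited neighbours of the current node join the same component.
      nbrs <- which(adj[node, ] & group == 0L)
      group[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  stats::setNames(group, rownames(adj))
}

feature_group <- connected_groups(adj)
print(feature_group)
```

Here f2 and f3 end up in the same group even though their own correlation (0.6) is below the cut-off, because both are linked to f1. Chaining of this kind is what a community detection step such as the Louvain algorithm is meant to refine.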

Value

A SummarizedExperiment object with the grouping results added to columns "rtime_group" (initial grouping on retention times) and "feature_group" in its rowData.

References

Johannes Rainer (2022). MsFeatures: Functionality for Mass Spectrometry Features. R package version 1.3.0. https://github.com/RforMassSpectrometry/MsFeatures

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre: Fast unfolding of communities in large networks. J. Stat. Mech. (2008) P10008

Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. https://igraph.org

See Also

See hclust, cutree, MsFeatures::groupClosest, MsFeatures::groupConsecutive, MsFeatures::groupSimilarityMatrix, and igraph::cluster_louvain for the underlying functions that do the work.

See plotRTgroup to visualize the grouping result.

Examples


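library(qmtools)  # provides clusterFeatures() and the faahko_se example data
data(faahko_se)

## Cluster features using the "knn_vsn" assay and the retention times in "rtmed"
se <- clusterFeatures(faahko_se, i = "knn_vsn", rtime_var = "rtmed")
## Inspect the retention time-based and final feature groups
rowData(se)[, c("rtmed", "rtime_group", "feature_group")]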


HimesGroup/qmtools documentation built on April 16, 2023, 8 p.m.