View source: R/clusterFeatures.R
clusterFeatures | R Documentation |
Clusters LC-MS features stored in a SummarizedExperiment according to their retention times and intensity correlations across samples.
clusterFeatures(
x,
i,
rtime_var = "rtime",
rt_cut = 10,
cor_cut = 0.7,
rt_grouping = c("hclust", "closest", "consecutive"),
cor_grouping = c("louvain", "SimilarityMatrix", "connected", "none"),
cor_use = c("everything", "all.obs", "complete.obs", "na.or.complete",
"pairwise.complete.obs"),
cor_method = c("pearson", "kendall", "spearman"),
log2 = FALSE,
hclust_linkage = "complete"
)
x |
A SummarizedExperiment object. |
i |
A string or integer value specifying which assay values to use. |
rtime_var |
A string specifying the name of the variable containing a
numeric vector of retention times in rowData(x). |
rt_cut |
A numeric value specifying a cut-off for the retention-time based feature grouping. |
cor_cut |
A numeric value specifying a cut-off for the correlation-based feature grouping. |
rt_grouping |
A string specifying which method to use for the retention-time based feature grouping. |
cor_grouping |
A string specifying which method to use for the correlation-based feature grouping. |
cor_use |
A string specifying which method to use for computing correlations in
the presence of missing values. Refer to ?cor for details. |
cor_method |
A string specifying which correlation coefficient is to be
computed. See ?cor for details. |
log2 |
A logical specifying whether feature intensities should be log2-transformed before the correlation matrix is calculated. |
hclust_linkage |
A string specifying the linkage method to be used when
rt_grouping is "hclust". |
For soft ionization methods (e.g., LC/ESI-MS) commonly used in metabolomics, one or more ions can be generated from an individual compound upon ionization. This redundancy in the feature data needs to be addressed, since we are typically interested in compounds rather than in different ion species. This function attempts to identify groups of features that originate from the same compound with the following steps:
Features are grouped by their retention times to identify co-eluting compounds.
For each retention time-based group, features are further clustered by patterns of the intensity correlations across samples to identify a subset of features from the same compound.
The retention time-based grouping is performed using either hierarchical
clustering via hclust or the methods available in the MsFeatures package via
MsFeatures::groupClosest and MsFeatures::groupConsecutive. For rt_grouping =
"hclust", complete-linkage clustering is conducted by default using the
Manhattan distance (i.e., the difference in retention times), where the
distance between two clusters is defined as the difference in retention times
between the farthest pair of elements in the two clusters. Group memberships
are assigned by specifying the cut height for the distance metric. Other
linkage methods can be specified with hclust_linkage. Please refer to ?hclust
for details. For "closest" and "consecutive", please refer to
?MsFeatures::groupClosest and ?MsFeatures::groupConsecutive for the details
of the algorithms.
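The "hclust" option described above can be illustrated with base R alone. The following is a minimal sketch, not the function's actual implementation: the retention times are made up, and cutting the tree at a height of 10 is assumed to play the role of rt_cut = 10.

```r
# Hypothetical retention times (in seconds) for six features
rt <- c(100, 103, 108, 150, 155, 300)
names(rt) <- paste0("F", seq_along(rt))

# For a single variable, the Manhattan distance is the absolute
# difference in retention times between two features
d <- dist(rt, method = "manhattan")

# Complete linkage: the distance between two clusters is the largest
# pairwise retention time difference between their members
hc <- hclust(d, method = "complete")

# Cut the dendrogram at height 10 (assumed analogue of rt_cut = 10)
rtime_group <- cutree(hc, h = 10)
rtime_group
```

With complete linkage, cutting at height 10 means that no two features in the same group differ by more than 10 seconds, so F1 to F3 share a group, F4 and F5 share another, and F6 stays on its own.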
For the correlation-based grouping, cor_grouping = "connected" creates an
undirected graph using feature correlations as an adjacency matrix (i.e.,
correlations serve as edge weights). The edges whose weights are below the
cut-off specified by cor_cut are removed from the graph, separating features
into several disconnected subgroups. Features in the same subgroup are
assigned to the same feature cluster. For "louvain", the function further
applies the Louvain algorithm to the graph via igraph::cluster_louvain in
order to identify densely connected features. For "SimilarityMatrix",
MsFeatures::groupSimilarityMatrix is used for feature grouping. Please refer
to ?MsFeatures::groupSimilarityMatrix for the details of the algorithm.
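The graph-based grouping can be sketched with igraph as follows. This is an illustration under stated assumptions rather than the function's internal code: the intensity matrix m is invented, the threshold 0.7 stands in for cor_cut, and the igraph package is assumed to be installed.

```r
library(igraph)

# Hypothetical intensities: five co-eluting features (rows) across
# eight samples (columns); F1-F3 and F4-F5 follow two different patterns
p1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
p2 <- c(5, 1, 7, 2, 8, 3, 6, 4)
m <- rbind(
  F1 = p1,
  F2 = 2 * p1 + 1,
  F3 = p1 + rep(c(0.1, -0.1), 4),
  F4 = p2,
  F5 = 3 * p2
)

# Feature-by-feature Pearson correlations across samples
cors <- cor(t(m), method = "pearson")

# Use correlations as edge weights and drop edges below the cut-off
adj <- cors
adj[adj < 0.7] <- 0
diag(adj) <- 0
g <- graph_from_adjacency_matrix(adj, mode = "upper", weighted = TRUE)

# "connected"-style grouping: one group per disconnected subgraph
connected_group <- components(g)$membership

# "louvain"-style grouping: community detection on the pruned graph
louvain_group <- membership(cluster_louvain(g))

connected_group
```

In this toy example the pruned graph splits into two disconnected subgraphs, one holding F1 to F3 and one holding F4 and F5. The Louvain step matters mainly when a connected subgraph contains several densely connected subsets joined by only a few edges.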
A SummarizedExperiment object with the grouping results added to the columns
"rtime_group" (initial grouping on retention times) and "feature_group" in
its rowData.
Johannes Rainer (2022). MsFeatures: Functionality for Mass Spectrometry Features. R package version 1.3.0. https://github.com/RforMassSpectrometry/MsFeatures
Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre: Fast unfolding of communities in large networks. J. Stat. Mech. (2008) P10008
Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. https://igraph.org
See hclust, cutree, MsFeatures::groupClosest, MsFeatures::groupConsecutive, MsFeatures::groupSimilarityMatrix, and igraph::cluster_louvain for the underlying functions that do the work.
See plotRTgroup to visualize the grouping result.
data(faahko_se)
se <- clusterFeatures(faahko_se, i = "knn_vsn", rtime_var = "rtmed")
rowData(se)[, c("rtmed", "rtime_group", "feature_group")]