groupFeatures-eic-similarity | R Documentation |
Features from the same originating compound are expected to share their
elution pattern (i.e. chromatographic peak shape) with it.
This method therefore groups features based on the similarity of their
extracted ion chromatograms (EICs). The similarity calculation is performed
separately for each sample, and the similarity scores are then aggregated
across samples to generate the final similarity matrix on which the
grouping (considering parameter threshold) is performed.
The compareChromatograms() function is used for the similarity calculation,
which by default calculates Pearson's correlation coefficient. The
settings for compareChromatograms() can be specified with parameters
ALIGNFUN, ALIGNFUNARGS, FUN and FUNARGS. ALIGNFUN defaults to alignRt()
and is the function used to align the chromatograms before comparison.
ALIGNFUNARGS specifies additional arguments for the ALIGNFUN function. It
defaults to ALIGNFUNARGS = list(tolerance = 0, method = "closest"), which
ensures that data points from the same spectrum (scan, i.e. with the same
retention time) are compared between the EICs from the same sample.
Parameter FUN defines the function used to calculate the similarity score
and defaults to FUN = cor. FUNARGS passes additional arguments to this
function (defaults to FUNARGS = list(use = "pairwise.complete.obs")).
See also compareChromatograms() for more information.
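The default settings described above can be illustrated with a minimal base R sketch. It does not call compareChromatograms() or alignRt(). The two data frames eic_a and eic_b are hypothetical, and match() is used only as an approximation of an alignment with tolerance = 0, in which only data points with identical retention times are paired.

```r
## Two hypothetical EICs from the same sample (values are made up).
eic_a <- data.frame(rtime = c(1, 2, 3, 4, 5),
                    intensity = c(5, 20, 80, 25, 6))
eic_b <- data.frame(rtime = c(2, 3, 4, 5, 6),
                    intensity = c(11, 39, 12, NA, 2))

## Approximate an exact-retention-time alignment: take the intensity of
## eic_b at each retention time of eic_a; unmatched time points become NA.
idx <- match(eic_a$rtime, eic_b$rtime)
b_aligned <- eic_b$intensity[idx]

## Pearson correlation using only the pairs in which both values are present,
## mirroring FUN = cor with FUNARGS = list(use = "pairwise.complete.obs").
score <- cor(eic_a$intensity, b_aligned, use = "pairwise.complete.obs")
score
```

Only the three time points with intensities in both EICs contribute to the score, so missing values do not propagate into the result.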
The grouping of features based on the EIC similarity matrix is performed
with the function specified with parameter groupFun, which defaults to
groupFun = groupSimilarityMatrix. This function groups all rows (features)
in the similarity matrix with a similarity score >= threshold into the
same cluster. This creates clusters of features in which all features
have a similarity score >= threshold with any other feature in that
cluster. See groupSimilarityMatrix() for details. Additional parameters to
that function can be passed with the ... argument.
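The idea of threshold-based grouping can be sketched in base R as follows. The helper group_by_threshold() is hypothetical and is not the algorithm implemented in groupSimilarityMatrix(). It is a simple greedy variant, assumed here only to show how a cluster can be built such that every member has a similarity >= threshold to every other member.

```r
## Hypothetical helper (not part of MsFeatures or xcms): an element joins the
## first existing group in which its similarity to every current member is
## >= threshold; otherwise it starts a new group.
group_by_threshold <- function(sim, threshold = 0.9) {
    n <- nrow(sim)
    grp <- integer(n)
    n_grp <- 0L
    for (i in seq_len(n)) {
        assigned <- FALSE
        for (g in seq_len(n_grp)) {
            members <- which(grp == g)
            if (isTRUE(all(sim[i, members] >= threshold))) {
                grp[i] <- g
                assigned <- TRUE
                break
            }
        }
        if (!assigned) {
            n_grp <- n_grp + 1L
            grp[i] <- n_grp
        }
    }
    grp
}

## Made-up symmetric similarity matrix for three features.
sim <- matrix(c(1.00, 0.95, 0.20,
                0.95, 1.00, 0.30,
                0.20, 0.30, 1.00), nrow = 3, byrow = TRUE)
groups <- group_by_threshold(sim, threshold = 0.9)
groups
```

With these values the first two features end up in one group and the third feature forms its own group. A greedy assignment like this depends on the order of the rows, which is one reason to rely on groupSimilarityMatrix() in practice.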
This feature grouping should be called after an initial feature
grouping by retention time (see SimilarRtimeParam()). The feature groups
defined in column "feature_group" of featureDefinitions(object) (for
features matching msLevel) will be used and refined by this method.
Features with a value of NA in featureDefinitions(object)$feature_group
will be skipped, i.e. not considered for feature grouping.
EicSimilarityParam(
threshold = 0.9,
n = 1,
onlyPeak = TRUE,
value = c("maxo", "into"),
groupFun = groupSimilarityMatrix,
ALIGNFUN = alignRt,
ALIGNFUNARGS = list(tolerance = 0, method = "closest"),
FUN = cor,
FUNARGS = list(use = "pairwise.complete.obs"),
...
)
## S4 method for signature 'XcmsResult,EicSimilarityParam'
groupFeatures(object, param, msLevel = 1L)
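threshold | similarity threshold used for the grouping: features with a similarity score >= threshold are grouped together |
n | number of top samples (those with the highest signal for the feature group) on which the correlation is performed |
onlyPeak | logical; if TRUE (the default), signal outside identified chromatographic peaks is excluded from the correlation |
value | one of "maxo" or "into" |
groupFun | function used to group features based on the EIC similarity matrix; defaults to groupSimilarityMatrix |
ALIGNFUN | function used to align the chromatograms before comparison; defaults to alignRt |
ALIGNFUNARGS | named list with additional arguments for ALIGNFUN |
FUN | function used to calculate the similarity score; defaults to cor |
FUNARGS | named list with additional arguments for FUN |
... | for EicSimilarityParam: additional parameters passed to groupFun |
object | input object (class XcmsResult according to the method signature) containing the feature definitions |
param | EicSimilarityParam object with the settings for the method |
msLevel | MS level of the features to be grouped; defaults to 1L |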
input object with feature groups added (i.e. in column "feature_group"
of its featureDefinitions data frame).
At present the featureChromatograms() function is used to extract the
EICs for each feature. This function, however, uses a single m/z and rt
range for each feature, so the EICs do not exactly represent the identified
chromatographic peaks of each sample (i.e. their specific m/z and
retention time ranges).
While the method can be run on the full data set without prior feature
grouping, this is not recommended for the following reasons: I) the
selection of the top n samples with the highest signal for the
feature group will be biased by very abundant compounds as this is
performed on the full data set (i.e. the samples with the highest overall
intensities are used for correlation of all features) and II) it is
computationally much more expensive because a pairwise correlation between
all features has to be performed.
It is also suggested to perform the correlation on a subset of samples
per feature with the highest intensities of the peaks (for that feature),
although it would also be possible to run the correlation on all samples by
setting n equal to the total number of samples in the data set. Ideally,
however, EIC correlation should be performed on samples in which the
original compound is highly abundant, to avoid correlation of missing values
or noisy peak shapes as much as possible.
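The selection of the top n samples can be pictured with a small base R sketch. The matrix vals and the use of summed intensities per sample are assumptions made for illustration. The exact selection criterion used internally is not described here.

```r
## Hypothetical feature x sample intensity matrix for one feature group
## (made-up values; NA marks a feature without a peak in that sample).
vals <- matrix(c(10, 200, 50,
                 NA, 180, 40),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("FT1", "FT2"), c("s1", "s2", "s3")))

## Assumption: rank samples by the summed signal of the group's features
## and keep the n samples with the highest sum.
n <- 1
sample_signal <- colSums(vals, na.rm = TRUE)
top_idx <- order(sample_signal, decreasing = TRUE)[seq_len(n)]
top_samples <- colnames(vals)[top_idx]
top_samples
```

Restricting the correlation to such samples keeps EICs with missing or weak signal (like s1 in this made-up matrix) out of the similarity calculation.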
By default, signal outside identified chromatographic peaks is also excluded from the correlation.
Johannes Rainer
feature-grouping for a general overview.
Other feature grouping methods:
groupFeatures-abundance-correlation, groupFeatures-similar-rtime
library(xcms)
library(MsFeatures)
library(MsExperiment)
## Load a test data set with detected peaks
faahko_sub <- loadXcmsData("faahko_sub2")
## Disable parallel processing for this example
register(SerialParam())
## Group chromatographic peaks across samples
xodg <- groupChromPeaks(faahko_sub, param = PeakDensityParam(sampleGroups = rep(1, 3)))
## Performing a feature grouping based on EIC similarities on a single
## sample
xodg_grp <- groupFeatures(xodg, param = EicSimilarityParam(n = 1))
table(featureDefinitions(xodg_grp)$feature_group)
## Usually it is better to perform this correlation on pre-grouped features
## e.g. based on similar retention time.
xodg_grp <- groupFeatures(xodg, param = SimilarRtimeParam(diffRt = 4))
xodg_grp <- groupFeatures(xodg_grp, param = EicSimilarityParam(n = 1))
table(featureDefinitions(xodg_grp)$feature_group)