matchSpectra,Spectra,CompDbSource,Param-method | R Documentation |
matchSpectra compares experimental (query) MS2 spectra against reference (target) MS2 spectra and reports matches with a similarity that passes a specified threshold. The function calculates the similarity between each query spectrum and each target spectrum. Parameters query and target define the query and target spectra, respectively, while parameter param defines and configures the similarity calculation and the matching conditions. Parameter query takes a Spectra::Spectra object, while target can be a Spectra::Spectra object, a CompoundDb::CompDb (reference library) object defined in the CompoundDb package, or a CompAnnotationSource (e.g. a CompDbSource()) with the reference or connection information of a supported annotation resource.
Some notes on performance and information on parallel processing are provided in the vignette.
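The matching scheme described above can be sketched in base R. Every query spectrum is compared with every target spectrum, and a threshold function is then applied to each query's scores. This is a simplified illustration, not the package's implementation. The similarity matrix is hard-coded here; in matchSpectra() the scores come from Spectra::compareSpectra(). The hypothetical helper threshold_matches() collects the passing pairs in a data frame with query_idx, target_idx and score columns, loosely mirroring the layout of the matching results.

```r
## Made-up similarity scores: rows are query spectra, columns target spectra.
## In matchSpectra() such scores would be computed by Spectra::compareSpectra().
sim <- rbind(c(0.10, 0.20, 0.05),
             c(0.95, 0.40, 0.72),
             c(0.30, 0.80, 0.60))

## Hypothetical helper: apply THRESHFUN to the scores of each query spectrum
## and return all passing (query, target) pairs in long format.
threshold_matches <- function(sim, THRESHFUN) {
    idx <- lapply(seq_len(nrow(sim)), function(i) THRESHFUN(sim[i, ]))
    qi <- rep(seq_len(nrow(sim)), lengths(idx))
    ti <- as.integer(unlist(idx))
    data.frame(query_idx = qi, target_idx = ti, score = sim[cbind(qi, ti)])
}

## Threshold in the style of the default: keep all scores >= 0.7.
mtch <- threshold_matches(sim, function(x) which(x >= 0.7))
mtch
```

With this threshold the first query spectrum has no match, the second matches targets 1 and 3, and the third matches target 2. Passing function(x) which.max(x) instead would keep exactly one target per query.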
Currently supported parameter objects defining the matching are:
CompareSpectraParam: the generic parameter object that configures all settings of the Spectra::compareSpectra() call used to perform the similarity calculation. This includes MAPFUN and FUN, which define the peak-mapping and similarity calculation functions, and ppm and tolerance, which define the acceptable difference between the m/z values of compared peaks. Additional parameters for the compareSpectra call can be passed via the ... argument. See the help of Spectra::Spectra() for more information on these parameters. Parameters requirePrecursor (default TRUE) and requirePrecursorPeak (default FALSE) allow pre-filtering the target spectra for each individual query spectrum, based on the query's precursor m/z, before the actual similarity calculation.
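A rough base R sketch of a requirePrecursor-style pre-filter is shown below. It assumes that a target spectrum is kept when its precursor m/z differs from the query's precursor m/z by at most tolerance plus ppm, with the ppm part computed relative to the query value. That formula is an assumption of this sketch, and precursor_candidates() is a hypothetical helper, not a function of the package.

```r
## Hypothetical helper (assumed formula): indices of target spectra whose
## precursor m/z lies within tolerance + ppm of the query precursor m/z.
## Missing (NA) precursor m/z values never pass the filter.
precursor_candidates <- function(query_mz, target_mz, tolerance = 0, ppm = 5) {
    max_diff <- tolerance + query_mz * ppm / 1e6
    which(abs(target_mz - query_mz) <= max_diff)
}

target_mz <- c(195.0876, 195.0890, 250.1000, NA)

## With ppm = 10 the allowed difference for m/z 195.0877 is about 0.00195,
## so only the first two targets are kept.
precursor_candidates(195.0877, target_mz, ppm = 10)
```

Only the candidate targets returned by such a filter need to be scored, which is why these pre-filters can speed up matching.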
Target spectra can also be pre-filtered based on retention time if parameter toleranceRt is set to a value other than the default toleranceRt = Inf. In that case, only target spectra with a retention time within the query's retention time +/- (toleranceRt + percentRt % of the query's retention time) are considered. Note that while ppm and tolerance accept only a single value, toleranceRt and percentRt can also have a length equal to the number of query spectra, allowing different retention time boundaries to be defined for each query spectrum.
While these pre-filters can considerably improve performance, no matches will be found between query and target spectra with missing values in the filtered variable (precursor m/z or retention time). For target spectra without retention times (such as spectra from a public reference database like MassBank), the default toleranceRt = Inf should therefore be used.
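The retention time window described above can be written out in base R. The window is the query rt +/- (toleranceRt + percentRt % of the query rt). The sketch uses a hypothetical helper, rt_candidates(). Following the text, it assumes that toleranceRt = Inf disables the filter entirely, so targets without retention times are kept only in that case. The last call shows per-query tolerances, one value per query spectrum.

```r
## Hypothetical helper: indices of target spectra inside the retention time
## window of one query spectrum. Assumption: an infinite toleranceRt disables
## the filter, so targets with missing retention times are then kept.
rt_candidates <- function(query_rt, target_rt, toleranceRt = Inf,
                          percentRt = 0) {
    if (is.infinite(toleranceRt))
        return(seq_along(target_rt))
    max_diff <- toleranceRt + query_rt * percentRt / 100
    ## NA retention times give NA comparisons, which which() drops.
    which(abs(target_rt - query_rt) <= max_diff)
}

target_rt <- c(358, 361, 365, NA)

## Fixed window of +/- 2 seconds around a query at 361 s: target 2 only.
rt_candidates(361, target_rt, toleranceRt = 2)
## 1 s plus 1% of 361 s, i.e. +/- 4.61 s: targets 1 to 3 (NA is dropped).
rt_candidates(361, target_rt, toleranceRt = 1, percentRt = 1)
## Default toleranceRt = Inf: no retention time filtering.
rt_candidates(361, target_rt)

## One toleranceRt value per query spectrum.
per_query <- mapply(rt_candidates, c(361, 359), toleranceRt = c(2, 1),
                    MoreArgs = list(target_rt = target_rt), SIMPLIFY = FALSE)
per_query
```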
Finally, parameter THRESHFUN defines a function that is applied to the similarity scores to select which matches are reported. See below for more details.
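A THRESHFUN is a function of the similarity scores of a query spectrum that returns the indices of the scores to keep, as the default function(x) which(x >= 0.7) does. The base R snippet below compares a few such functions on a made-up score vector. The default and the which.max() variant appear in this help page. The top-two variant is only an illustration and not a function provided by the package.

```r
## Made-up similarity scores of one query spectrum against five targets.
scores <- c(0.65, 0.92, 0.71, 0.30, 0.92)

## Default of CompareSpectraParam(): all scores >= 0.7.
thresh_default <- function(x) which(x >= 0.7)
## As in the examples: the (first) best matching target only.
thresh_best <- function(x) which.max(x)
## Illustrative variant: at most the two highest scores above 0.5.
thresh_top2 <- function(x) {
    ok <- which(x > 0.5)
    ok[order(x[ok], decreasing = TRUE)][seq_len(min(2L, length(ok)))]
}

thresh_default(scores)  # 2 3 5
thresh_best(scores)     # 2 (first of the two tied maxima)
thresh_top2(scores)     # indices 2 and 5
```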
MatchForwardReverseParam: performs spectra matching as with CompareSpectraParam but, similar to MS-DIAL, also reports the reverse similarity score and the presence ratio. In detail, query spectra are matched to target spectra by considering all peaks from the query and all peaks from the target (reference) spectrum (i.e. forward matching using an outer join-based peak matching strategy). For matching spectra, the reverse similarity is also calculated, considering only peaks present in the target (reference) spectrum (i.e. using a right join-based peak matching). It is reported as spectra variable "reverse_score".
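The difference between the forward (outer join) and the reverse (right join) score can be illustrated with a base R sketch. Peaks are paired by m/z with the hypothetical helper join_peaks_simple(). They are then scored with a simplified normalized dot product on square-root-scaled intensities (ndp_sketch(), also hypothetical). Both are stand-ins written for this illustration. The package itself uses the MAPFUN and FUN set in the parameter object, by default joinPeaks and MsCoreUtils::ndotproduct.

```r
## Made-up peak matrices (m/z, intensity). The query has an extra peak at
## m/z 250 that is absent from the reference (target) spectrum.
qry <- cbind(mz = c(100, 150, 200, 250), intensity = c(10, 50, 100, 40))
tgt <- cbind(mz = c(100.001, 150.002, 200.000), intensity = c(10, 50, 100))

## Hypothetical, simplified peak matching: each target peak is paired with the
## closest query peak within `tol` (duplicate assignments are not handled).
## type = "right" keeps only the target peaks; type = "outer" also appends
## query peaks without a partner, with a target intensity of 0.
join_peaks_simple <- function(query, target, tol = 0.01,
                              type = c("outer", "right")) {
    type <- match.arg(type)
    q_idx <- vapply(target[, "mz"], function(mz) {
        d <- abs(query[, "mz"] - mz)
        if (min(d) <= tol) which.min(d) else NA_integer_
    }, integer(1))
    x <- ifelse(is.na(q_idx), 0, query[q_idx, "intensity"])
    y <- target[, "intensity"]
    if (type == "outer") {
        extra <- setdiff(seq_len(nrow(query)), q_idx)
        x <- c(x, query[extra, "intensity"])
        y <- c(y, rep(0, length(extra)))
    }
    cbind(x = x, y = y)
}

## Simplified normalized dot product on intensities raised to the power n.
ndp_sketch <- function(m, n = 0.5) {
    wx <- m[, "x"]^n
    wy <- m[, "y"]^n
    sum(wx * wy)^2 / (sum(wx^2) * sum(wy^2))
}

fwd_score <- ndp_sketch(join_peaks_simple(qry, tgt, type = "outer"))  # 0.8
rev_score <- ndp_sketch(join_peaks_simple(qry, tgt, type = "right"))  # 1
```

The extra query peak at m/z 250 lowers the forward score but is ignored by the reverse score. This is why a reverse score can be informative when query spectra contain additional peaks, for example from co-eluting compounds.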
In addition, the ratio between the number of matched peaks and the total number of peaks in the target (reference) spectrum is reported as the presence ratio (spectra variable "presence_ratio"), and the total number of matched peaks as "matched_peaks_count". See the examples below for details. Parameter THRESHFUN_REVERSE allows defining an additional threshold function to filter matches, applied to the reverse similarity scores. If THRESHFUN_REVERSE is defined, only matches whose similarity scores fulfill both THRESHFUN and THRESHFUN_REVERSE are returned. With the default THRESHFUN_REVERSE = NULL, all matches passing THRESHFUN are reported.
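The presence ratio and the combined threshold logic can be sketched in base R. The peak-level match flags and the score vectors below are made-up inputs, and combined_matches() is a hypothetical helper. The sketch only assumes what the text states: a match is kept when its forward score passes THRESHFUN and, if THRESHFUN_REVERSE is not NULL, its reverse score also passes THRESHFUN_REVERSE.

```r
## Made-up example: a target (reference) spectrum with 5 peaks; TRUE marks
## peaks matched to a peak in the query spectrum.
target_peak_matched <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

matched_peaks_count <- sum(target_peak_matched)                       # 3
presence_ratio <- matched_peaks_count / length(target_peak_matched)   # 0.6

## Hypothetical helper: indices of targets whose forward score passes
## THRESHFUN and, if given, whose reverse score passes THRESHFUN_REVERSE.
combined_matches <- function(score, reverse_score, THRESHFUN,
                             THRESHFUN_REVERSE = NULL) {
    keep <- THRESHFUN(score)
    if (!is.null(THRESHFUN_REVERSE))
        keep <- intersect(keep, THRESHFUN_REVERSE(reverse_score))
    keep
}

## Made-up forward and reverse scores of one query against four targets.
score <- c(0.72, 0.55, 0.90, 0.81)
reverse_score <- c(0.95, 0.92, 0.60, 0.88)
th <- function(x) which(x >= 0.7)

combined_matches(score, reverse_score, th)       # 1 3 4
combined_matches(score, reverse_score, th,
                 THRESHFUN_REVERSE = function(x) which(x >= 0.9))  # 1
```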
## S4 method for signature 'Spectra,CompDbSource,Param'
matchSpectra(
query,
target,
param,
BPPARAM = BiocParallel::SerialParam(),
addOriginalQueryIndex = TRUE
)
CompareSpectraParam(
MAPFUN = joinPeaks,
tolerance = 0,
ppm = 5,
FUN = MsCoreUtils::ndotproduct,
requirePrecursor = TRUE,
requirePrecursorPeak = FALSE,
THRESHFUN = function(x) which(x >= 0.7),
toleranceRt = Inf,
percentRt = 0,
...
)
MatchForwardReverseParam(
MAPFUN = joinPeaks,
tolerance = 0,
ppm = 5,
FUN = MsCoreUtils::ndotproduct,
requirePrecursor = TRUE,
requirePrecursorPeak = FALSE,
THRESHFUN = function(x) which(x >= 0.7),
THRESHFUN_REVERSE = NULL,
toleranceRt = Inf,
percentRt = 0,
...
)
## S4 method for signature 'Spectra,Spectra,CompareSpectraParam'
matchSpectra(
query,
target,
param,
rtColname = c("rtime", "rtime"),
BPPARAM = BiocParallel::SerialParam(),
addOriginalQueryIndex = TRUE
)
## S4 method for signature 'Spectra,CompDb,Param'
matchSpectra(
query,
target,
param,
rtColname = c("rtime", "rtime"),
BPPARAM = BiocParallel::SerialParam(),
addOriginalQueryIndex = TRUE
)
## S4 method for signature 'Spectra,Spectra,MatchForwardReverseParam'
matchSpectra(
query,
target,
param,
rtColname = c("rtime", "rtime"),
BPPARAM = BiocParallel::SerialParam(),
addOriginalQueryIndex = TRUE
)
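Argument | Description |
--- | --- |
query | for matchSpectra: Spectra::Spectra object with the query spectra. |
target | for matchSpectra: the target (reference) spectra; a Spectra::Spectra object, a CompoundDb::CompDb object or a CompAnnotationSource (e.g. CompDbSource()). |
param | for matchSpectra: parameter object (e.g. CompareSpectraParam or MatchForwardReverseParam) defining the similarity calculation and the matching conditions. |
BPPARAM | for matchSpectra: parallel processing setup (see BiocParallel); defaults to BiocParallel::SerialParam(), i.e. serial processing. |
addOriginalQueryIndex | for matchSpectra: logical(1) whether the index of each query spectrum in the originally provided query object should be added as an additional spectra variable (default TRUE). |
MAPFUN | function used to map (join) the peaks of the compared spectra (default joinPeaks). |
tolerance | absolute acceptable difference between m/z values of compared peaks (default 0). |
ppm | m/z-relative acceptable difference between m/z values of compared peaks, in parts per million (default 5). |
FUN | function calculating the similarity between spectra (default MsCoreUtils::ndotproduct). |
requirePrecursor | logical(1) whether only target spectra with a precursor m/z matching the precursor m/z of the query spectrum (considering ppm and tolerance) are compared (default TRUE). |
requirePrecursorPeak | logical(1) whether only target spectra containing a peak that matches the precursor m/z of the query spectrum are compared (default FALSE). |
THRESHFUN | function applied to the similarity scores to define which matches are reported (default function(x) which(x >= 0.7)). |
toleranceRt | absolute acceptable difference in retention time between query and target spectra; of length 1 or equal to the number of query spectra (default Inf, i.e. no retention time filtering). |
percentRt | acceptable difference in retention time, in percent of the query's retention time, added to toleranceRt; of length 1 or equal to the number of query spectra (default 0). |
... | for CompareSpectraParam and MatchForwardReverseParam: additional parameters passed to compareSpectra. |
THRESHFUN_REVERSE | for MatchForwardReverseParam: optional additional threshold function applied to the reverse similarity scores (default NULL). |
rtColname | character(2) with the names of the spectra variables containing the retention times in query and target, respectively (default c("rtime", "rtime")). |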
matchSpectra returns a MatchedSpectra() object with the matching results. If target is a CompAnnotationSource, only the matching target spectra are reported.
Constructor functions return an instance of the class.
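When target is a CompAnnotationSource, only the matching target spectra are reported. One way to picture this is to subset the target spectra to the matched ones and renumber the target indices, as in the base R sketch below. The data frame layout and the helper name keep_matched_targets() are assumptions made for this illustration, not the package's internals.

```r
## Made-up matches against a large reference library.
lib_matches <- data.frame(query_idx = c(1L, 2L, 2L, 4L),
                          target_idx = c(512L, 17L, 512L, 903L),
                          score = c(0.81, 0.75, 0.93, 0.70))

## Hypothetical helper: keep only matched targets and renumber target_idx so
## that it points into the reduced set of target spectra.
keep_matched_targets <- function(mtch) {
    used <- sort(unique(mtch$target_idx))
    mtch$target_idx <- match(mtch$target_idx, used)
    list(matches = mtch, target_subset = used)
}

res <- keep_matched_targets(lib_matches)
res$target_subset       # 17 512 903
res$matches$target_idx  # 2 1 2 3
```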
Johannes Rainer, Michael Witting
library(Spectra)
library(msdata)
library(MetaboAnnotation)
fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML", package = "msdata")
pest_ms2 <- filterMsLevel(Spectra(fl), 2L)
## subset to selected spectra.
pest_ms2 <- pest_ms2[c(808, 809, 945:955)]
## Load a small example MassBank data set
load(system.file("extdata", "minimb.RData", package = "MetaboAnnotation"))
## Match spectra with the default similarity score (normalized dot product)
csp <- CompareSpectraParam(requirePrecursor = TRUE, ppm = 10)
mtches <- matchSpectra(pest_ms2, minimb, csp)
mtches
## Are there any matching spectra for the first query spectrum?
mtches[1]
## No
## And for the second query spectrum?
mtches[2]
## The second query spectrum matches 4 target spectra. The scores for these
## matches are:
mtches[2]$score
## To access the score for the full data set
mtches$score
## Below we use a THRESHFUN that returns for each query spectrum the (first)
## best matching target spectrum.
csp <- CompareSpectraParam(requirePrecursor = FALSE, ppm = 10,
                           THRESHFUN = function(x) which.max(x))
mtches <- matchSpectra(pest_ms2, minimb, csp)
mtches
## Each of the query spectra is matched to one target spectrum
length(mtches)
matches(mtches)
## Match spectra considering also measured retention times. This requires
## that both query and target spectra have non-missing retention times.
rtime(pest_ms2)
rtime(minimb)
## Target spectra don't have retention times. Below we artificially set
## retention times to show how an additional retention time filter would
## work.
rtime(minimb) <- rep(361, length(minimb))
## Matching spectra requiring a matching precursor m/z and the difference
## of retention times between query and target spectra to be <= 2 seconds.
csp <- CompareSpectraParam(requirePrecursor = TRUE, ppm = 10,
                           toleranceRt = 2)
mtches <- matchSpectra(pest_ms2, minimb, csp)
mtches
matches(mtches)
## Note that parameter `rtColname` can be used to define different spectra
## variables with retention time information (such as retention indices).
## A `CompDb` compound annotation database could also be used with
## parameter `target`. Below we load the test `CompDb` database from the
## `CompoundDb` Bioconductor package.
library(CompoundDb)
fl <- system.file("sql", "CompDb.MassBank.sql", package = "CompoundDb")
cdb <- CompDb(fl)
res <- matchSpectra(pest_ms2, cdb, CompareSpectraParam())
## However, we do not find any matches because the compound annotation
## database used here contains only a very small subset of MassBank.
res
## The `target` of the result, however, now contains the MS2 spectra data
## from the compound annotation database.
target(res)
## See the package vignette for details, descriptions and more examples,
## also on how to retrieve e.g. MassBank reference databases from
## Bioconductor's AnnotationHub.