BiocStyle::markdown()
Authors: r packageDescription("MS2ID")[["Author"]]
Last modified: r file.info("MS2ID.Rmd")$mtime
Compiled: r date()
library(MS2ID) library(BiocStyle) knitr::opts_chunk$set(echo = TRUE, message = FALSE, eval = TRUE)
*This vignette describe in detail the annotate
function. It is advised a previous reading of the MS2ID introduction.
annotate
is the MS2ID function that annotates MS/MS query spectra; every query spectrum is compared with a reference library, and compounds with a similar spectrum are listed. This function requires an MS2ID reference library, as described in MS2ID introduction; here we will use the sample attached to MS2ID:
## Decompress the MS2ID library that comes with MS2ID MS2IDzipFile <- system.file("extdata/MS2IDLibrary.zip", package = "MS2ID", mustWork = TRUE) library(utils) MS2IDdirectory <- dirname(unzip(MS2IDzipFile, exdir = tempdir()))[1]
The following example shows the basic usage of the function annotate.
. Query spectra is provided pointing out the folder that contains the mzML files.
library(utils) queryFile <- system.file("extdata/QRYspectra.zip", package = "MS2ID") queryFolder <- file.path(tempdir(), "QRYspectra") utils::unzip(queryFile, exdir = queryFolder) #create the MS2ID object and annotate library(MS2ID) MS2IDobj <- MS2ID(MS2IDdirectory) annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj)
Optionally, query spectra can also be provided under the form of an spectra object [@Spectra]. That facilitates the use of tools developed in the r Biocpkg("Spectra")
package to subset query spectra according to their retention time and MSLevel.
library(Spectra) querySpectra <- Spectra::Spectra(dir(queryFolder, full.names = TRUE)) querySpectra <- querySpectra %>% Spectra::filterMsLevel(2) %>% Spectra::filterRt(c(100, 400)) #annotate annotResult <- MS2ID::annotate(QRYdata = querySpectra, MS2ID = MS2IDobj)
The annotate
function returns an Annot
object.
This object stores the annotation so we can:
MS2IDgui
function.export2xlsx
function.hits()
: Returns a cross-reference data frame containing the annotation hits, the id of the spectra and compounds and: qrySpectra()
: returns an Spectra
object (r Biocpkg("Spectra")
package) containing both successful query and consensus spectra (and their source spectra) (see consensus spectra).refSpectra()
: returns an Spectra
object (r Biocpkg("Spectra")
package) with the reference spectra present in the hits table.refCompound()
: returns a data frame containing metadata of the reference compounds present in the hits table.infoAnnotation()
: variables used on the annotate
function.In the example below, we use r Biocpkg("Spectra")
tools to browse the annotation results, although it is more advisable to use the visual browsing provided by the MS2IDgui
function.
#merge hits and compound info result <- merge(x = hits(annotResult), y = refCompound(annotResult), by.x = "idREFcomp", by.y = "id", all.y = FALSE) head(result)
library(Spectra) #Subset spectra and metadata considering first hit query spectra idQRYspect_1 <- result$idQRYspect[1] result_1 <- dplyr::filter(result, idQRYspect == idQRYspect_1) qrySpct_1 <- qrySpectra(annotResult) qrySpct_1 <- qrySpct_1[qrySpct_1$id %in% result_1$idQRYspect] refSpct_1 <- refSpectra(annotResult) refSpct_1 <- refSpct_1[refSpct_1$id %in% result_1$idREFspect] #compare query spectrum with its first hit reference spectrum refSpct_draw <- refSpct_1[1] refSpct_draw$intensity <- refSpct_draw$intensity/max(refSpct_draw$intensity) qrySpct_1$intensity <- qrySpct_1$intensity/max(qrySpct_1$intensity) plotSpectraMirror(qrySpct_1, refSpct_draw)
As a default, annotate
function tries to summarize adjacent MS/MS spectra into consensus spectra: the resulting consensus spectra will be annotated instead of the query spectra that summarizes (along with the query spectra non able to be consensued). This strategy diminishes artifacts and noise and reduce significantly the annotation time.
A group of adjacent query spectra is considered source for a consensus spectrum when all of them have the same precursor mass, collision energy and polarity. Also, every spectrum must be similar (cosine > consCos
argument) to the apex spectrum of the group and not too far away from it (less than 20 seconds). The resulting consensus spectrum will be formed by the fragments present in the majority of the source query spectra (ratio determined by the consComm
argument).
The final annot
object will contain not only the query spectra and the consensus spectra that succeeded in the annotation (i.e. with hits), but also the query spectra used to form the successful consensus spectra. Although it is recommended to use the MS2IDgui feature to elucidate the nature of the query spectra, it is also possible to check it by analyzing some of the variables contained in the annot
object.
The algorithm will not annotate the spectra with rol=3.
qrySpct <- qrySpectra(annotResult) #shor query data concerning consensus formation head(Spectra::spectraData(qrySpct, c("id", "spectrumId_CONS", "rtime_CONS", "rol")))
In annotation, subsetting the reference library not only reduces the computing time significantly -by taking advantage of the MS2ID backend and the fragments index: it also prunes the result and cuts off non-sense hits. For example, the cmnFrags = c(m, n)
argument limits the reference spectra so that reference spectra and query spectra have at least m peaks in common among their top n most intense peaks.
In addition, the cmnPrecMass
argument limits the reference spectra to those with the precursor mass of the query spectrum. On the other hand, cmnNeutralMass
limits reference spectra to those with a neutral mass plausible with the query precursor (considering all possible adducts).
annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj, cmnFrags = c(3, 5), cmnPrecMass = TRUE, cmnNeutralMass = TRUE, cmnPolarity = TRUE, predicted = FALSE)
Other arguments subset the reference spectra according its experimental nature or the query spectrum polarization (predicted
and cmnPolarity
, respectively).
As a default, the annotate function uses cosine similarity as a metric to compare two spectra; its default threshold value to beat to consider the comparison a hit is 0.8.
The function also allows the simultaneous calculation of different metrics. In that case, a spectrum comparison is considered a hit when at least one of the metrics fulfills its threshold value. Note that to fulfill a threshold value has a different meaning depending on the metric: topsoe and squared_chord metrics return a lower number when the spectra are more similar so, unlike the rest, a hit will occur when the returned value is lower than its threshold.
annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj, metrics = c("fidelity", "cosine", "topsoe"), metricsThresh = c(0.6, 0.8, 0.6)) head(MS2ID::hits(annotResult))
Moreover, the user can define its own metric by declaring a function as an argument; in the following example, foo
function uses cosine+1 as a distance metric.
foo <- function(finalMatrix){ vector1 <- finalMatrix[1,] vector2 <- finalMatrix[2,] CosplusOne <- 1+ suppressMessages( philentropy::distance(rbind(vector1, vector2), method = "cosine") ) names(CosplusOne) <- "CosplusOne" return(CosplusOne) } annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj, metrics = c("cosine"), metricsThresh = c(0.8), metricFUN = foo, metricFUNThresh = 1.8) head(MS2ID::hits(annotResult))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.