BiocStyle::markdown()

Authors: r packageDescription("MS2ID")[["Author"]]
Last modified: r file.info("MS2ID.Rmd")$mtime
Compiled: r date()

library(MS2ID)
library(BiocStyle)
knitr::opts_chunk$set(echo = TRUE, message = FALSE, eval = TRUE)

*This vignette describe in detail the annotate function. It is advised a previous reading of the MS2ID introduction.

Introduction

annotate is the MS2ID function that annotates MS/MS query spectra; every query spectrum is compared with a reference library, and compounds with a similar spectrum are listed. This function requires an MS2ID reference library, as described in MS2ID introduction; here we will use the sample attached to MS2ID:

## Decompress the MS2ID library that comes with MS2ID
MS2IDzipFile <- system.file("extdata/MS2IDLibrary.zip", package = "MS2ID",
                            mustWork = TRUE)
library(utils)
MS2IDdirectory <- dirname(unzip(MS2IDzipFile, exdir = tempdir()))[1]

The following example shows the basic usage of the function annotate.. Query spectra is provided pointing out the folder that contains the mzML files.

library(utils)
queryFile <- system.file("extdata/QRYspectra.zip", package = "MS2ID")
queryFolder <- file.path(tempdir(), "QRYspectra")
utils::unzip(queryFile, exdir = queryFolder)
#create the MS2ID object and annotate
library(MS2ID)
MS2IDobj <- MS2ID(MS2IDdirectory)
annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj)

Optionally, query spectra can also be provided under the form of an spectra object [@Spectra]. That facilitates the use of tools developed in the r Biocpkg("Spectra") package to subset query spectra according to their retention time and MSLevel.

library(Spectra)
querySpectra <- Spectra::Spectra(dir(queryFolder, full.names = TRUE))
querySpectra <- querySpectra %>%
  Spectra::filterMsLevel(2) %>% 
  Spectra::filterRt(c(100, 400))
#annotate
annotResult <- MS2ID::annotate(QRYdata = querySpectra, MS2ID = MS2IDobj)

The annotate function returns an Annot object.

Annot object {#annot}

This object stores the annotation so we can:

In the example below, we use r Biocpkg("Spectra") tools to browse the annotation results, although it is more advisable to use the visual browsing provided by the MS2IDgui function.

#merge hits and compound info
result <- merge(x = hits(annotResult), y = refCompound(annotResult), 
                by.x = "idREFcomp", by.y = "id", all.y = FALSE)
head(result)
library(Spectra)
#Subset spectra and metadata considering first hit query spectra
idQRYspect_1 <- result$idQRYspect[1]
result_1 <- dplyr::filter(result, idQRYspect  == idQRYspect_1)
qrySpct_1 <- qrySpectra(annotResult)
qrySpct_1 <- qrySpct_1[qrySpct_1$id %in% result_1$idQRYspect]
refSpct_1 <- refSpectra(annotResult)
refSpct_1 <- refSpct_1[refSpct_1$id %in% result_1$idREFspect]
#compare query spectrum with its first hit reference spectrum
refSpct_draw <- refSpct_1[1]
refSpct_draw$intensity <- refSpct_draw$intensity/max(refSpct_draw$intensity)
qrySpct_1$intensity <- qrySpct_1$intensity/max(qrySpct_1$intensity)
plotSpectraMirror(qrySpct_1, refSpct_draw)

Consensus spectra {#consensus}

As a default, annotate function tries to summarize adjacent MS/MS spectra into consensus spectra: the resulting consensus spectra will be annotated instead of the query spectra that summarizes (along with the query spectra non able to be consensued). This strategy diminishes artifacts and noise and reduce significantly the annotation time.
A group of adjacent query spectra is considered source for a consensus spectrum when all of them have the same precursor mass, collision energy and polarity. Also, every spectrum must be similar (cosine > consCos argument) to the apex spectrum of the group and not too far away from it (less than 20 seconds). The resulting consensus spectrum will be formed by the fragments present in the majority of the source query spectra (ratio determined by the consComm argument).
The final annot object will contain not only the query spectra and the consensus spectra that succeeded in the annotation (i.e. with hits), but also the query spectra used to form the successful consensus spectra. Although it is recommended to use the MS2IDgui feature to elucidate the nature of the query spectra, it is also possible to check it by analyzing some of the variables contained in the annot object.

The algorithm will not annotate the spectra with rol=3.

qrySpct <- qrySpectra(annotResult)
#shor query data concerning consensus formation
head(Spectra::spectraData(qrySpct, 
                          c("id", "spectrumId_CONS", "rtime_CONS", "rol")))

Subsetting the reference library

In annotation, subsetting the reference library not only reduces the computing time significantly -by taking advantage of the MS2ID backend and the fragments index: it also prunes the result and cuts off non-sense hits. For example, the cmnFrags = c(m, n) argument limits the reference spectra so that reference spectra and query spectra have at least m peaks in common among their top n most intense peaks.
In addition, the cmnPrecMass argument limits the reference spectra to those with the precursor mass of the query spectrum. On the other hand, cmnNeutralMass limits reference spectra to those with a neutral mass plausible with the query precursor (considering all possible adducts).

annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj, 
                        cmnFrags = c(3, 5),
                        cmnPrecMass = TRUE, cmnNeutralMass = TRUE, 
                        cmnPolarity = TRUE, predicted = FALSE)

Other arguments subset the reference spectra according its experimental nature or the query spectrum polarization (predicted and cmnPolarity, respectively).

Annotation with different metrics

As a default, the annotate function uses cosine similarity as a metric to compare two spectra; its default threshold value to beat to consider the comparison a hit is 0.8.
The function also allows the simultaneous calculation of different metrics. In that case, a spectrum comparison is considered a hit when at least one of the metrics fulfills its threshold value. Note that to fulfill a threshold value has a different meaning depending on the metric: topsoe and squared_chord metrics return a lower number when the spectra are more similar so, unlike the rest, a hit will occur when the returned value is lower than its threshold.

annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj,
                        metrics = c("fidelity", "cosine", "topsoe"),
                        metricsThresh = c(0.6, 0.8, 0.6))
head(MS2ID::hits(annotResult))

Moreover, the user can define its own metric by declaring a function as an argument; in the following example, foo function uses cosine+1 as a distance metric.

foo <- function(finalMatrix){
  vector1 <- finalMatrix[1,]
  vector2 <- finalMatrix[2,]
  CosplusOne <- 1+ suppressMessages(
    philentropy::distance(rbind(vector1, vector2), method = "cosine")
    )
  names(CosplusOne) <- "CosplusOne"
  return(CosplusOne)
}

annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj,
                        metrics = c("cosine"), metricsThresh = c(0.8),
                        metricFUN = foo, metricFUNThresh = 1.8)
head(MS2ID::hits(annotResult))

References



jmbadia/MS2ID documentation built on Dec. 10, 2024, 2:33 p.m.