BiocStyle::markdown()

Package: r BiocStyle::Biocpkg("SpectraQL")
Authors: r packageDescription("SpectraQL")[["Author"]]
Compiled: r date()

Introduction

The SpectraQL package adds support for the Mass Spec Query Language MassQL to the Spectra package. Spectra objects can thus be filtered and data extracted using queries expressed in the MassQL language.

Installation

Until added to Bioconductor, the package can be installed with devtools::install_github("RforMassSpectrometry/SpectraQL").

The package can be installed with the BiocManager package. To install BiocManager use install.packages("BiocManager") and, after that, BiocManager::install("SpectraQL") to install this package.

Extracting data from Spectra objects with MassQL

The SpectraQL package adds support for most of the Mass Spec Query Language functionality to Spectra objects hence allowing to filter and extract data using platform independent and data storage agnostic queries. At present, SpectraQL supports most, but not all of the conditions and expressions of the MassQL definition. In particular, numeric operations in query fields such as MS2PROD=100+14 are not yet supported.

Below we load MS data from an example mzML file. Note also that Spectra objects can represent MS data from a very large variety of sources including but not limited to mzML or mzXML files, databases (e.g. MsBackendMassbank) or MGF files (MsBackendMgf). The use of the Spectra an hence SpectraQL package is thus not restricted to mzML-backed MS data.

library(Spectra)
library(SpectraQL)
library(msdata)

fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML", package = "msdata")
dda <- Spectra(fl)

MassQL definition

A MassQL query consists of several parts: "QUERY <type of data> WHERE <condition> AND <condition> FILTER <filter> AND <filter>" (see its definition for more details). "<type of data>" defines which data should be extracted from the selected data set, "<condition>" defines which spectra should be selected and "<filter>" allows to subset each individual spectrum (i.e. to which mass peaks each spectrum should be subsetted). See the following sections for more information on each part of such an MassQL query. SpectraQL supports the MassQL query schema is however case insensitive (i.e. both "query" and "QUERY" are accepted) and is insensitive to white spaces (i.e. both "RTMIN=10" and "RTMIN = 10" are supported). Also, it should be noted that not all keywords, filters and operations defined by MassQL are yet supported by SpectraQL. The supported types of data, conditions and filters are listed in the sections below.

Type of data

The "<type of data>" defines what should be returned by the query function. In addition, functions can be specified and applied to "<type of data>" to summarize the results. The types of data that are currently supported by SpectraQL are:

In addition, functions can be used to extract specific information from the selected spectra. SpectraQL supports at present the following functions defined by MassQL:

Condition

The "<condition>" allows to subset a Spectra object. Several conditions can be combined with "and". The syntax for a condition is "<condition> = <value>", e.g. "MS2PROD = 144.1". Such conditions can be further refined by additional expressions that allow for example to define acceptable tolerances for m/z differences (see further below for details). SpectraQL supports the following conditions:

All conditions involving m/z values allow to specify a mass accuracy using the optional fields "TOLERANCEMZ" and "TOLERANCEPPM" that define the absolute and m/z-relative acceptable difference in m/z values. One or both fields can be attached to a condition such as "MS2PREC=100:TOLERANCEMZ=0.1:TOLERANCEPPM=20" to select for example all MS2 spectra with a precursor m/z equal to 100 accepting a difference of 0.1 and 20 ppm. Note that in contrast to MassQL, the default tolarance and ppm is 0 for all calls.

Filter

Filters allow to subset individual spectra keeping e.g. only peaks that match a user-defined m/z value. SpectraQL supports the following filters:

Differences of the SpectraQL implementation to the MassQL definition

Examples

In this section we use MassQL queries to subset and extract data from our Spectra object dda. This object contains MS1 and MS2 spectra from a data dependent acquisition. Some general information is provided below.

#' Number of MS1 and MS2 spectra
table(msLevel(dda))

#' retention time range
range(rtime(dda))

Filtering and subsetting

To restrict the data to spectra measured between 200 and 300 seconds of the measurement run we can use a MassQL query with the "rtmin" and "rtmax" conditions.

dda_rt <- query(dda, "QUERY * WHERE RTMIN = 200 AND RTMAX = 300")
dda_rt

Internally, SpectraQL translates the MassQL query into filter functions that are applied to the Spectra object. The returned result is now a Spectra object with all spectra measured between 200 and 300 seconds.

range(rtime(dda_rt))

To restrict to MS2 spectra with their precursor m/z matching a certain value, we can use the "MS2PREC" condition. Conditions on m/z values support also accuracy specifications. Using "TOLERANCEMZ" and "TOLERANCEPPM" it is thus possible to define an absolute and m/z relative acceptable difference. Below we use such a condition to select MS2 spectra with a precursor m/z matching 278.093 with a tolerance of 0 and ppm of 20. Since SpectraQL is case insensitive, we write the full MassQL query in lower case below.

dda_pmz <- query(
    dda,
    "query * where ms2prec = 278.093:toleancemz=0:toleranceppm=30")
length(dda_pmz)
dda_pmz

We thus selected 9 spectra from the data set with a precursor m/z matching our filter criteria.

precursorMz(dda_pmz)

We can obviously also combine the two queries into a single MassQL query. This time we also omit the "tolerancemz" field as we set that anyway to a value of 0.

dda_rt <- query(dda,
                paste0("query * where rtmin = 200 and rtmax = 300 and ",
                       "ms2prec = 278.093:toleranceppm = 30"))
dda_rt

For conditions "MS2PROD", "MS2PREC" and "MS1MZ" it is also possible to provide multiple values to select spectra that match any of the provided m/z values. These values can be separated by "OR" (or "or"), "TOLERANCEPPM" and "TOLERANCEMZ" will be applied to each of these. Below we select all spectra that contain an (MS1) peak with an m/z matching any of the provided values.

query(dda, "QUERY * WHERE MS1MZ = (123 OR 234.1 OR 300):TOLERANCEMZ=0.05")

Note that using "MS1MZ" will return only MS1 spectra and "MS2PROD", "MS2MZ" and "MS2PREC" only MS2 spectra.

Choosing which data to return

Thus far we uses "*" in all queries, which returns the result as a Spectra object. Alternatively, we can use "MS1DATA" or "MS2DATA" which will return the peak data for MS1 or MS2 spectra, respectively. The result is thus not a Spectra object, but a list of two-column matrices with the m/z and intensity values of all selected spectra. Below we use this to extract the peaks data for all MS2 spectra measured between 200 and 300 seconds.

pks <- query(dda, "query ms2data where rtmin = 200 and rtmax = 300")
length(pks)
head(pks[[1L]])

We can in addition also use functions to extract only specific data from the result. "scansum" would for example return the sum of intensities per spectra and would thus allow to extract a total ion chromatogram:

tic <- query(dda, "query scansum(ms1data)")
plot(tic, type = "l", xlab = "scan index")

Please note that the x-axis does not represent the retention times but the scan indices.

The function "scaninfo" extracts all information from the selected spectra. For SpectraQL this means that the result from spectraData are returned:

si <- query(dda, "query scaninfo(*) where rtmin = 300 and rtmax = 500")
si

Filtering peaks within spectra

In contrast to the condition statements ("WHERE"), "FILTER" allows to filter the peak data within spectra. We can for example filter all spectra keeping only peaks with a certain m/z. Below we filter the MS1 data keeping only peaks with an m/z value matching 219.095 with a tolerance of 0.01 and extract the ion count for that.

ic <- query(
    dda, "query scansum(ms1data) filter ms1mz = 219.095:tolerancemz=0.1")
plot(ic, type = "l", xlab = "scan index")

We can also combine that with "QUERY" to restrict to a certain retention time range to generate an extracted ion chromatogram (XIC).

ic <- query(dda,
            paste0("query scansum(ms1data) where rtmin = 235 and rtmax = 250",
                   " filter ms1mz = 219.095:tolerancemz=0.1"))
plot(ic, type = "l", xlab = "scan index")

Internally, SpectraQL's function translates the MassQL query into the following combination of function calls:

res <- dda |>
filterMsLevel(1L) |>
filterRt(c(235, 250)) |>
filterMzValues(219.095, tolerance = 0.1) |>
ionCount()

plot(res, type = "l", xlab = "scan index")

Session information {-}

sessionInfo()


rformassspectrometry/SpectraQL documentation built on July 1, 2022, 5:51 p.m.