deisotopeSpectra | R Documentation |
A variety of functions to filter or subset Spectra
objects are available.
These can be generally separated into two main classes: I) classical
subset operations that immediately reduce the number of spectra in the
object and II) filters that reduce the content of the object without
changing its length (i.e. the number of spectra). The latter can be further
subdivided into functions that affect the content of the spectraData
(i.e.
the general spectrum metadata) and those that reduce the content of the
object's peaksData
(i.e. the m/z and intensity values of a spectrum's
mass peaks).
Descriptions of the functions in these three categories are given below
in sections Subset Spectra
, Filter content of spectraData()
and
Filter content of peaksData()
, respectively.
deisotopeSpectra(
x,
substDefinition = isotopicSubstitutionMatrix("HMDB_NEUTRAL"),
tolerance = 0,
ppm = 20,
charge = 1
)
reduceSpectra(x, tolerance = 0, ppm = 20)
filterPrecursorMaxIntensity(x, tolerance = 0, ppm = 20)
filterPrecursorIsotopes(
x,
tolerance = 0,
ppm = 20,
substDefinition = isotopicSubstitutionMatrix("HMDB_NEUTRAL")
)
filterPrecursorPeaks(
object,
tolerance = 0,
ppm = 20,
mz = c("==", ">="),
msLevel. = uniqueMsLevels(object)
)
## S4 method for signature 'Spectra'
dropNaSpectraVariables(object)
## S4 method for signature 'Spectra'
selectSpectraVariables(
object,
spectraVariables = union(spectraVariables(object), peaksVariables(object))
)
## S4 method for signature 'Spectra'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'Spectra'
filterAcquisitionNum(
object,
n = integer(),
dataStorage = character(),
dataOrigin = character()
)
## S4 method for signature 'Spectra'
filterEmptySpectra(object)
## S4 method for signature 'Spectra'
filterDataOrigin(object, dataOrigin = character())
## S4 method for signature 'Spectra'
filterDataStorage(object, dataStorage = character())
## S4 method for signature 'Spectra'
filterFourierTransformArtefacts(
object,
halfWindowSize = 0.05,
threshold = 0.2,
keepIsotopes = TRUE,
maxCharge = 5,
isotopeTolerance = 0.005
)
## S4 method for signature 'Spectra'
filterIntensity(
object,
intensity = c(0, Inf),
msLevel. = uniqueMsLevels(object),
...
)
## S4 method for signature 'Spectra'
filterIsolationWindow(object, mz = numeric())
## S4 method for signature 'Spectra'
filterMsLevel(object, msLevel. = integer())
## S4 method for signature 'Spectra'
filterMzRange(
object,
mz = numeric(),
msLevel. = uniqueMsLevels(object),
keep = TRUE
)
## S4 method for signature 'Spectra'
filterMzValues(
object,
mz = numeric(),
tolerance = 0,
ppm = 20,
msLevel. = uniqueMsLevels(object),
keep = TRUE
)
## S4 method for signature 'Spectra'
filterPolarity(object, polarity = integer())
## S4 method for signature 'Spectra'
filterPrecursorMz(object, mz = numeric())
## S4 method for signature 'Spectra'
filterPrecursorMzRange(object, mz = numeric())
## S4 method for signature 'Spectra'
filterPrecursorMzValues(object, mz = numeric(), ppm = 20, tolerance = 0)
## S4 method for signature 'Spectra'
filterPrecursorCharge(object, z = integer())
## S4 method for signature 'Spectra'
filterPrecursorScan(object, acquisitionNum = integer(), f = dataOrigin(object))
## S4 method for signature 'Spectra'
filterRt(object, rt = numeric(), msLevel. = uniqueMsLevels(object))
## S4 method for signature 'Spectra'
filterRanges(
object,
spectraVariables = character(),
ranges = numeric(),
match = c("all", "any")
)
## S4 method for signature 'Spectra'
filterValues(
object,
spectraVariables = character(),
values = numeric(),
ppm = 0,
tolerance = 0,
match = c("all", "any")
)
x |
|
substDefinition |
For |
tolerance |
For |
ppm |
For |
charge |
For |
object |
|
mz |
For |
msLevel. |
|
spectraVariables |
For |
i |
For |
j |
For |
... |
Additional arguments. |
drop |
For |
n |
for |
dataStorage |
For |
dataOrigin |
For |
halfWindowSize |
For |
threshold |
For |
keepIsotopes |
For |
maxCharge |
For |
isotopeTolerance |
For |
intensity |
For |
keep |
For |
polarity |
for |
z |
For |
acquisitionNum |
for |
f |
For |
rt |
for |
ranges |
for |
match |
For |
values |
for |
Subset Spectra
These functions affect the number of spectra in a Spectra
object creating
a subset of the original object without affecting its content.
[
: subsets the spectra keeping only selected elements (i
). The method
always returns a Spectra
object.
filterAcquisitionNum()
: filters the object keeping only spectra matching
the provided acquisition numbers (argument n
). If dataOrigin
or
dataStorage
is also provided, object
is subset to the spectra with
an acquisition number equal to n
in spectra with matching dataOrigin
or dataStorage values, retaining all other spectra.
Returns the filtered Spectra
.
filterDataOrigin()
: filters the object retaining spectra matching the
provided dataOrigin
. Parameter dataOrigin
has to be of type
character
and needs to match exactly the data origin value of the
spectra to subset.
Returns the filtered Spectra
object (with spectra ordered according to
the provided dataOrigin
parameter).
filterDataStorage()
: filters the object retaining spectra stored in the
specified dataStorage
. Parameter dataStorage
has to be of type
character
and needs to match exactly the data storage value of the
spectra to subset.
Returns the filtered Spectra
object (with spectra ordered according to
the provided dataStorage
parameter).
filterEmptySpectra()
: removes empty spectra (i.e. spectra without peaks).
Returns the filtered Spectra
object (with spectra in their
original order).
filterIsolationWindow()
: retains spectra that contain mz
in their
isolation window m/z range (i.e. with an isolationWindowLowerMz
<= mz
and isolationWindowUpperMz
>= mz)
. Returns the filtered Spectra
object (with spectra in their original order).
filterMsLevel()
: filters object by MS level keeping only spectra matching
the MS level specified with argument msLevel.
. Returns the filtered
Spectra
(with spectra in their original order).
filterPolarity()
: filters the object keeping only spectra matching the
provided polarity. Returns the filtered Spectra
(with spectra in their
original order).
filterPrecursorCharge()
: retains spectra with the defined precursor
charge(s).
filterPrecursorIsotopes()
: groups MS2 spectra based on their precursor
m/z and precursor intensity into predicted isotope groups and keeps for each
group only the spectrum representing the monoisotopic precursor. MS1 spectra
are returned as is. See documentation for deisotopeSpectra()
below for
details on isotope prediction and parameter description.
filterPrecursorMaxIntensity()
: filters the Spectra
keeping for groups
of (MS2) spectra with similar precursor m/z values (given parameters
ppm
and tolerance
) the one with the highest precursor intensity. The
function filters only MS2 spectra and returns all MS1 spectra. If
precursor intensities are NA
for all spectra within a spectra group, the
first spectrum of that group is returned.
Note: some manufacturers don't provide precursor intensities. These can
however also be estimated with estimatePrecursorIntensity()
.
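The selection rule described above can be sketched in plain R without the Spectra package. The helper `group_mz()` and the toy data are hypothetical, and the grouping rule (neighbouring sorted values within `tolerance + ppm * mz / 1e6` share a group) is an assumption made for illustration, not necessarily the package's exact implementation.

```r
## Hypothetical helper: assign a group index to each m/z value; sorted
## neighbours within `tolerance + ppm * mz / 1e6` fall into the same group
## (assumed grouping rule).
group_mz <- function(mz, tolerance = 0, ppm = 20) {
    o <- order(mz)
    s <- mz[o]
    new_group <- c(TRUE, diff(s) > tolerance + ppm * s[-1] / 1e6)
    g <- integer(length(mz))
    g[o] <- cumsum(new_group)
    g
}

## Toy MS2 precursor data (made-up values).
ms2 <- data.frame(
    precursorMz = c(200.1000, 200.1010, 350.2000, 350.2020, 500.0000),
    precursorIntensity = c(1e4, 5e4, NA, NA, 2e3)
)
ms2$group <- group_mz(ms2$precursorMz, tolerance = 0.005, ppm = 0)

## Per group keep the row with the highest precursor intensity; fall back
## to the first row if all intensities of the group are NA.
keep <- vapply(split(seq_len(nrow(ms2)), ms2$group), function(i) {
    int <- ms2$precursorIntensity[i]
    if (all(is.na(int))) i[1L] else i[which.max(int)]
}, integer(1))
ms2[sort(keep), ]
```

In this toy data rows 2, 3 and 5 are retained: row 2 has the highest intensity in its group, row 3 is the first spectrum of a group with only missing intensities, and row 5 forms a group of its own.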
filterPrecursorMzRange()
(previously filterPrecursorMz()
which is now
deprecated): retains spectra with a precursor m/z within the
provided m/z range. See examples for details on selecting spectra with
a precursor m/z for a target m/z accepting a small difference in ppm.
filterPrecursorMzValues()
: retains spectra with precursor m/z matching
any of the provided m/z values (given ppm
and tolerance
). Spectra with
missing precursor m/z value (e.g. MS1 spectra) are dropped.
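As a rough illustration of how `ppm` and `tolerance` might combine when matching precursor m/z values, the following base R sketch uses a hypothetical helper `matches_any()`. It assumes that the accepted difference is `tolerance + ppm * target / 1e6`; this is a simplification for illustration and not the package code.

```r
## Hypothetical helper (not part of Spectra): TRUE for each element of `x`
## that matches at least one target value, assuming an accepted difference
## of `tolerance + ppm * target / 1e6`.
matches_any <- function(x, targets, tolerance = 0, ppm = 20) {
    vapply(x, function(z) {
        if (is.na(z)) return(FALSE)        # missing precursor m/z: no match
        any(abs(z - targets) <= tolerance + ppm * targets / 1e6)
    }, logical(1))
}

## Made-up precursor m/z values; NA mimics an MS1 spectrum.
precursor_mz <- c(NA, 85.0003, 85.2, 120.05)
keep <- matches_any(precursor_mz, targets = 85, tolerance = 0, ppm = 5)
keep
```

With 5 ppm at m/z 85 the accepted difference is 0.000425, so only the second value matches; the entry with a missing precursor m/z is dropped.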
filterPrecursorScan()
: retains parent (e.g. MS1) and children scans (e.g.
MS2) of acquisition number acquisitionNum
. Returns the filtered
Spectra
(with spectra in their original order). Parameter f
allows defining
which spectra belong to the same sample or original data file (
defaults to f = dataOrigin(object)
).
filterRanges()
: allows filtering of the Spectra
object based on user
defined numeric ranges (parameter ranges
) for one or more available
spectra variables in object (spectra variable names can be specified with
parameter spectraVariables
). Spectra for which the value of a spectra
variable is within its defined range are retained. If multiple
ranges/spectra variables are defined, the match
parameter can be used
to specify whether all conditions (match = "all"
; the default) or
any of the conditions must match (match = "any"
; all spectra for which
values are within any of the provided ranges are retained).
filterRt()
: retains spectra of MS level msLevel
with retention
times (in seconds) within (>=
) rt[1]
and (<=
)
rt[2]
. Returns the filtered Spectra
(with spectra in their
original order).
filterValues()
: allows filtering of the Spectra
object based on
similarities of numeric values of one or more spectraVariables(object)
(parameter spectraVariables
) to provided values (parameter values
)
given acceptable differences (parameters tolerance and ppm). If multiple
values/spectra variables are defined, the match
parameter can be used
to specify whether all conditions (match = "all"
; the default) or
any of the conditions must match (match = "any"
; all spectra for which
values are within any of the provided ranges are retained).
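The difference between `match = "all"` and `match = "any"` can be illustrated with a small base R sketch on a toy table of spectra variables. The data are made up and the range check is assumed to include both boundaries; the code mimics the described behaviour and is not the package implementation.

```r
## Toy spectra variables (made-up values).
sv <- data.frame(rtime = c(20, 50, 250, 400),
                 precursorMz = c(150, 250, 300, 450))
vars <- c("rtime", "precursorMz")
ranges <- c(30, 350, 200, 500)      # (lower, upper) pair per variable

## One logical column per variable/range pair (boundaries assumed inclusive).
in_range <- mapply(
    function(v, lower, upper) sv[[v]] >= lower & sv[[v]] <= upper,
    vars, ranges[c(TRUE, FALSE)], ranges[c(FALSE, TRUE)])

keep_all <- rowSums(in_range) == ncol(in_range)   # mimics match = "all"
keep_any <- rowSums(in_range) > 0                 # mimics match = "any"
sv[keep_all, ]
sv[keep_any, ]
```

Here the second and third spectrum satisfy both conditions, while the fourth satisfies only the precursor m/z range and is therefore retained only with the "any" logic.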
Filter content of spectraData()
The functions described in this section filter the content from a
Spectra
's spectra data, i.e. affect values of, or complete, spectra
variables. None of these functions reduces the object's number of spectra.
dropNaSpectraVariables()
: removes spectra variables (i.e. columns in the
object's spectraData
that contain only missing values (NA
)). Note that
while columns with only NA
s are removed, a spectraData()
call after
dropNaSpectraVariables()
might still show columns containing NA
values
for core spectra variables. The total number of spectra is not changed
by this function.
selectSpectraVariables()
: reduces the information within the object to
the selected spectra variables: all data for variables not specified will
be dropped. For mandatory columns (i.e., those listed by
coreSpectraVariables()
, such as msLevel, rtime ...) only
the values will be dropped but not the variable itself. Additional (or
user defined) spectra variables will be completely removed.
Returns the filtered Spectra
.
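The idea behind removing spectra variables that contain only missing values can be sketched on an ordinary `data.frame`. The column names and values below are made up, and the sketch ignores the special handling of core spectra variables described above.

```r
## Toy spectra data (made-up values); `ionCount` contains only NA.
spd <- data.frame(msLevel = c(1L, 2L, 2L),
                  rtime = c(1.2, 1.5, 1.9),
                  collisionEnergy = c(NA, 20, 20),
                  ionCount = NA_real_)

## Flag columns that consist exclusively of missing values and drop them.
all_na <- vapply(spd, function(col) all(is.na(col)), logical(1))
spd_reduced <- spd[, !all_na, drop = FALSE]
colnames(spd_reduced)
nrow(spd_reduced) == nrow(spd)    # the number of rows (spectra) is unchanged
```

Note that `collisionEnergy` is kept because it is only partially missing.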
Filter content of peaksData()
The functions described in this section filter the content of the
Spectra
's peaks data, i.e. either the number or the values (m/z or
intensity values) of the mass peaks. Also, the actual operation is only
executed once peaks data is accessed (through peaksData()
,
mz()
or intensity()
) or applyProcessing()
is called.
These operations don't affect the number of spectra in the Spectra
object.
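The delayed execution described above can be pictured with a toy processing queue in base R. All names (`new_lazy()`, `add_filter()`, `get_peaks()`) are hypothetical and the structure is a deliberately simplified assumption; it is not how the Spectra class is implemented internally.

```r
## Toy object (not the Spectra class): peaks plus a queue of pending filters.
new_lazy <- function(mz, intensity) {
    list(peaks = data.frame(mz = mz, intensity = intensity), queue = list())
}
## Register a filter function; nothing is computed at this point.
add_filter <- function(x, fun) {
    x$queue <- c(x$queue, list(fun))
    x
}
## Apply all pending filters, in order, when the peaks are accessed.
get_peaks <- function(x) {
    Reduce(function(p, f) f(p), x$queue, x$peaks)
}

x <- new_lazy(mz = c(100, 150, 200), intensity = c(5, 50, 500))
x <- add_filter(x, function(p) p[p$intensity >= 10, , drop = FALSE])
x <- add_filter(x, function(p) p[p$mz <= 180, , drop = FALSE])
nrow(x$peaks)        # stored data is still unchanged
get_peaks(x)         # filters are applied on access
```

The stored peaks keep all three rows, while accessing them through `get_peaks()` returns only the peak at m/z 150.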
deisotopeSpectra()
: deisotopes each spectrum keeping only the
monoisotopic peak for groups of isotopologues. Isotopologues are
estimated using the MetaboCoreUtils::isotopologues()
function from the
MetaboCoreUtils package. Note that
the default parameters for isotope prediction/detection have been
determined using data from the Human Metabolome Database (HMDB) and
isotopes for elements other than CHNOPS might not be detected. See
parameter substDefinition
in the documentation of
MetaboCoreUtils::isotopologues()
for
more information. The approach and code to define the parameters for
isotope prediction is described
here.
filterFourierTransformArtefacts()
: removes (Orbitrap) fast Fourier transform
artefact peaks from spectra (see examples below). The function iterates
through all intensity ordered peaks in a spectrum and removes all peaks
with an m/z within +/- halfWindowSize
of the current peak if their
intensity is lower than threshold
times the current peak's intensity.
Additional parameters keepIsotopes
, maxCharge
and isotopeTolerance
allow avoiding the removal of potential [13]C
isotope peaks (maxCharge
being the maximum charge that should be considered and isotopeTolerance
the absolute acceptable tolerance for matching their m/z).
See filterFourierTransformArtefacts()
for details and background and
deisotopeSpectra()
for an alternative.
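The iteration described for filterFourierTransformArtefacts() can be sketched in base R as follows. The helper `remove_side_peaks()` is hypothetical, the peak values are made up, and the isotope-protection step (`keepIsotopes`, `maxCharge`, `isotopeTolerance`) is left out. Skipping peaks that were already flagged is an assumption of this sketch.

```r
## Hypothetical helper; returns TRUE for peaks that would be kept. Ignores
## the isotope-protection step of the real function.
remove_side_peaks <- function(mz, intensity, halfWindowSize = 0.05,
                              threshold = 0.2) {
    keep <- rep(TRUE, length(mz))
    ## Visit peaks from the most to the least intense one.
    for (i in order(intensity, decreasing = TRUE)) {
        if (!keep[i]) next                   # already flagged as artefact
        near <- abs(mz - mz[i]) <= halfWindowSize
        weak <- intensity < threshold * intensity[i]
        keep[near & weak] <- FALSE
    }
    keep
}

## Made-up peaks: a large peak at m/z 265.00 surrounded by small side peaks.
mz <- c(264.96, 264.99, 265.00, 265.01, 265.04, 265.50)
intensity <- c(4e4, 9e4, 1e6, 8e4, 3e4, 5e4)
keep <- remove_side_peaks(mz, intensity)
mz[keep]
```

In this example the four low-intensity peaks within 0.05 of the main peak are flagged, while the unrelated peak at m/z 265.50 is kept.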
filterIntensity()
: filters mass peaks in each spectrum keeping only
those with intensities that are within the provided range or match the
criteria of the provided function. For the former, parameter intensity
has to be a numeric
defining the intensity range, for the latter a
function
that takes the intensity values of the spectrum and returns
a logical
whether the peak should be retained or not (see examples
below for details) - additional parameters to the function can be passed
with ...
.
To remove only peaks with intensities below a certain threshold, say
100, use intensity = c(100, Inf)
. Note: also a single value can be
passed with the intensity
parameter in which case an upper limit of
Inf
is used.
Note that this function removes also peaks with missing intensities
(i.e. an intensity of NA
). Parameter msLevel.
allows restricting the
filtering to spectra of the specified MS level(s).
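The two ways of defining the intensity filter (a numeric range or a function returning a logical per peak) can be compared on a plain numeric vector. The helpers `in_range()` and `rel_keep()` and the `prop` argument are hypothetical names used for illustration only.

```r
## Made-up intensities of one spectrum, including a missing value.
intensity <- c(12, NA, 450, 98, 1300, 100)

## Range-based: keep intensities within [100, Inf]; NA never passes.
in_range <- function(x, range) !is.na(x) & x >= range[1L] & x <= range[2L]
keep_range <- in_range(intensity, c(100, Inf))

## Function-based: keep peaks above a proportion of the spectrum's maximum
## intensity (`rel_keep` and `prop` are hypothetical names).
rel_keep <- function(x, prop = 0.1) {
    !is.na(x) & x > max(x, na.rm = TRUE) * prop
}
keep_fun <- rel_keep(intensity, prop = 0.1)

intensity[keep_range]
intensity[keep_fun]
```

Following the description above, a function such as `rel_keep` would be supplied through parameter intensity, with `prop` passed on through `...`.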
filterMzRange()
: filters mass peaks in the object keeping or removing
those in each spectrum that are within the provided m/z range. Whether
peaks are retained or removed can be configured with parameter keep
(default keep = TRUE
).
filterMzValues()
: filters mass peaks in the object keeping all
peaks in each spectrum that match the provided m/z value(s) (for
keep = TRUE
, the default) or removing all of them (for keep = FALSE
).
The m/z matching considers also the absolute tolerance
and m/z-relative
ppm
values. tolerance
and ppm
have to be of length 1.
filterPeaksRanges()
: filters mass peaks of a Spectra
object using any
set of range-based filters on numeric spectra or peaks variables. See
filterPeaksRanges()
for more information.
filterPrecursorPeaks()
: removes peaks from each spectrum in object
with
an m/z equal or larger than the m/z of the precursor, depending on the
value of parameter mz
: for mz = "==" (the default) peaks with matching m/z (considering an
absolute and relative acceptable difference depending on tolerance
and ppm, respectively) are removed. For mz = ">=" all peaks with an m/z
larger or equal to the precursor m/z (minus tolerance and the ppm of the
precursor m/z) are removed. Parameter msLevel. allows to restrict the
filter to certain MS levels (by default the filter is applied to all MS
levels). Note that no peaks are removed if the precursor m/z is NA
(e.g. typically for MS1 spectra).
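The two modes of filterPrecursorPeaks() can be sketched for a single spectrum in base R. The helper `precursor_peak_filter()` is hypothetical; its `mode` argument plays the role of parameter mz, and the accepted difference is assumed to be `tolerance + ppm * precursorMz / 1e6`.

```r
## Hypothetical helper: returns TRUE for peaks that would be kept.
precursor_peak_filter <- function(mz, precursorMz, mode = c("==", ">="),
                                  tolerance = 0, ppm = 20) {
    mode <- match.arg(mode)
    if (is.na(precursorMz)) {
        return(rep(TRUE, length(mz)))      # no precursor m/z: nothing removed
    }
    delta <- tolerance + ppm * precursorMz / 1e6
    if (mode == "==") {
        abs(mz - precursorMz) > delta      # drop only matching peaks
    } else {
        mz < precursorMz - delta           # drop matching and larger peaks
    }
}

## Made-up fragment peaks of a spectrum with precursor m/z 300.15.
mz <- c(100.02, 250.10, 300.15, 300.20, 410.00)
keep_eq <- precursor_peak_filter(mz, 300.15, "==", tolerance = 0.01, ppm = 0)
keep_ge <- precursor_peak_filter(mz, 300.15, ">=", tolerance = 0.01, ppm = 0)
mz[keep_eq]
mz[keep_ge]
```

With "==" only the peak at the precursor m/z is dropped, whereas ">=" also drops the peaks at m/z 300.20 and 410.00.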
reduceSpectra()
: keeps for groups of peaks with similar m/z values
(given ppm
and tolerance
) in each spectrum only the mass peak with the
highest intensity removing all other peaks hence reducing each
spectrum to the highest intensity peaks per peak group.
Peak groups are defined using the group()
function from the
MsCoreUtils package. See also the combinePeaks()
function for an
alternative function to combine peaks within each spectrum.
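The reduction performed by reduceSpectra() can be pictured with a few lines of base R on the peaks of a single spectrum. The peak values are made up and the grouping rule (consecutive sorted m/z values not further apart than `tolerance` share a group, with `ppm` set to 0) is a simplifying assumption rather than the exact behaviour of the grouping function used by the package.

```r
## Made-up peaks of one spectrum, sorted by m/z.
mz <- c(100.00, 100.04, 100.09, 150.20, 150.25, 200.00)
intensity <- c(10, 80, 30, 5, 60, 40)

## Assumed grouping rule: a new group starts whenever the gap to the
## previous m/z value exceeds `tolerance`.
tolerance <- 0.1
grp <- cumsum(c(TRUE, diff(mz) > tolerance))

## Index of the most intense peak within each group.
top <- unlist(lapply(split(seq_along(mz), grp),
                     function(i) i[which.max(intensity[i])]),
              use.names = FALSE)
cbind(mz = mz[top], intensity = intensity[top])
```

The six peaks fall into three groups, and one representative peak (the most intense) remains for each of them.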
Sebastian Gibb, Johannes Rainer, Laurent Gatto, Philippine Louail, Nir Shahaf
combineSpectra()
for functions to combine or aggregate Spectra
.
combinePeaks()
for functions to combine or aggregate a Spectra
's
peaksData()
## Load a `Spectra` object with LC-MS/MS data.
fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
package = "msdata")
sps_dda <- Spectra(fl)
sps_dda
## -------- SUBSET SPECTRA --------
## Subset to the first 3 spectra
tmp <- sps_dda[1:3]
tmp
length(tmp)
## Subset to all MS2 spectra; this could be done with [, or, more
## efficiently, with the `filterMsLevel` function:
sps_dda[msLevel(sps_dda) == 2L]
filterMsLevel(sps_dda, 2L)
## Filter the object keeping only MS2 spectra with a precursor m/z value
## between a specified range:
filterPrecursorMzRange(sps_dda, c(80, 90))
## Filter the object to MS2 spectra with a precursor m/z matching a
## pre-defined value (given ppm and tolerance)
filterPrecursorMzValues(sps_dda, 85, ppm = 5, tolerance = 0.1)
## The `filterRanges()` function allows filtering a `Spectra` based on
## numerical ranges of any of its (numerical) spectra variables.
## First, determine the variable(s) on which to base the filtering:
sv <- c("rtime", "precursorMz", "peaksCount")
## Note that ANY variables can be chosen here, and as many as wanted.
## Define the ranges (pairs of values with lower and upper boundary) to be
## used for the individual spectra variables. The first two values will be
## used for the first spectra variable (e.g., `"rtime"` here), the next two
## for the second (e.g. `"precursorMz"` here) and so on:
ranges <- c(30, 350, 200, 500, 350, 600)
## Input the parameters within the filterRanges function:
filt_spectra <- filterRanges(sps_dda, spectraVariables = sv,
ranges = ranges)
filt_spectra
## `filterRanges()` can also be used to filter a `Spectra` object with
## multiple ranges for the same `spectraVariable` (e.g, here `"rtime"`)
sv <- c("rtime", "rtime")
ranges <- c(30, 100, 200, 300)
filt_spectra <- filterRanges(sps_dda, spectraVariables = sv,
ranges = ranges, match = "any")
filt_spectra
## While `filterRanges()` filters on numeric ranges, `filterValues()`
## filters an object by matching spectra variable values to user-provided
## values (acceptable differences can be configured with the
## `ppm` and `tolerance` parameters).
## First determine the variable(s) on which to base the filtering:
sv <- c("rtime", "precursorMz")
## Note that ANY variables can be chosen here, and as many as wanted.
## Define the values that will be used to filter the spectra based on their
## similarities to their respective `spectraVariables`.
## The first values in the parameters values, tolerance and ppm will be
## used for the first spectra variable (e.g. `"rtime"` here), the next for
## the second (e.g. `"precursorMz"` here) and so on:
values <- c(350, 80)
tolerance <- c(100, 0.1)
ppm <- c(0, 50)
## Input the parameters within the `filterValues()` function:
filt_spectra <- filterValues(sps_dda, spectraVariables = sv,
values = values, tolerance = tolerance, ppm = ppm)
filt_spectra
## -------- FILTER SPECTRA DATA --------
## Remove spectra variables without content (i.e. with only missing values)
sps_noNA <- dropNaSpectraVariables(sps_dda)
## This reduced the size of the object slightly
print(object.size(sps_dda), units = "MB")
print(object.size(sps_noNA), units = "MB")
## With the `selectSpectraVariables()` function it is in addition possible
## to subset the data of a `Spectra` to the selected columns/variables,
## keeping only their data:
tmp <- selectSpectraVariables(sps_dda, c("msLevel", "mz", "intensity",
"scanIndex"))
print(object.size(tmp), units = "MB")
## Except the selected variables, all data is now removed. Accessing
## core spectra variables still works, but returns only NA
rtime(tmp) |> head()
## -------- FILTER PEAKS DATA --------
## `filterMzValues()` filters the mass peaks data of a `Spectra` retaining
## only those mass peaks with an m/z value matching the provided value(s).
sps_sub <- filterMzValues(sps_dda, mz = c(103, 104), tolerance = 0.3)
## The filtered `Spectra` has the same length
length(sps_dda)
length(sps_sub)
## But the number of mass peaks changed
lengths(sps_dda) |> head()
lengths(sps_sub) |> head()
## This function can also be used to remove specific peaks from a spectrum
## by setting `keep = FALSE`.
sps_sub <- filterMzValues(sps_dda, mz = c(103, 104),
tolerance = 0.3, keep = FALSE)
lengths(sps_sub) |> head()
## With the `filterMzRange()` function it is possible to keep (or remove)
## mass peaks with m/z values within a specified numeric range.
sps_sub <- filterMzRange(sps_dda, mz = c(100, 150))
lengths(sps_sub) |> head()
## See also the `filterPeaksRanges()` function for a more flexible framework
## to filter mass peaks
## Removing Fourier transform artefacts seen in Orbitrap data.
## Loading an Orbitrap spectrum with artefacts.
data(fft_spectrum)
plotSpectra(fft_spectrum, xlim = c(264.5, 265.5))
plotSpectra(fft_spectrum, xlim = c(264.5, 265.5), ylim = c(0, 5e6))
fft_spectrum <- filterFourierTransformArtefacts(fft_spectrum)
fft_spectrum
plotSpectra(fft_spectrum, xlim = c(264.5, 265.5), ylim = c(0, 5e6))
## Using a few example peaks in your data you can optimize the parameters
fft_spectrum_filtered <- filterFourierTransformArtefacts(fft_spectrum,
halfWindowSize = 0.2,
threshold = 0.005,
keepIsotopes = TRUE,
maxCharge = 5,
isotopeTolerance = 0.005
)
fft_spectrum_filtered
length(mz(fft_spectrum_filtered)[[1]])
plotSpectra(fft_spectrum_filtered, xlim = c(264.5, 265.5), ylim = c(0, 5e6))
## *Reducing* a `Spectra` keeping for groups of mass peaks (characterized
## by similarity of their m/z values) only one representative peak. This
## function helps cleaning fragment spectra.
## Filter the data set to MS2 spectra
ms2 <- filterMsLevel(sps_dda, 2L)
## For groups of fragment peaks with a difference in m/z < 0.1, keep only
## the largest one.
ms2_red <- reduceSpectra(ms2, ppm = 0, tolerance = 0.1)
lengths(ms2) |> tail()
lengths(ms2_red) |> tail()