applyProcessing | R Documentation |
Various data analysis functions are available for Spectra
objects. These
can be categorized into functions that either return a Spectra
object
(with the manipulated data) and functions that directly return the
result from the calculation. For the former category, the data manipulations
are cached in the result object's processing queue and only exectuted
on-the-fly when the respective data gets extracted from the Spectra
(see
section The processing queue for more information).
For the second category, the calculations are directly executed and the result, usually one value per spectrum, returned. Generally, to reduce memory demand, a chunk-wise processing of the data is performed.
applyProcessing(
object,
f = processingChunkFactor(object),
BPPARAM = bpparam(),
...
)
processingLog(x)
scalePeaks(x, by = sum, msLevel. = uniqueMsLevels(x))
## S4 method for signature 'Spectra'
addProcessing(object, FUN, ..., spectraVariables = character())
## S4 method for signature 'Spectra'
bin(
x,
binSize = 1L,
breaks = NULL,
msLevel. = uniqueMsLevels(x),
FUN = sum,
zero.rm = TRUE
)
## S4 method for signature 'Spectra'
containsMz(
object,
mz = numeric(),
tolerance = 0,
ppm = 20,
which = c("any", "all"),
BPPARAM = bpparam()
)
## S4 method for signature 'Spectra'
containsNeutralLoss(
object,
neutralLoss = 0,
tolerance = 0,
ppm = 20,
BPPARAM = bpparam()
)
## S4 method for signature 'Spectra'
entropy(object, normalized = TRUE)
## S4 method for signature 'ANY'
entropy(object, ...)
## S4 method for signature 'Spectra'
pickPeaks(
object,
halfWindowSize = 2L,
method = c("MAD", "SuperSmoother"),
snr = 0,
k = 0L,
descending = FALSE,
threshold = 0,
msLevel. = uniqueMsLevels(object),
...
)
## S4 method for signature 'Spectra'
replaceIntensitiesBelow(
object,
threshold = min,
value = 0,
msLevel. = uniqueMsLevels(object)
)
## S4 method for signature 'Spectra'
reset(object, ...)
## S4 method for signature 'Spectra'
smooth(
x,
halfWindowSize = 2L,
method = c("MovingAverage", "WeightedMovingAverage", "SavitzkyGolay"),
msLevel. = uniqueMsLevels(x),
...
)
## S4 method for signature 'Spectra'
spectrapply(
object,
FUN,
...,
chunkSize = integer(),
f = factor(),
BPPARAM = SerialParam()
)
object |
A |
f |
For |
BPPARAM |
Parallel setup configuration. See |
... |
Additional arguments passed to internal and downstream functions. |
x |
A |
by |
For |
msLevel. |
|
FUN |
For |
spectraVariables |
For |
binSize |
For |
breaks |
For |
zero.rm |
For |
mz |
For |
tolerance |
For |
ppm |
For |
which |
For |
neutralLoss |
for |
normalized |
for |
halfWindowSize |
For |
method |
For |
snr |
For |
k |
For |
descending |
For |
threshold |
For |
value |
For |
chunkSize |
For |
See the documentation of the individual functions for a description of the return value.
Spectra
The methods listed here return a Spectra
object as a result.
addProcessing()
: adds an arbitrary function that should be applied to the
peaks matrix of every spectrum in object
. The function (can be passed
with parameter FUN
) is expected to take a peaks matrix as input and to
return a peaks matrix. A peaks matrix is a numeric matrix with two columns,
the first containing the m/z values of the peaks and the second the
corresponding intensities. The function has to have ...
in its
definition. Additional arguments can be passed with ...
. With parameter
spectraVariables
it is possible to define additional spectra variables
from object
that should be passed to the function FUN
. These will be
passed by their name (e.g. specifying spectraVariables = "precursorMz"
will pass the spectra's precursor m/z as a parameter named precursorMz
to the function. The only exception is the spectra's MS level, these will
be passed to the function as a parameter called spectrumMsLevel
(i.e.
with spectraVariables = "msLevel"
the MS levels of each spectrum will be
submitted to the function as a parameter called spectrumMsLevel
).
Examples are provided in the package vignette.
bin()
: aggregates individual spectra into discrete (m/z) bins. Binning is
performed only on spectra of the specified MS level(s) (parameter
msLevel
, by default all MS levels of x
). The bins can be defined with
parameter breaks
which by default are equally sized bins, with size
being defined by parameter binSize
, from the minimal to the maximal m/z
of all spectra (of MS level msLevel
) within x
. The same bins are used
for all spectra in x
. All intensity values for peaks falling into the
same bin are aggregated using the function provided with parameter FUN
(defaults to FUN = sum
, i.e. all intensities are summed up). Note that
the binning operation is applied to the peak data on-the-fly upon data
access and it is possible to revert the operation with the reset()
function (see description of reset()
below).
countIdentifications
: counts the number of identifications each scan has
led to. See countIdentifications()
for more details.
pickPeaks()
: picks peaks on individual spectra using a moving
window-based approach (window size = 2 * halfWindowSize
). For noisy
spectra there are currently two different noise estimators available,
the Median Absolute Deviation (method = "MAD"
) and
Friedman's Super Smoother (method = "SuperSmoother"
),
as implemented in the MsCoreUtils::noise()
.
The method supports also to optionally refine the m/z value of
the identified centroids by considering data points that belong (most
likely) to the same mass peak. Therefore the m/z value is calculated as an
intensity weighted average of the m/z values within the peak region.
The peak region is defined as the m/z values (and their respective
intensities) of the 2 * k
closest signals to the centroid or the closest
valleys (descending = TRUE
) in the 2 * k
region. For the latter the k
has to be chosen general larger. See MsCoreUtils::refineCentroids()
for
details.
If the ratio of the signal to the highest intensity of the peak is below
threshold
it will be ignored for the weighted average.
replaceIntensitiesBelow()
: replaces intensities below a specified
threshold with the provided value
. Parameter threshold
can be either
a single numeric value or a function which is applied to all non-NA
intensities of each spectrum to determine a threshold value for each
spectrum. The default is threshold = min
which replaces all values
which are <= the minimum intensity in a spectrum with value
(the
default for value
is 0
). Note that the function specified with
threshold
is expected to have a parameter na.rm
since na.rm = TRUE
will be passed to the function. If the spectrum is in profile mode,
ranges of successive non-0 peaks <= threshold
are set to 0.
Parameter msLevel.
allows to apply this to only spectra of certain MS
level(s).
scalePeaks()
: scales intensities of peaks within each spectrum depending
on parameter by
. With by = sum
(the default) peak intensities are
divided by the sum of peak intensities within each spectrum. The sum of
intensities is thus 1 for each spectrum after scaling. Parameter
msLevel.
allows to apply the scaling of spectra of a certain MS level.
By default (msLevel. = uniqueMsLevels(x)
) intensities for all
spectra will be scaled.
smooth()
: smooths individual spectra using a moving window-based approach
(window size = 2 * halfWindowSize
). Currently, the
Moving-Average- (method = "MovingAverage"
),
Weighted-Moving-Average- (method = "WeightedMovingAverage")
,
weights depending on the distance of the center and calculated
1/2^(-halfWindowSize:halfWindowSize)
) and
Savitzky-Golay-Smoothing (method = "SavitzkyGolay"
) are supported.
For details how to choose the correct halfWindowSize
please see
MsCoreUtils::smooth()
.
The functions listed in this section return immediately the result from the calculation. To reduce memory demand (and allow parallel processing) the calculations a chunk-wise processing is generally performed.
chunkapply()
: apply an arbitrary function to chunks of spectra. See
chunkapply()
for details and examples.
containsMz()
: checks for each of the spectra whether they contain mass
peaks with an m/z equal to mz
(given acceptable difference as defined by
parameters tolerance
and ppm
- see MsCoreUtils::common()
for
details). Parameter
which
allows to define whether any (which = "any"
, the default) or
all (which = "all"
) of the mz
have to match. The function returns
NA
if mz
is of length 0 or is NA
.
containsNeutralLoss()
: checks for each spectrum in object
if it has a
peak with an m/z value equal to its precursor m/z - neutralLoss
(given
acceptable difference as defined by parameters tolerance
and ppm
).
Returns NA
for MS1 spectra (or spectra without a precursor m/z).
entropy()
: calculates the entropy of each spectra based on the metrics
suggested by Li et al. (https://doi.org/10.1038/s41592-021-01331-z).
See also MsCoreUtils::nentropy()
in the MsCoreUtils package for
details.
estimatePrecursorIntensity()
: defines the precursor intensities for MS2
spectra using the intensity of the matching MS1 peak from the
closest MS1 spectrum (i.e. the last MS1 spectrum measured before the
respective MS2 spectrum). With method = "interpolation"
it is also
possible to calculate the precursor intensity based on an interpolation of
intensity values (and retention times) of the matching MS1 peaks from the
previous and next MS1 spectrum. See estimatePrecursorIntensity()
for
examples and more details.
estimatePrecursorMz()
: for DDA data: allows to estimate a fragment
spectra's precursor m/z based on the reported precursor m/z and the data
from the previous MS1 spectrum. See estimatePrecursorMz()
for details.
neutralLoss()
: calculates neutral loss spectra for fragment spectra. See
neutralLoss()
for detailed documentation.
spectrapply()
: applies a given function to each individual spectrum or
sets of a Spectra
object. By default, the Spectra
is split into
individual spectra (i.e. Spectra
of length 1) and the function FUN
is applied to each of them. An alternative splitting can be defined with
parameter f
. Parameters for FUN
can be passed using ...
.
The returned result and its order depend on the function FUN
and how
object
is split (hence on f
, if provided). Parallel processing is
supported and can be configured with parameter BPPARAM
, is however only
suggested for computational intense FUN
.
As an alternative to the (eventual parallel) processing of the full
Spectra
, spectrapply()
supports also a chunk-wise processing. For this,
parameter chunkSize
needs to be specified. object
is then split into
chunks of size chunkSize
which are then (stepwise) processed by FUN
.
This guarantees a lower memory demand (especially for on-disk backends)
since only the data for one chunk needs to be loaded into memory in each
iteration. Note that by specifying chunkSize
, parameters f
and
BPPARAM
will be ignored.
See also chunkapply()
above or examples below for details on chunk-wise
processing.
Operations that modify mass peak data, i.e. the m/z and intensity values of
a Spectra
are generally not applied immediately to the data but are
cached within the object's processing queue. These operations are then
applied to the data only upon request, for example when m/z and/or
intensity values are extracted. This lazy execution guarantees that the
same functionality can be applied to any Spectra
object, regardless of
the type of backend that is used. Thus, data manipulation operations can
also be applied to data that is read only. As a side effect, this enables
also to undo operations using the reset()
function.
Functions related to the processing queue are:
applyProcessing()
: for Spectra
objects that use a writeable backend
only: apply all steps from the lazy processing queue to the peak data and
write it back to the data storage. Parameter f
allows to specify how
object
should be split for parallel processing. This should either be
equal to the dataStorage
, or f = rep(1, length(object))
to disable
parallel processing alltogether. Other partitionings might result in
errors (especially if a MsBackendHdf5Peaks
backend is used).
processingLog()
: returns a character
vector with the processing log
messages.
reset()
: restores the data to its original state (as much as possible):
removes any processing steps from the lazy processing queue and calls
reset()
on the backend which, depending on the backend, can also undo
e.g. data filtering operations. Note that a reset*(
call after
applyProcessing()
will not have any effect. See examples below for more
information.
Sebastian Gibb, Johannes Rainer, Laurent Gatto, Philippine Louail, Nir Shahaf, Mar Garcia-Aloy
compareSpectra()
for calculation of spectra similarity scores.
processingChunkSize()
for information on parallel and chunk-wise data
processing.
Spectra for a general description of the Spectra
object.
## Load a `Spectra` object with LC-MS/MS data.
fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
package = "msdata")
sps_dda <- Spectra(fl)
sps_dda
## -------- FUNCTIONS RETURNING A SPECTRA --------
## Replace peak intensities below 40 with a value of 1
sps_mod <- replaceIntensitiesBelow(sps_dda, threshold = 20, value = 1)
sps_mod
## Get the intensities of the first spectrum before and after the
## operation
intensity(sps_dda[1])
intensity(sps_mod[1])
## Remove all peaks with an intensity below 5.
sps_mod <- filterIntensity(sps_dda, intensity = c(5, Inf))
intensity(sps_mod)
## In addition it is possible to pass a function to `filterIntensity()`: in
## the example below we want to keep only peaks that have an intensity which
## is larger than one third of the maximal peak intensity in that spectrum.
keep_peaks <- function(x, prop = 3) {
x > max(x, na.rm = TRUE) / prop
}
sps_mod <- filterIntensity(sps_dda, intensity = keep_peaks)
intensity(sps_mod)
## We can also change the proportion by simply passing the `prop` parameter
## to the function. To keep only peaks that have an intensity which is
## larger than half of the maximum intensity:
sps_mod <- filterIntensity(sps_dda, intensity = keep_peaks, prop = 2)
intensity(sps_mod)
## With the `scalePeaks()` function we can alternatively scale the
## intensities of mass peaks per spectrum to relative intensities. This
## is specifically useful for fragment (MS2) spectra. We below thus
## scale the intensities per spectrum by the total sum of intensities
## (such that the sum of all intensities per spectrum is 1).
## Below we scale the intensities of all MS2 spectra in our data set.
sps_mod <- scalePeaks(sps_dda, msLevel = 2L)
## MS1 spectra were not affected
sps_mod |>
filterMsLevel(1L) |>
intensity()
## Intensities of MS2 spectra were scaled
sps_mod |>
filterMsLevel(2L) |>
intensity()
## Since data manipulation operations are by default not directly applied to
## the data but only cached in the internal processing queue, it is also
## possible to remove these data manipulations with the `reset()` function:
tmp <- reset(sps_mod)
tmp
lengths(sps_dda) |> head()
lengths(sps_mod) |> head()
lengths(tmp) |> head()
## Data manipulation operations cached in the processing queue can also be
## applied to the mass peaks data with the `applyProcessing()` function, if
## the `Spectra` uses a backend that supports that (i.e. allows replacing
## the mass peaks data). Below we first change the backend to a
## `MsBackendMemory()` and then use the `applyProcessing()` to modify the
## mass peaks data
sps_dda <- setBackend(sps_dda, MsBackendMemory())
sps_mod <- filterIntensity(sps_dda, intensity = c(5, Inf))
sps_mod <- applyProcessing(sps_mod)
sps_mod
## While we can't *undo* this filtering operation now using the `reset()`
## function, accessing the data would now be faster, because the operation
## does no longer to be applied to the original data before returning to the
## user.
## -------- FUNCTIONS RETURNING THE RESULT --------
## With the `spectrapply()` function it is possible to apply an
## arbitrary function to each spectrum in a Spectra.
## In the example below we calculate the mean intensity for each spectrum
## in a subset of the sciex_im data. Note that we can access all variables
## of each individual spectrum either with the `$` operator or the
## corresponding method.
res <- spectrapply(sps_dda[1:20], FUN = function(x) mean(x$intensity[[1]]))
head(res)
## As an alternative, applying a function `FUN` to a `Spectra` can be
## performed *chunk-wise*. The advantage of this is, that only the data for
## one chunk at a time needs to be loaded into memory reducing the memory
## demand. This type of processing can be performed by specifying the size
## of the chunks (i.e. number of spectra per chunk) with the `chunkSize`
## parameter
spectrapply(sps_dda[1:20], lengths, chunkSize = 5L)
## Precursor intensity estimation. Some manufacturers don't report the
## precursor intensity for MS2 spectra:
sps_dda |>
filterMsLevel(2L) |>
precursorIntensity()
## This intensity can however be estimated from the previously measured
## MS1 scan with the `estimatePrecursorIntensity()` function:
pi <- estimatePrecursorIntensity(sps_dda)
## This function returned the result as a `numeric` vector with one
## value per spectrum:
pi
## We can replace the precursor intensity values of the originating
## object:
sps_dda$precursorIntensity <- pi
sps_dda |>
filterMsLevel(2L) |>
precursorIntensity()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.