View source: R/Spectra-functions.R
concatenateSpectra | R Documentation |
Various functions are available to combine, aggregate or split data from one
or more Spectra objects. These are:
c() and concatenateSpectra(): combine several Spectra objects into a single
object. The resulting Spectra contains all data from all individual Spectra,
i.e. the union of all their spectra variables. Concatenation will fail if the
processing queue of any of the Spectra objects is not empty or if the Spectra
objects use different backends. In such cases it is suggested to first change
the backends of all Spectra to the same type of backend (using the
setBackend() function) and, if needed, to apply the processing queue using
the applyProcessing() function.
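The preparation steps described above can be sketched as follows. This is a minimal illustration, assuming the Spectra package is installed; the helper toy_spectra(), the objects a and b and all peak values are made up for this example and are not part of the package.

```r
library(Spectra)

## Hypothetical helper building a one-spectrum, in-memory Spectra from
## toy values (illustration only).
toy_spectra <- function(mz, intensity) {
    spd <- DataFrame(msLevel = 2L)
    spd$mz <- list(mz)
    spd$intensity <- list(intensity)
    Spectra(spd)
}
a <- toy_spectra(c(100, 103.2, 104.3), c(200, 400, 34.2))
b <- toy_spectra(c(45.6, 120.4, 190.2), c(12.3, 15.2, 6.8))

## Filtering is lazy: the step is added to the processing queue of `b`.
b <- filterIntensity(b, intensity = 10)

## Apply the queued step so that the processing queue of `b` is empty
## again. Serial processing is chosen here only to keep the sketch simple.
b <- applyProcessing(b, BPPARAM = BiocParallel::SerialParam())

## Both objects use the same backend type and have empty queues, so they
## can be concatenated.
ab <- c(a, b)
length(ab)
```

If the two objects used different backends, a call to setBackend() on one of them would be needed before c().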
combineSpectra(): combines sets of spectra (defined with parameter f) into a
single spectrum per set by aggregating their MS data (i.e. their peaks data
matrices with the m/z and intensity values of their mass peaks). The spectra
variable values of the first spectrum per set are reported for the combined
spectrum. The peak matrices of the spectra per set are combined using the
function specified with parameter FUN, which defaults to combinePeaksData().
See the documentation of combinePeaksData() for details on the aggregation of
the peak data and the package vignette for examples.
The sets of spectra can be specified with parameter f, which is expected to
be a factor or vector of length equal to the length of the Spectra,
specifying the set to which each spectrum belongs. The function returns a
Spectra of length equal to the number of unique levels of f. The optional
parameter p defines how the Spectra should be split for potential parallel
processing. The default is p = x$dataStorage, hence per-storage-file parallel
processing is applied for Spectra with on-disk data representations (such as
MsBackendMzR()). This also prevents spectra from different data files/samples
from being combined (alternatively use e.g. p = x$dataOrigin or any other
spectra variable defining the originating sample of a spectrum).
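As a small sketch of how f defines the sets, assuming the Spectra package is installed: the three toy spectra and their values below are invented for illustration. Parameter p is left at its default, and because all toy spectra live in the same in-memory storage, no splitting for parallel processing takes place.

```r
library(Spectra)

## Three toy MS2 spectra (made-up values, for illustration only).
spd <- DataFrame(msLevel = c(2L, 2L, 2L), rtime = c(1.1, 1.2, 2.1))
spd$mz <- list(c(100, 103.2), c(100, 104.3), c(45.6, 120.4))
spd$intensity <- list(c(200, 400), c(150, 34.2), c(12.3, 15.2))
x <- Spectra(spd)

## One entry per spectrum: spectra 1 and 2 form set "a", spectrum 3
## forms set "b".
f <- c("a", "a", "b")

## Combine the spectra per set with the default `FUN` and default `p`.
## Serial processing is chosen only to keep the sketch simple.
res <- combineSpectra(x, f = f, BPPARAM = BiocParallel::SerialParam())

## One combined spectrum per set.
length(res)

## Spectra variables are taken from the first spectrum of each set.
rtime(res)
```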
Before combining the peaks data, any processing steps present in the
processing queue are applied (by calling applyProcessing() on the Spectra).
This function replaces the original m/z and intensity values of a Spectra,
hence it cannot be called on a Spectra with a read-only backend. In such
cases, the backend should first be changed to a writeable backend using the
setBackend() function (e.g. to a MsBackendMemory() backend).
joinSpectraData(): Individual spectra variables can be directly added with
the $<- or [[<- syntax. The joinSpectraData() function allows merging a
DataFrame into the existing spectra data of a Spectra. This function diverges
from the merge() method in two main ways:
The by.x and by.y column names must be of length 1.
If variable names are shared in x and y, the spectra variables of x are not
modified. Only the y variables are appended with the suffix defined in
suffix.y. This is to avoid modifying any core spectra variables, which would
lead to an invalid object.
Duplicated Spectra keys (i.e. x[[by.x]]) are not allowed. Duplicated keys in
the DataFrame (i.e. y[[by.y]]) throw a warning and only the last occurrence
is kept. These should be explored and ideally be removed using, for example,
QFeatures::reduceDataFrame(), PSMatch::reducePSMs() or similar functions.
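The key matching and the suffix handling described above can be sketched as follows, assuming the Spectra package is installed. The spectrum identifiers, the score column and all values are invented for this illustration.

```r
library(Spectra)

## Three toy spectra with user-defined identifiers (illustration only).
spd <- DataFrame(msLevel = c(2L, 2L, 2L),
                 spectrumId = c("sp_1", "sp_2", "sp_3"))
spd$mz <- list(c(100, 103.2), c(100, 104.3), c(45.6, 120.4))
spd$intensity <- list(c(200, 400), c(150, 34.2), c(12.3, 15.2))
x <- Spectra(spd)

## Annotation table: rows in a different order, no entry for "sp_2",
## and a column name (msLevel) that is shared with `x`.
y <- DataFrame(spectrumId = c("sp_3", "sp_1"),
               score = c(0.9, 0.4),
               msLevel = c(1L, 1L))

res <- joinSpectraData(x, y, by.x = "spectrumId", by.y = "spectrumId")

## Values are matched by key; spectra without a match get NA.
res$score

## The msLevel of `x` is untouched; the column from `y` gets the suffix.
res$msLevel
"msLevel.y" %in% spectraVariables(res)
```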
split(): splits the Spectra object based on parameter f into a list of
Spectra objects.
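A short sketch of split() on toy data, assuming the Spectra package is installed (the spectra and the grouping vector f are made up for illustration):

```r
library(Spectra)

## Three toy spectra (made-up values, for illustration only).
spd <- DataFrame(msLevel = c(2L, 2L, 2L), rtime = c(1.1, 1.2, 2.1))
spd$mz <- list(c(100, 103.2), c(100, 104.3), c(45.6, 120.4))
spd$intensity <- list(c(200, 400), c(150, 34.2), c(12.3, 15.2))
x <- Spectra(spd)

## Grouping: spectra 1 and 2 go to element "a", spectrum 3 to element "b".
f <- c("a", "a", "b")
x_split <- split(x, f)

## The result is a list with one Spectra per level of `f`.
names(x_split)
vapply(x_split, length, integer(1))
```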
concatenateSpectra(x, ...)

combineSpectra(
  x,
  f = x$dataStorage,
  p = x$dataStorage,
  FUN = combinePeaksData,
  ...,
  BPPARAM = bpparam()
)

joinSpectraData(x, y, by.x = "spectrumId", by.y, suffix.y = ".y")

## S4 method for signature 'Spectra'
c(x, ...)

## S4 method for signature 'Spectra,ANY'
split(x, f, drop = FALSE, ...)
x | A |
... | Additional arguments. |
f | For |
p | For |
FUN | For |
BPPARAM | Parallel setup configuration. See |
y | A |
by.x | A |
by.y | A |
suffix.y | A |
drop | For |
Sebastian Gibb, Johannes Rainer, Laurent Gatto
combinePeaks() for functions to aggregate mass peaks data.
Spectra for a general description of the Spectra object.
library(Spectra)

## Create a Spectra by providing a `DataFrame` containing MS data.
spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd$mz <- list(c(100, 103.2, 104.3, 106.5), c(45.6, 120.4, 190.2))
spd$intensity <- list(c(200, 400, 34.2, 17), c(12.3, 15.2, 6.8))
s <- Spectra(spd)
s
## Create a second Spectra from mzML files and use the `MsBackendMzR`
## on-disk backend.
sciex_file <- dir(system.file("sciex", package = "msdata"),
                  full.names = TRUE)
sciex <- Spectra(sciex_file, backend = MsBackendMzR())
sciex
## Subset to the first 100 spectra to reduce running time of the examples
sciex <- sciex[1:100]
## -------- COMBINE SPECTRA --------
## Combining the `Spectra` object `s` with the MS data from `sciex`.
## Calling `c(s, sciex)` directly would result in an error because the
## two objects use different backends. We thus first have to change them
## to the same backend. We change the backend of the `sciex` `Spectra`
## to a `MsBackendMemory`, the backend used by `s`.
sciex <- setBackend(sciex, MsBackendMemory())
## Combine the two `Spectra`
all <- c(s, sciex)
all
## The new `Spectra` object contains the union of spectra variables from
## both:
spectraVariables(all)
## The spectra variables that were not present in `s`:
setdiff(spectraVariables(all), spectraVariables(s))
## The values for these were filled with missing values for spectra from
## `s`:
all$peaksCount |> head()
## -------- AGGREGATE SPECTRA --------
## Sets of spectra can be combined into a single, representative spectrum
## per set using `combineSpectra()`. This aggregates the peaks data (i.e.
## the spectra's m/z and intensity values) while using the values for all
## spectra variables from the first spectrum per set. Below we define the
## sets as all spectra measured in the *same second*, i.e. rounding their
## retention time to the nearest integer value.
f <- round(rtime(sciex))
head(f)
cmp <- combineSpectra(sciex, f = f)
## The length of `cmp` is now equal to the number of unique levels in `f`:
length(cmp)
## The spectra variable value from the first spectrum per set is used in
## the representative/combined spectrum:
cmp$rtime
## The peaks data was aggregated: the number of mass peaks of the first six
## spectra from the original `Spectra`:
lengths(sciex) |> head()
## and for the first aggregated spectra:
lengths(cmp) |> head()
## The default peaks data aggregation method joins all mass peaks. See
## documentation of the `combinePeaksData()` function for more options.
## -------- SPLITTING DATA --------
## A `Spectra` can be split into a `list` of `Spectra` objects using the
## `split()` function, with parameter `f` defining the sets into which
## the `Spectra` should be split.
sciex_split <- split(sciex, f)
length(sciex_split)
sciex_split |> head()
## -------- ADDING SPECTRA DATA --------
## Adding new spectra variables
sciex1 <- filterDataOrigin(sciex, dataOrigin(sciex)[1])
spv <- DataFrame(spectrumId = sciex1$spectrumId[3:12], ## used for merging
                 var1 = rnorm(10),
                 var2 = sample(letters, 10))
spv
sciex2 <- joinSpectraData(sciex1, spv, by.y = "spectrumId")
spectraVariables(sciex2)
spectraData(sciex2)[1:13, c("spectrumId", "var1", "var2")]