combineSpectra: Merging, aggregating and splitting Spectra

View source: R/Spectra-functions.R

concatenateSpectraR Documentation

Merging, aggregating and splitting Spectra

Description

Various functions are availabe to combine, aggregate or split data from one of more Spectra objects. These are:

  • c() and concatenateSpectra(): combines several Spectra objects into a single object. The resulting Spectra contains all data from all individual Spectra, i.e. the union of all their spectra variables. Concatenation will fail if the processing queue of any of the Spectra objects is not empty or if different backends are used for the Spectra objects. In such cases it is suggested to first change the backends of all Spectra to the same type of backend (using the setBackend() function and to eventually (if needed) apply the processing queue using the applyProcessing() function.

  • cbind2(): Appends multiple spectra variables from a data.frame, DataFrame or matrix to the Spectra object at once. The order of the values (rows) in y has to match the order of spectra in x. The function does not allow to replace existing spectra variables. cbind2() returns a Spectra object with the appended spectra variables. For a more controlled way of adding spectra variables, see the joinSpectraData() function.

  • combineSpectra(): combines sets of spectra (defined with parameter f) into a single spectrum per set aggregating their MS data (i.e. their peaks data matrices with the m/z and intensity values of their mass peaks). The spectra variable values of the first spectrum per set are reported for the combined spectrum. The peak matrices of the spectra per set are combined using the function specified with parameter FUN which uses by default the combinePeaksData() function. See the documentation of combinePeaksData() for details on the aggregation of the peak data and the package vignette for examples. The sets of spectra can be specified with parameter f which is expected to be a factor or vector of length equal to the length of the Spectra specifying to which set a spectrum belongs to. The function returns a Spectra of length equal to the unique levels of f. The optional parameter p allows to define how the Spectra should be split for potential parallel processing. The default is p = x$dataStorage and hence a per storage file parallel processing is applied for Spectra with on disk data representations (such as the MsBackendMzR()). This also prevents that spectra from different data files/samples are combined (eventually use e.g. p = x$dataOrigin or any other spectra variables defining the originating samples for a spectrum). Before combining the peaks data, all eventual present processing steps are applied (by calling applyProcessing() on the Spectra). This function will replace the original m/z and intensity values of a Spectra hence it can not be called on a Spectra with a read-only backend. In such cases, the backend should be changed to a writeable backend before using the setBackend() function (to e.g. a MsBackendMemory() backend).

  • joinSpectraData(): Individual spectra variables can be directly added with the ⁠$<-⁠ or ⁠[[<-⁠ syntax. The joinSpectraData() function allows to merge a DataFrame to the existing spectra data of a Spectra. This function diverges from the merge() method in two main ways:

    • The by.x and by.y column names must be of length 1.

    • If variable names are shared in x and y, the spectra variables of x are not modified. It's only the y variables that are appended with the suffix defined in suffix.y. This is to avoid modifying any core spectra variables that would lead to an invalid object.

    • Duplicated Spectra keys (i.e. x[[by.x]]) are not allowed. Duplicated keys in the DataFrame (i.e y[[by.y]]) throw a warning and only the last occurrence is kept. These should be explored and ideally be removed using for QFeatures::reduceDataFrame(), PMS::reducePSMs() or similar functions. For a more general function that allows to append data.frame, DataFrame and matrix see cbind2().

  • split(): splits the Spectra object based on parameter f into a list of Spectra objects.

Usage

concatenateSpectra(x, ...)

combineSpectra(
  x,
  f = x$dataStorage,
  p = x$dataStorage,
  FUN = combinePeaksData,
  ...,
  BPPARAM = bpparam()
)

joinSpectraData(x, y, by.x = "spectrumId", by.y, suffix.y = ".y")

## S4 method for signature 'Spectra'
c(x, ...)

## S4 method for signature 'Spectra,dataframeOrDataFrameOrmatrix'
cbind2(x, y, ...)

## S4 method for signature 'Spectra,ANY'
split(x, f, drop = FALSE, ...)

Arguments

x

A Spectra object.

...

Additional arguments.

f

For split(): factor defining how to split x. See base::split() for details. For combineSpectra(): factor defining the grouping of the spectra that should be combined. Defaults to x$dataStorage.

p

For combineSpectra(): factor defining how to split the input Spectra for parallel processing. Defaults to x$dataStorage, i.e., depending on the used backend, per-file parallel processing will be performed.

FUN

For combineSpectra(): function to combine the (peak matrices) of the spectra. Defaults to combinePeaksData().

BPPARAM

Parallel setup configuration. See BiocParallel::bpparam() for more information. This is passed directly to the backendInitialize() method of the MsBackend.

y

For joinSpectraData(): DataFrame with the spectra variables to join/add. For cbind2(): a data.frame, DataFrame or matrix. The number of rows and their order has to match the number of spectra in x, respectively their order.

by.x

A character(1) specifying the spectra variable used for merging. Default is "spectrumId".

by.y

A character(1) specifying the column used for merging. Set to by.x if missing.

suffix.y

A character(1) specifying the suffix to be used for making the names of columns in the merged spectra variables unique. This suffix will be used to amend names(y), while spectraVariables(x) will remain unchanged.

drop

For split(): not considered.

Author(s)

Sebastian Gibb, Johannes Rainer, Laurent Gatto

See Also

  • combinePeaks() for functions to aggregate mass peaks data.

  • Spectra for a general description of the Spectra object.

Examples


## Create a Spectra providing a `DataFrame` containing a MS data.

spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd$mz <- list(c(100, 103.2, 104.3, 106.5), c(45.6, 120.4, 190.2))
spd$intensity <- list(c(200, 400, 34.2, 17), c(12.3, 15.2, 6.8))

s <- Spectra(spd)
s

## Create a second Spectra from mzML files and use the `MsBackendMzR`
## on-disk backend.
sciex_file <- dir(system.file("sciex", package = "msdata"),
    full.names = TRUE)
sciex <- Spectra(sciex_file, backend = MsBackendMzR())
sciex

## Subset to the first 100 spectra to reduce running time of the examples
sciex <- sciex[1:100]


##  --------  COMBINE SPECTRA  --------

## Combining the `Spectra` object `s` with the MS data from `sciex`.
## Calling directly `c(s, sciex)` would result in an error because
## both backends use a different backend. We thus have to first change
## the backends to the same backend. We change the backend of the `sciex`
## `Spectra` to a `MsBackendMemory`, the backend used by `s`.

sciex <- setBackend(sciex, MsBackendMemory())

## Combine the two `Spectra`
all <- c(s, sciex)
all

## The new `Spectra` objects contains the union of spectra variables from
## both:
spectraVariables(all)

## The spectra variables that were not present in `s`:
setdiff(spectraVariables(all), spectraVariables(s))

## The values for these were filled with missing values for spectra from
## `s`:
all$peaksCount |> head()


##  --------  AGGREGATE SPECTRA  --------

## Sets of spectra can be combined into a single, representative spectrum
## per set using `combineSpectra()`. This aggregates the peaks data (i.e.
## the spectra's m/z and intensity values) while using the values for all
## spectra variables from the first spectrum per set. Below we define the
## sets as all spectra measured in the *same second*, i.e. rounding their
## retention time to the next closer integer value.
f <- round(rtime(sciex))
head(f)

cmp <- combineSpectra(sciex, f = f)

## The length of `cmp` is now equal to the length of unique levels in `f`:
length(cmp)

## The spectra variable value from the first spectrum per set is used in
## the representative/combined spectrum:
cmp$rtime

## The peaks data was aggregated: the number of mass peaks of the first six
## spectra from the original `Spectra`:
lengths(sciex) |> head()

## and for the first aggreagated spectra:
lengths(cmp) |> head()

## The default peaks data aggregation method joins all mass peaks. See
## documentation of the `combinePeaksData()` function for more options.


##  --------  SPLITTING DATA  --------

## A `Spectra` can be split into a `list` of `Spectra` objects using the
## `split()` function defining the sets into which the `Spectra` should
## be splitted into with parameter `f`.
sciex_split <- split(sciex, f)

length(sciex_split)
sciex_split |> head()


##  --------  ADDING SPECTRA DATA  --------

## Adding new spectra variables
sciex1 <- filterDataOrigin(sciex, dataOrigin(sciex)[1])
spv <- DataFrame(spectrumId = sciex1$spectrumId[3:12], ## used for merging
                 var1 = rnorm(10),
                 var2 = sample(letters, 10))
spv

sciex2 <- joinSpectraData(sciex1, spv, by.y = "spectrumId")

spectraVariables(sciex2)
spectraData(sciex2)[1:13, c("spectrumId", "var1", "var2")]

## Append new spectra variables with cbind2()
df <- data.frame(cola = seq_len(length(sciex1)), colb = "b")
data_append <- cbind2(sciex1, df)

rformassspectrometry/Spectra documentation built on Dec. 19, 2024, 1:05 p.m.