spectraData: Accessing mass spectrometry data
In rformassspectrometry/Spectra: Spectra Infrastructure for Mass Spectrometry Data

Description

As detailed in the documentation of the Spectra class, a Spectra object is a container for mass spectrometry (MS) data that includes both the mass peaks data (or peaks data, generally m/z and intensity values) and spectra metadata (so-called spectra variables). Spectra variables generally define one value per spectrum, while peaks variables define one value per mass peak and hence multiple values per spectrum (depending on the number of mass peaks of a spectrum).
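A minimal sketch of this distinction, assuming the Spectra and S4Vectors packages are installed and using a small hypothetical two-spectrum data set built in memory: the spectra variable rtime has one value per spectrum, while the peaks variable m/z has one value per mass peak.

```r
library(Spectra)
library(S4Vectors)

## Hypothetical data: two spectra with 4 and 3 mass peaks
spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd$mz <- list(c(100, 103.2, 104.3, 106.5), c(45.6, 120.4, 190.2))
spd$intensity <- list(c(200, 400, 34.2, 17), c(12.3, 15.2, 6.8))
s <- Spectra(spd)

## Spectra variable: one value per spectrum
length(s)
rtime(s)

## Peaks variable: one value per mass peak, hence a variable number
## of values per spectrum
lengths(s)
lengths(as.list(mz(s)))
```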

Data can be extracted from a Spectra object using dedicated accessor functions or using the $ operator. Depending on the backend class used by the Spectra to represent the data, data can also be added or replaced (again, using dedicated functions or using $<-).

Usage

asDataFrame(
  object,
  i = seq_along(object),
  spectraVars = spectraVariables(object)
)

## S4 method for signature 'Spectra'
acquisitionNum(object)

## S4 method for signature 'Spectra'
centroided(object)

## S4 replacement method for signature 'Spectra'
centroided(object) <- value

## S4 method for signature 'Spectra'
collisionEnergy(object)

## S4 replacement method for signature 'Spectra'
collisionEnergy(object) <- value

coreSpectraVariables()

## S4 method for signature 'Spectra'
dataOrigin(object)

## S4 replacement method for signature 'Spectra'
dataOrigin(object) <- value

## S4 method for signature 'Spectra'
dataStorage(object)

## S4 method for signature 'Spectra'
intensity(object, f = processingChunkFactor(object), ...)

## S4 method for signature 'Spectra'
ionCount(object)

## S4 method for signature 'Spectra'
isCentroided(object, ...)

## S4 method for signature 'Spectra'
isEmpty(x)

## S4 method for signature 'Spectra'
isolationWindowLowerMz(object)

## S4 replacement method for signature 'Spectra'
isolationWindowLowerMz(object) <- value

## S4 method for signature 'Spectra'
isolationWindowTargetMz(object)

## S4 replacement method for signature 'Spectra'
isolationWindowTargetMz(object) <- value

## S4 method for signature 'Spectra'
isolationWindowUpperMz(object)

## S4 replacement method for signature 'Spectra'
isolationWindowUpperMz(object) <- value

## S4 method for signature 'Spectra'
length(x)

## S4 method for signature 'Spectra'
lengths(x, use.names = FALSE)

## S4 method for signature 'Spectra'
msLevel(object)

## S4 method for signature 'Spectra'
mz(object, f = processingChunkFactor(object), ...)

## S4 method for signature 'Spectra'
peaksData(
  object,
  columns = c("mz", "intensity"),
  f = processingChunkFactor(object),
  return.type = c("SimpleList", "list"),
  ...,
  BPPARAM = bpparam()
)

## S4 method for signature 'Spectra'
peaksVariables(object)

## S4 method for signature 'Spectra'
polarity(object)

## S4 replacement method for signature 'Spectra'
polarity(object) <- value

## S4 method for signature 'Spectra'
precScanNum(object)

## S4 method for signature 'Spectra'
precursorCharge(object)

## S4 method for signature 'Spectra'
precursorIntensity(object)

## S4 method for signature 'Spectra'
precursorMz(object)

## S4 replacement method for signature 'Spectra'
precursorMz(object, ...) <- value

## S4 method for signature 'Spectra'
rtime(object)

## S4 replacement method for signature 'Spectra'
rtime(object) <- value

## S4 method for signature 'Spectra'
scanIndex(object)

## S4 method for signature 'Spectra'
smoothed(object)

## S4 replacement method for signature 'Spectra'
smoothed(object) <- value

## S4 method for signature 'Spectra'
spectraData(object, columns = spectraVariables(object))

## S4 replacement method for signature 'Spectra'
spectraData(object) <- value

## S4 method for signature 'Spectra'
spectraNames(object)

## S4 replacement method for signature 'Spectra'
spectraNames(object) <- value

## S4 method for signature 'Spectra'
spectraVariables(object)

## S4 method for signature 'Spectra'
tic(object, initial = TRUE)

## S4 method for signature 'Spectra'
uniqueMsLevels(object, ...)

## S4 method for signature 'Spectra'
x$name

## S4 replacement method for signature 'Spectra'
x$name <- value

## S4 method for signature 'Spectra'
x[[i, j, ...]]

## S4 replacement method for signature 'Spectra'
x[[i, j, ...]] <- value

Arguments

`object`	A `Spectra` object.
`i`	For `asDataFrame()`: A `numeric` indicating which scans to coerce to a `DataFrame` (default is `seq_along(object)`).
`spectraVars`	`character()` indicating what spectra variables to add to the `DataFrame`. Default is `spectraVariables(object)`, i.e. all available variables.
`value`	A vector with values to replace the respective spectra variable. Needs to be of the correct data type for the spectra variable.
`f`	For `intensity()`, `mz()` and `peaksData()`: factor defining how data should be loaded and processed in chunks. Defaults to `processingChunkFactor()`.
`...`	Additional arguments.
`x`	A `Spectra` object.
`use.names`	For `lengths()`: ignored.
`columns`	For `spectraData()` accessor: optional `character` with column names (spectra variables) that should be included in the returned `DataFrame`. By default, all columns are returned. For `peaksData()` accessor: optional `character` with requested columns in the individual `matrix` of the returned `list`. Defaults to `c("mz", "intensity")`, but any value returned by `peaksVariables(object)` with `object` being the `Spectra` object is supported.
`return.type`	For `peaksData()`: `character(1)` allowing to specify if the results should be returned as a `SimpleList` or as a `list`. Defaults to `return.type = "SimpleList"`.
`BPPARAM`	Parallel setup configuration. See `BiocParallel::bpparam()` for more information. See also `processingChunkSize()` for more information on parallel processing.
`initial`	For `tic()`: `logical(1)` whether the total ion current reported in the original data should be returned (`initial = TRUE`, the default), or whether it should be (re)calculated from the actual peaks data (`initial = FALSE`, same as `ionCount()`).
`name`	For `$` and `$<-`: the name of the spectra variable to return or set.
`j`	For `[[`: not supported.
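A short sketch of the `initial` and `return.type` arguments, assuming the Spectra and S4Vectors packages are installed and using hypothetical in-memory data. With `initial = FALSE` the total ion current is recalculated from the peaks, which the text above describes as equivalent to `ionCount()`.

```r
library(Spectra)
library(S4Vectors)

## Hypothetical data: two spectra
spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd$mz <- list(c(100, 103.2, 104.3, 106.5), c(45.6, 120.4, 190.2))
spd$intensity <- list(c(200, 400, 34.2, 17), c(12.3, 15.2, 6.8))
s <- Spectra(spd)

## Total ion current recalculated from the peaks data
tic(s, initial = FALSE)
ionCount(s)

## return.type = "list" returns a base R list instead of a SimpleList
pks <- peaksData(s, return.type = "list")
class(pks)

## Default columns of each peaks matrix
colnames(pks[[1L]])
```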

Spectra variables

A common set of core spectra variables is defined for Spectra. These have a pre-defined data type and each Spectra will return a value for them if requested. If no value for a spectra variable is defined, a missing value (of the correct data type) is returned. The list of core spectra variables and their respective data type is:

acquisitionNum integer(1): the index of acquisition of a spectrum during an MS run.
centroided logical(1): whether the spectrum is in centroid (TRUE) or profile (FALSE) mode.
collisionEnergy numeric(1): collision energy used to create an MSn spectrum.
dataOrigin character(1): the origin of the spectrum's data, e.g. the mzML file from which it was read.
dataStorage character(1): the (current) storage location of the spectrum data. This value depends on the backend used to handle and provide the data. For an in-memory backend like the MsBackendDataFrame this will be "<memory>", for an on-disk backend such as the MsBackendHdf5Peaks it will be the name of the HDF5 file where the spectrum's peak data is stored.
isolationWindowLowerMz numeric(1): lower m/z for the isolation window in which the (MSn) spectrum was measured.
isolationWindowTargetMz numeric(1): the target m/z for the isolation window in which the (MSn) spectrum was measured.
isolationWindowUpperMz numeric(1): upper m/z for the isolation window in which the (MSn) spectrum was measured.
msLevel integer(1): the MS level of the spectrum.
polarity integer(1): the polarity of the spectrum (0 and 1 representing negative and positive polarity, respectively).
precScanNum integer(1): the scan (acquisition) number of the precursor for an MSn spectrum.
precursorCharge integer(1): the charge of the precursor of an MSn spectrum.
precursorIntensity numeric(1): the intensity of the precursor of an MSn spectrum.
precursorMz numeric(1): the m/z of the precursor of an MSn spectrum.
rtime numeric(1): the retention time of a spectrum.
scanIndex integer(1): the index of a spectrum within a (raw) file.
smoothed logical(1): whether the spectrum was smoothed.
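A minimal sketch of the missing-value behavior described above, assuming the Spectra and S4Vectors packages are installed and using hypothetical data that define only msLevel, rtime and the peaks. Core variables that were never set come back as NA of their declared type.

```r
library(Spectra)
library(S4Vectors)

## Hypothetical data defining only msLevel, rtime and the peaks
spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd$mz <- list(c(100, 103.2), c(45.6, 120.4, 190.2))
spd$intensity <- list(c(200, 400), c(12.3, 15.2, 6.8))
s <- Spectra(spd)

## Core variables that were not provided: NA of the expected type
precursorMz(s)       # numeric
precursorCharge(s)   # integer
centroided(s)        # logical

## Expected data types of all core spectra variables
coreSpectraVariables()

## Storage location of in-memory data
dataStorage(s)
```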

For each of these spectra variables a dedicated accessor function is defined (such as msLevel() or rtime()) that extracts the values of that spectra variable for all spectra in a Spectra object. Replacement functions are also defined, but not all backends support replacing values of spectra variables. As described above, additional spectra variables can be defined or added. The spectraVariables() function can be used to list all spectra variables available in a Spectra object.
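A sketch of the replacement functions, assuming the Spectra and S4Vectors packages are installed and that the in-memory backend created by Spectra() for a DataFrame supports replacing values. Per the function list below, centroided<- and polarity<- accept a single value, and $<- can be used as an alternative to a dedicated replacement function.

```r
library(Spectra)
library(S4Vectors)

## Hypothetical data: two spectra
spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd$mz <- list(c(100, 103.2), c(45.6, 120.4, 190.2))
spd$intensity <- list(c(200, 400), c(12.3, 15.2, 6.8))
s <- Spectra(spd)

## Replace values of core spectra variables
rtime(s) <- c(10.1, 10.2)
centroided(s) <- TRUE                 # a single value for all spectra
polarity(s) <- 1L                     # 1 = positive polarity
s$collisionEnergy <- c(NA_real_, 35)  # `$<-` works as well

rtime(s)
centroided(s)
polarity(s)
collisionEnergy(s)
```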

Values for multiple spectra variables, or for all spectra variables, can be extracted with the spectraData() function.
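A small sketch of spectraData(), assuming the Spectra and S4Vectors packages are installed and using hypothetical in-memory data. It returns a DataFrame with one row per spectrum; according to the function list below, the peaks variables are not part of the default result.

```r
library(Spectra)
library(S4Vectors)

## Hypothetical data: two spectra
spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd$mz <- list(c(100, 103.2), c(45.6, 120.4, 190.2))
spd$intensity <- list(c(200, 400), c(12.3, 15.2, 6.8))
s <- Spectra(spd)

## Selected spectra variables: one row per spectrum
sd <- spectraData(s, columns = c("msLevel", "rtime"))
sd

## All spectra variables, but no m/z or intensity values
colnames(spectraData(s))
```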

Peaks variables

Spectra also provide mass peak data with the m/z and intensity values being the core peaks variables:

intensity numeric: intensity values for the spectrum's peaks.
mz numeric: the m/z values for the spectrum's peaks.

Values for these can be extracted with the mz() and intensity() functions, or the peaksData() function. The former functions return a NumericList with the respective values, while the latter returns a List with numeric two-column matrices. The list of peaks matrices can also be extracted using as(x, "list") or as(x, "SimpleList") with x being a Spectra object.

Some Spectra/backends also provide values for additional peaks variables. The set of available peaks variables can be extracted with the peaksVariables() function.
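A sketch relating the peaks accessors, assuming the Spectra and S4Vectors packages are installed and using hypothetical in-memory data. Each element returned by peaksData() is a two-column matrix whose "mz" column holds the same values that mz() returns for that spectrum.

```r
library(Spectra)
library(S4Vectors)

## Hypothetical data: two spectra
spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd$mz <- list(c(100, 103.2), c(45.6, 120.4, 190.2))
spd$intensity <- list(c(200, 400), c(12.3, 15.2, 6.8))
s <- Spectra(spd)

## One peaks matrix per spectrum
pks <- peaksData(s)
colnames(pks[[1L]])

## Same m/z values as returned by mz()
all.equal(pks[[1L]][, "mz"], mz(s)[[1L]])

## Available peaks variables
peaksVariables(s)
```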

Functions to access MS data

The functions available to extract data from, or set data in, a Spectra object are listed below in alphabetical order. Note that other functions to extract information from a Spectra object are documented in addProcessing().

$, $<-: gets (or sets) a spectra variable for all spectra in object. See examples for details. Note that replacing values of a peaks variable is not supported with a non-empty processing queue, i.e. if any filtering or data manipulations on the peaks data were performed. In these cases applyProcessing() needs to be called first to apply all cached data operations.
[[, [[<-: access or set/add a single spectra variable (column) in the backend.
acquisitionNum(): returns the acquisition number of each spectrum. Returns an integer of length equal to the number of spectra (with NA_integer_ if not available).
asDataFrame(): converts the Spectra to a DataFrame (in long format) containing all data. Returns a DataFrame.
centroided(), centroided<-: gets or sets the centroiding information of the spectra. centroided() returns a logical vector of length equal to the number of spectra with TRUE if a spectrum is centroided, FALSE if it is in profile mode and NA if it is undefined. See also isCentroided() for estimating from the spectrum data whether the spectrum is centroided. value for centroided<- is either a single logical or a logical of length equal to the number of spectra in object.
collisionEnergy(), collisionEnergy<-: gets or sets the collision energy for all spectra in object. collisionEnergy() returns a numeric with length equal to the number of spectra (NA_real_ if not present/defined), collisionEnergy<- takes a numeric of length equal to the number of spectra in object.
coreSpectraVariables(): returns the core spectra variables along with their expected data type.
dataOrigin(), dataOrigin<-: gets or sets the data origin for each spectrum. dataOrigin() returns a character vector (same length as object) with the origin of the spectra. dataOrigin<- expects a character vector (same length as object) with the replacement values for the data origin of each spectrum.
dataStorage(): returns a character vector (same length as object) with the data storage location of each spectrum.
intensity(): gets the intensity values from the spectra. Returns an IRanges::NumericList() of numeric vectors (intensity values for each spectrum). The length of the list is equal to the number of spectra in object.
ionCount(): returns a numeric with the sum of intensities for each spectrum. If the spectrum is empty (see isEmpty()), NA_real_ is returned.
isCentroided(): a heuristic approach assessing whether the spectra in object are in profile or centroided mode. For each spectrum, the function selects the peaks with an intensity above the qtl-th quantile, calculates the differences between the m/z values of adjacent selected peaks and returns TRUE if the first quartile of these differences is greater than k. (See Spectra:::.isCentroided() for the code.)
isEmpty(): checks whether a spectrum in object is empty (i.e. does not contain any peaks). Returns a logical vector of length equal to the number of spectra.
isolationWindowLowerMz(), isolationWindowLowerMz<-: gets or sets the lower m/z boundary of the isolation window.
isolationWindowTargetMz(), isolationWindowTargetMz<-: gets or sets the target m/z of the isolation window.
isolationWindowUpperMz(), isolationWindowUpperMz<-: gets or sets the upper m/z boundary of the isolation window.
length(): gets the number of spectra in the object.
lengths(): gets the number of peaks (m/z-intensity values) per spectrum. Returns an integer vector (length equal to the number of spectra). For empty spectra, 0 is returned.
msLevel(): gets the spectra's MS level. Returns an integer vector (names being spectrum names, length equal to the number of spectra) with the MS level for each spectrum.
mz(): gets the mass-to-charge ratios (m/z) from the spectra. Returns an IRanges::NumericList() of length equal to the number of spectra, each element a numeric vector with the m/z values of one spectrum.
peaksData(): gets the peaks data for all spectra in object. Peaks data consist of the m/z and intensity values as well as possible additional annotations (variables) of all peaks of each spectrum. The function returns an S4Vectors::SimpleList() (or, with return.type = "list", a list) of two-dimensional arrays (either matrix or data.frame), with each array providing the values for the requested peaks variables (by default "mz" and "intensity"). Optional parameter columns is passed to the backend's peaksData() function to allow the selection of specific (or additional) peaks variables (columns) that should be extracted (if available). Importantly, it is not guaranteed that each backend supports this parameter (while each backend must support extraction of the "mz" and "intensity" columns). Parameter columns defaults to c("mz", "intensity"), but any value returned by peaksVariables(object) is supported. The peaks data can also be extracted with as(x, "list") and as(x, "SimpleList") as a list and SimpleList, respectively. In contrast to peaksData(), however, as() does not support the parameter columns.
peaksVariables(): lists the available variables for mass peaks provided by the backend. Default peak variables are "mz" and "intensity" (which all backends need to support and provide), but some backends might provide additional variables. These variables correspond to the column names of the peak data array returned by peaksData().
polarity(), polarity<-: gets or sets the polarity for each spectrum. polarity() returns an integer vector (length equal to the number of spectra), with 0 and 1 representing negative and positive polarities, respectively. polarity<- expects an integer vector of length 1 or equal to the number of spectra.
precursorCharge(), precursorIntensity(), precursorMz(), precScanNum(), precAcquisitionNum(): get the charge (integer), intensity (numeric), m/z (numeric), scan index (integer) and acquisition number (integer) of the precursor for MS level > 1 spectra from the object. Each returns a vector of length equal to the number of spectra in object. NA is reported for MS1 spectra or if no precursor information is available.
rtime(), rtime<-: gets or sets the retention times (in seconds) for each spectrum. rtime() returns a numeric vector (length equal to the number of spectra) with the retention time for each spectrum. rtime<- expects a numeric vector with length equal to the number of spectra.
scanIndex(): returns an integer vector with the scan index for each spectrum. This represents the relative index of the spectrum within each file. Note that this can be different from the acquisitionNum of the spectrum, which represents the index of the spectrum during acquisition/measurement (as reported in the mzML file).
smoothed(), smoothed<-: gets or sets whether a spectrum is smoothed. smoothed() returns a logical vector of length equal to the number of spectra. smoothed<- takes a logical vector of length 1 or equal to the number of spectra in object.
spectraData(): gets general spectrum metadata (annotation, also called header). spectraData() returns a DataFrame. Note that by default this method does not return m/z or intensity values.
spectraData<-: replaces the full spectra data of the Spectra object with the one provided with value. The spectraData<- function expects a DataFrame to be passed as value with the same number of rows as there are spectra in object. Note that replacing values of peaks variables is not supported with a non-empty processing queue, i.e. if any filtering or data manipulations on the peaks data were performed. In these cases applyProcessing() needs to be called first to apply all cached data operations and empty the processing queue.
spectraNames(), spectraNames<-: gets or sets the spectra names.
spectraVariables(): returns a character vector with the spectra variables (columns, fields or attributes of each spectrum) available in object. Note that spectraVariables() does not list the peaks variables ("mz", "intensity" and any additional annotations for each MS peak). Peaks variables are returned by peaksVariables().
tic(): gets the total ion current/count (sum of signal of a spectrum) for all spectra in object. By default, the value reported in the original raw data file is returned. For an empty spectrum, 0 is returned.
uniqueMsLevels(): gets the unique MS levels available in object. This function is supposed to be more efficient than unique(msLevel(object)).
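A sketch of several of the functions listed above, assuming the Spectra and S4Vectors packages are installed and using hypothetical in-memory data that include one empty spectrum. The spectra variable name "sample" is purely illustrative.

```r
library(Spectra)
library(S4Vectors)

## Hypothetical data: three spectra, the last one without peaks
spd <- DataFrame(msLevel = c(1L, 2L, 2L), rtime = c(1.1, 1.2, 1.3))
spd$mz <- list(c(100, 103.2), c(45.6, 120.4, 190.2), numeric())
spd$intensity <- list(c(200, 400), c(12.3, 15.2, 6.8), numeric())
s <- Spectra(spd)

isEmpty(s)          # is a spectrum without peaks?
lengths(s)          # number of peaks per spectrum
ionCount(s)         # sum of intensities per spectrum
uniqueMsLevels(s)   # MS levels present in the data

## `[[` and `[[<-` access or add a single spectra variable
s[["rtime"]]
s[["sample"]] <- c("a", "a", "b")   # hypothetical new variable
spectraVariables(s)
```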
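The isCentroided() heuristic can be sketched in base R for a single peaks matrix. This is an illustration of the described approach, not the package's code: the helper is_centroided_sketch() is hypothetical, and the default values qtl = 0.9 and k = 0.025 are assumptions. Profile data sample each peak densely, so the most intense points lie close together in m/z; centroided data keep one point per peak, so the most intense points are far apart.

```r
## Hypothetical helper, not part of the Spectra API.
## pk: two-column matrix with columns "mz" and "intensity".
is_centroided_sketch <- function(pk, qtl = 0.9, k = 0.025) {
    ## keep only peaks with an intensity above the qtl quantile
    thr <- quantile(pk[, "intensity"], qtl, na.rm = TRUE)
    top_mz <- pk[pk[, "intensity"] > thr, "mz"]
    if (length(top_mz) < 2L)
        return(NA)
    ## first quartile of the m/z differences between adjacent top peaks
    unname(quantile(diff(sort(top_mz)), 0.25) > k)
}

## Profile-like data: one Gaussian peak sampled every 0.001 m/z
mzs <- seq(100, 101, by = 0.001)
prof <- cbind(mz = mzs, intensity = dnorm(mzs, mean = 100.5, sd = 0.01))
is_centroided_sketch(prof)

## Centroid-like data: isolated peaks 50 m/z apart
cent <- cbind(mz = seq(100, by = 50, length.out = 50),
              intensity = as.numeric(1:50))
is_centroided_sketch(cent)
```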

Author(s)

Sebastian Gibb, Johannes Rainer, Laurent Gatto, Philippine Louail

Examples


library(Spectra)

## Create a Spectra from mzML files and use the `MsBackendMzR` on-disk
## backend.
sciex_file <- dir(system.file("sciex", package = "msdata"),
    full.names = TRUE)
sciex <- Spectra(sciex_file, backend = MsBackendMzR())
sciex

## Get the number of spectra in the data set
length(sciex)

## Get the number of mass peaks per spectrum - limit to the first 6
lengths(sciex) |> head()

## Get the MS level for each spectrum - limit to the first 6 spectra
msLevel(sciex) |> head()

## Alternatively, we could also use $ to access a specific spectra variable.
## This could also be used to add additional spectra variables to the
## object (see further below).
sciex$msLevel |> head()

## Get the intensity and m/z values.
intensity(sciex)
mz(sciex)

## Convert a subset of the Spectra object to a long DataFrame.
asDataFrame(sciex, i = 1:3, spectraVars = c("rtime", "msLevel"))

## Create a Spectra providing a `DataFrame` containing the spectrum data.

spd <- DataFrame(msLevel = c(1L, 2L), rtime = c(1.1, 1.2))
spd$mz <- list(c(100, 103.2, 104.3, 106.5), c(45.6, 120.4, 190.2))
spd$intensity <- list(c(200, 400, 34.2, 17), c(12.3, 15.2, 6.8))

s <- Spectra(spd)
s

## List all available spectra variables (i.e. spectrum data and metadata).
spectraVariables(s)

## For all *core* spectrum variables accessor functions are available. These
## return NA if the variable was not set.
centroided(s)
dataStorage(s)
rtime(s)
precursorMz(s)

## The core spectra variables are:
coreSpectraVariables()

## Add an additional metadata column.
s$spectrum_id <- c("sp_1", "sp_2")

## List spectra variables, "spectrum_id" is now also listed
spectraVariables(s)

## Get the values for the new spectra variable
s$spectrum_id

## Extract specific spectra variables.
spectraData(s, columns = c("spectrum_id", "msLevel"))


##  --------  PEAKS VARIABLES AND DATA  --------

## Get the peak data (m/z and intensity values).
pks <- peaksData(s)
pks
pks[[1]]
pks[[2]]

## Note that we could get the same result by coercing the `Spectra` to
## a `list` or `SimpleList`:
as(s, "list")
as(s, "SimpleList")

## Or use `mz()` and `intensity()` to extract the m/z and intensity values
## separately
mz(s)
intensity(s)

## Some `MsBackend` classes provide support for arbitrary peaks variables
## (in addition to the mandatory `"mz"` and `"intensity"` values). Below
## we create a simple data frame with an additional peak variable `"pk_ann"`
## and create a `Spectra` with a `MsBackendMemory` for that data.
## Importantly, the number of values (per spectrum) needs to be the same
## for all peaks variables.

tmp <- data.frame(msLevel = c(2L, 2L), rtime = c(123.2, 123.5))
tmp$mz <- list(c(103.1, 110.4, 303.1), c(343.2, 453.1))
tmp$intensity <- list(c(130.1, 543.1, 40), c(0.9, 0.45))
tmp$pk_ann <- list(c(NA_character_, "A", "P"), c("B", "P"))

## Create the Spectra. With parameter `peaksVariables` we can define
## the columns in `tmp` that contain peaks variables.
sps <- Spectra(tmp, source = MsBackendMemory(),
    peaksVariables = c("mz", "intensity", "pk_ann"))
peaksVariables(sps)

## Extract just the m/z and intensity values
peaksData(sps)[[1L]]

## Extract the full peaks data
peaksData(sps, columns = peaksVariables(sps))[[1L]]

## Access just the pk_ann variable
sps$pk_ann

rformassspectrometry/Spectra documentation built on April 13, 2025, 5:54 p.m.