MsBackendPy: A MS data backend for MS data stored in Python

Description

The MsBackendPy allows access to MS data stored as matchms.Spectrum or spectrum_utils.spectrum.MsmsSpectrum objects (from the matchms or spectrum_utils Python library, respectively) directly from R. The MS data (peaks data or spectra variables) are translated on-the-fly when accessed. Thus, the MsBackendPy allows a seamless integration of Python MS data structures into Spectra::Spectra()-based analysis workflows.

The MsBackendPy object is considered read-only, i.e. it does not provide functionality to replace the peaks data from R. However, it is possible to directly change the data in the referenced Python variable.

Usage

## S4 method for signature 'MsBackendPy'
backendInitialize(
  object,
  pythonVariableName = character(),
  spectraVariableMapping = defaultSpectraVariableMapping(),
  pythonLibrary = c("matchms", "spectrum_utils"),
  ...,
  data
)

## S4 method for signature 'MsBackendPy'
length(x)

## S4 method for signature 'MsBackendPy'
spectraVariables(object)

## S4 method for signature 'MsBackendPy'
spectraData(object, columns = spectraVariables(object), drop = FALSE)

## S4 method for signature 'MsBackendPy'
peaksData(object, columns = c("mz", "intensity"), drop = FALSE)

## S4 method for signature 'MsBackendPy'
x$name

## S4 replacement method for signature 'MsBackendPy'
spectraVariableMapping(object) <- value

## S4 replacement method for signature 'Spectra'
spectraVariableMapping(object) <- value

reindex(object)

Arguments

object

A MsBackendPy object.

pythonVariableName

For backendInitialize(): character(1) with the name of the variable/Python attribute that contains the list of matchms.Spectrum (or spectrum_utils.spectrum.MsmsSpectrum) objects with the MS data.

spectraVariableMapping

For backendInitialize(): named character with the mapping between spectra variable names and (matchms.Spectrum) metadata names. See defaultSpectraVariableMapping() for more information and details.

pythonLibrary

For backendInitialize(): character(1) specifying the Python library used to represent the MS data in Python. Can be either pythonLibrary = "matchms" (the default) or pythonLibrary = "spectrum_utils".

...

Additional parameters.

data

For backendInitialize(): DataFrame with the full MS data (peaks data and spectra data). Currently not supported.

x

A MsBackendPy object.

columns

For spectraData(): character with the names of columns (spectra variables) to retrieve. Defaults to spectraVariables(object). For peaksData(): character with the names of the peaks variables to retrieve.

drop

For spectraData() and peaksData(): logical(1) whether, when a single column is requested, the data should be returned as a vector instead of a data.frame or matrix.

name

For $: character(1) with the name of the variable to retrieve.

value

Replacement value(s).
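The spectraVariableMapping argument is a named character vector. The base-R sketch below illustrates that format, assuming (as the argument description states) that the names are Spectra spectra variables and the values are the corresponding metadata names in the Python objects. The keys used here are illustrative only and are not necessarily those returned by defaultSpectraVariableMapping().

```r
## Illustrative mapping (NOT taken from defaultSpectraVariableMapping()):
## names = Spectra spectra variables, values = Python metadata keys.
mapping <- c(precursorMz = "precursor_mz", rtime = "retention_time")

## Python metadata key used for a given spectra variable.
mapping[["precursorMz"]]

## Translate Python metadata keys back to spectra variable names.
py_keys <- c("retention_time", "precursor_mz")
names(mapping)[match(py_keys, mapping)]
```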

Details

The MsBackendPy keeps only a reference to the MS data in Python (i.e., the name of the Python variable) and an index pointing to the individual spectra in that variable; it stores no other data. Any data requested from the MsBackendPy is accessed and translated on-the-fly from the Python variable. The MsBackendPy is thus an interface to the MS data, not a data container. All changes to the MS data in the Python variable (performed e.g. in Python) immediately affect any MsBackendPy instances pointing to this variable.
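To make this reference-only design concrete, the following base-R sketch mimics it without Python. An R environment (py_session) stands in for the Python session and a plain list stands in for the backend. All names (py_session, toy_backend, toy_precursor_mz()) are hypothetical and not part of SpectriPy; the point is only that data are looked up at access time, so changes to the referenced variable are visible immediately.

```r
## Hypothetical stand-in for the Python session: an R environment.
py_session <- new.env()
py_session$s_p <- list(
    list(mz = c(100.1, 200.2), intensity = c(10, 20), precursor_mz = 300.3),
    list(mz = 150.5, intensity = 5, precursor_mz = 400.4)
)

## Toy "backend": only the variable name and an index, no MS data.
toy_backend <- list(variable = "s_p",
                    index = seq_len(length(py_session$s_p)))

## Accessor that reads the referenced data on every call (on-the-fly).
toy_precursor_mz <- function(be, session) {
    spectra <- get(be$variable, envir = session)
    vapply(spectra[be$index], function(sp) sp$precursor_mz, numeric(1))
}

toy_precursor_mz(toy_backend, py_session)

## Changing the data "in Python" is immediately visible through the backend.
py_session$s_p[[1]]$precursor_mz <- 999
toy_precursor_mz(toy_backend, py_session)
```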

Special care must be taken if the MS data structure in Python is subset or re-ordered (e.g., by another process). In that case, it may be necessary to re-index the backend using the reindex() function: object <- reindex(object). This updates (replaces) the index to the individual spectra in Python that is stored within the backend.

Value

See description of individual functions for their return values.

MsBackendPy methods

The MsBackendPy supports all methods defined by the Spectra::MsBackend() interface for access to MS data. Details on the individual functions can also be found in the main documentation of the Spectra package (i.e., for Spectra::MsBackend()). Here we provide information only for functions with properties specific to this backend.

  • backendInitialize(): initializes the backend with information from the referenced Python variable (attribute). The name of this variable, which should ideally be defined in the associated Python session, is expected to be provided with the pythonVariableName parameter. The optional spectraVariableMapping parameter allows providing an additional, or alternative, mapping of Spectra's spectra variables to metadata in the matchms.Spectrum objects. See defaultSpectraVariableMapping() (the default) for more information. Parameter pythonLibrary must be used to specify the Python library representing the MS data in Python. It can be either pythonLibrary = "matchms" (the default) or pythonLibrary = "spectrum_utils". The function returns an initialized instance of MsBackendPy.

  • peaksData(): extracts the peaks data matrices from the backend. Python code is applied to the data structure in Python to extract the m/z and intensity values as a list of (numpy) arrays. These are then translated into an R list of two-column numeric matrices. Because these arrays carry no column names, an additional loop in R is required to set the column names to "mz" and "intensity".

  • spectraData(): extracts the spectra data from the backend. Which spectra variables are translated and retrieved from the Python objects depends on the backend's spectraVariableMapping(). All metadata names defined in the mapping are retrieved and added to the returned DataFrame (with any missing core spectra variables filled with NA).

  • spectraVariables(): retrieves available spectra variables, which include the names of all metadata attributes in the matchms.Spectrum objects and the core spectra variables Spectra::coreSpectraVariables().

  • spectraVariableMapping<-: replaces the spectraVariableMapping of the backend (see setSpectraVariableMapping() for details and a description of the expected format).
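The R-side part of the peaksData() and spectraData() steps described above can be sketched in base R as follows. This is a hypothetical illustration, not SpectriPy's implementation: the unnamed matrices stand in for the translated numpy arrays, a base data.frame stands in for the returned DataFrame, and core_vars is a small illustrative subset of Spectra::coreSpectraVariables().

```r
## Stand-in for the translated Python result: one two-column numeric
## matrix per spectrum, without column names.
raw_peaks <- list(
    matrix(c(100.1, 200.2, 10, 20), ncol = 2),
    matrix(c(150.5, 5), ncol = 2)
)

## The additional R loop that sets the column names.
peaks <- lapply(raw_peaks, function(m) {
    colnames(m) <- c("mz", "intensity")
    m
})
peaks[[1L]]

## spectraData()-like step: metadata retrieved through the mapping, then
## core spectra variables without a value are filled with NA.
metadata <- data.frame(precursorMz = c(300.3, 400.4), msLevel = c(2L, 2L))
## Illustrative subset of core spectra variables.
core_vars <- c("msLevel", "precursorMz", "rtime", "precursorCharge")
for (v in setdiff(core_vars, colnames(metadata)))
    metadata[[v]] <- NA   # plain NA for simplicity, recycled to all rows
metadata
```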

Additional helper and utility functions

  • reindex(): updates the internal index to match 1:length(object). This function is useful if the original data referenced by the backend was subset or re-ordered by a different process (or a function in Python).
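The effect of reindex() can be illustrated with a toy, Python-free model. In this hypothetical sketch, an R environment stands in for the Python session, and toy_reindex() assumes that re-indexing means resetting the stored index to cover all elements currently held by the referenced variable; SpectriPy's actual implementation may differ in detail.

```r
## Hypothetical stand-in for the Python session.
py_session <- new.env()
py_session$s_p <- list("spectrum A", "spectrum B", "spectrum C", "spectrum D")

## Toy backend: only the variable name and an index into it.
toy_backend <- list(variable = "s_p", index = 1:4)

## Toy accessor: look up the indexed elements on every call.
toy_spectra <- function(be, session)
    get(be$variable, envir = session)[be$index]

## Another process subsets the data "in Python", keeping two spectra.
py_session$s_p <- py_session$s_p[c(2, 4)]

## The stored index 1:4 is now stale: in this toy, positions 3 and 4
## yield NULL (for a plain Python list, out-of-range indexing raises an
## IndexError instead).
lengths(toy_spectra(toy_backend, py_session))

## Hypothetical toy_reindex(): reset the index to 1..n for the current data.
toy_reindex <- function(be, session) {
    be$index <- seq_len(length(get(be$variable, envir = session)))
    be
}
toy_backend <- toy_reindex(toy_backend, py_session)
unlist(toy_spectra(toy_backend, py_session))
```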

Note

As mentioned in the Details section, the MS data is stored entirely in Python and the backend only references this data through the name of the variable in Python. Thus, each time MS data is requested from the backend, it is retrieved in its current state. If, for example, data was transformed or metadata were added or removed in the Python object, this immediately affects the Spectra object/backend.

Author(s)

Johannes Rainer and the EuBIC hackathon team

Examples


## Loading an example MGF file provided by the SpectriPy package.
## As an alternative, the data could also be imported directly in Python
## using:
## import matchms
## from matchms.importing import load_from_mgf
## s_p = list(load_from_mgf(r.fl))
library(Spectra)
library(MsBackendMgf)
library(SpectriPy)
## reticulate provides `py` and `py_set_attr()` used below.
library(reticulate)

fl <- system.file("extdata", "mgf", "test.mgf", package = "SpectriPy")
s <- Spectra(fl, source = MsBackendMgf())
s

## Translating the MS data to Python and assigning it to a variable
## named "s_p" in the (*reticulate*'s) `py` Python environment. Assigning
## the variable to the Python environment has performance advantages, as
## any Python code applied to the MS data does not require any data
## conversions.
py_set_attr(py, "s_p", rspec_to_pyspec(s))


## Create a `MsBackendPy` representing an interface to the data in the
## "s_p" variable in Python:
be <- backendInitialize(MsBackendPy(), "s_p")
be

## Create a Spectra object with this backend:
s_2 <- Spectra(be)
s_2

## Available spectra variables: these include, next to the *core* spectra
## variables, also the names of all metadata stored in the `matchms.Spectrum`
## objects.
spectraVariables(s_2)

## Get the full peaks data:
peaksData(s_2)

## Get the peaks from the first spectrum
peaksData(s_2)[[1L]]

## Get the full spectra data:
spectraData(s_2)

## Get the m/z values
mz(s_2)

## Plot the first spectrum
plotSpectra(s_2[1L])


########
## Using the spectrum_utils Python library

## Below we convert the data to a list of `MsmsSpectrum` objects from the
## spectrum_utils library.
py_set_attr(py, "su_p", rspec_to_pyspec(s,
    spectraVariableMapping("spectrum_utils"), "spectrum_utils"))

## Create a MsBackendPy representing this data. Importantly, we need to
## specify the Python library using the `pythonLibrary` parameter and
## ideally also set the `spectraVariableMapping` to the one specific for
## that library.
be <- backendInitialize(MsBackendPy(), "su_p",
    spectraVariableMapping = spectraVariableMapping("spectrum_utils"),
    pythonLibrary = "spectrum_utils")
be

## Get the peaks data for the first 3 spectra
peaksData(be[1:3])

## Get the full spectraData
spectraData(be)

## Extract the precursor m/z
be$precursorMz

rformassspectrometry/SpectriPy documentation built on June 11, 2025, 12:49 a.m.