BiocStyle::markdown()
Compiled: r date()
The SpectriPy package allows integration of Python MS packages into a
Spectra-based MS analysis in R. Python functionality is wrapped into R
functions allowing a seamless integration of the functionality of Python's
matchms package into R. In addition, functions to convert between R's
Spectra
objects and Python's matchms spectrum objects are available to the
advanced user or developer enabling to create custom functions or workflows on
Spectra
objects in Python and executing them in R using the reticulate R package.
The package requires a python environment to be available and can be installed
with the BiocManager R package using the command
BiocManager::install("RforMassSpectrometry/SpectriPy")
. All required python
libraries are installed automatically on demand. Note that first installation
or first invocation of specific functions might take long because of this
installation process.
matchms
The SpectriPy package provides the compareSpectriPy()
function that allows
to perform spectra similarity calculations using the scoring functions from the
matchms Python package. Below all currently supported scoring functions are
listed along with the parameter class that allows selecting and configuring
the algorithm in the compareSpectriPy()
function. Additional functions will be
added in future.
CosineGreedyParam
.CosineHungarianParam
.ModifiedCosineParam
.We next create some simple example spectra and subsequently use the
compareSpectriPy()
function to calculate pairwise similarities between these.
library(Spectra) library(SpectriPy) ## Create a Spectra object with two MS2 spectra for Caffeine. caf <- DataFrame( msLevel = c(2L, 2L), name = "Caffeine", precursorMz = c(195.0877, 195.0877) ) caf$intensity <- list( c(340.0, 416, 2580, 412), c(388.0, 3270, 85, 54, 10111)) caf$mz <- list( c(135.0432, 138.0632, 163.0375, 195.0880), c(110.0710, 138.0655, 138.1057, 138.1742, 195.0864)) caf <- Spectra(caf) ## Create a Spectra object with two MS2 spectra for 1-Methylhistidine mhd <- DataFrame( msLevel = c(2L, 2L), precursorMz = c(170.0924, 170.0924), id = c("HMDB0000001", "HMDB0000001"), name = c("1-Methylhistidine", "1-Methylhistidine")) mhd$mz <- list( c(109.2, 124.2, 124.5, 170.16, 170.52), c(83.1, 96.12, 97.14, 109.14, 124.08, 125.1, 170.16)) mhd$intensity <- list( c(3.407, 47.494, 3.094, 100.0, 13.240), c(6.685, 4.381, 3.022, 16.708, 100.0, 4.565, 40.643)) mhd <- Spectra(mhd)
We first calculate pairwise similarities between all spectra defined above and
those of caffeine using Spectra's built-in compareSpectra()
function.
all <- c(caf, mhd) res_r <- compareSpectra(all, caf) res_r
Thus, compareSpectra()
returned the pairwise similarity scores (by default
calculated using the normalized dot-product function) between all spectra in
all
(rows) and all spectra in caf
(columns). compareSpectriPy()
works
similar, with the difference that we need to specify and configure the
similarity function (from matchms) using a dedicated parameter object. Below
we calculate the similarity using the CosineGreedy function changing the
tolerance
to a value of 0.05
(instead of the default 0.1
).
res <- compareSpectriPy(all, caf, param = CosineGreedyParam(tolerance = 0.05)) res
As a result compareSpectriPy
returns also a numeric matrix of similarities.
Note also that the first compareSpectriPy
call takes usually a little longer
because the Python setup has to be initialized.
Next we use the ModifiedCosine algorithm that considers also differences between the spectra's precursor m/z in the calculation.
res <- compareSpectriPy(all, caf, param = ModifiedCosineParam()) res
Note that for this calculation all spectra precursor m/z values need to be
available, otherwise an error will be thrown. Thus, we should always ensure to
remove spectra without precursor m/z values prior to similarity scoring with
this similarity method. Below we remove the precursor m/z from one of our input
spectra and then show how the Spectra
object could be subsetted to valid
spectra for this method.
## Remove precursor m/z from the 3rd spectrum all$precursorMz[3] <- NA ## Filter the input spectra removing those with missing precursor. all <- all[!is.na(precursorMz(all))] compareSpectriPy(all, caf, param = ModifiedCosineParam())
For advanced users or developers, the rspec_to_pyspec
and pyspec_to_rspec
functions are available that enable conversion between MS data representations
in R
and Python (i.e. between the Spectra
object and the Python matchms
Spectrum object). To use these functions the reticulate
package needs to be
installed along with a Python environment and the matchms
Python package.
To illustrate their use we initialize below the Python environment that is
bundled using the r BiocStyle::Biocpkg("basilisk")
package within SpectriPy
.
library(SpectriPy) library(basilisk) cl <- basiliskStart(SpectriPy:::matchms_env)
We next create a simple Spectra
object representing fragment spectra for some
small compounds.
library(Spectra) # create example spectra spd <- DataFrame( msLevel = c(2L, 2L, 2L), id = c("HMDB0000001", "HMDB0000001", "HMDB0001847"), name = c("1-Methylhistidine", "1-Methylhistidine", "Caffeine")) ## Assign m/z and intensity values. spd$mz <- list( c(109.2, 124.2, 124.5, 170.16, 170.52), c(83.1, 96.12, 97.14, 109.14, 124.08, 125.1, 170.16), c(56.0494, 69.0447, 83.0603, 109.0395, 110.0712, 111.0551, 123.0429, 138.0662, 195.0876)) spd$intensity <- list( c(3.407, 47.494, 3.094, 100.0, 13.240), c(6.685, 4.381, 3.022, 16.708, 100.0, 4.565, 40.643), c(0.459, 2.585, 2.446, 0.508, 8.968, 0.524, 0.974, 100.0, 40.994)) sps <- Spectra(spd)
We next convert the Spectra
to a matchms
Spectrum object.
pysps <- rspec_to_pyspec(sps) pysps
As a result we got now a Python list with 3 Spectrum objects containing the peak
data of sps
as well as a reduced set of the available spectra variables in
sps
. Which spectra variables get copied to Python can be defined with the
mapping
parameter of rspec_to_pyspec
. The default is to convert all
variables defined by spectraVariableMapping()
, but additional variables along
with their respective names in the Python Spectrum object can be defined
too. Below we list the pre-selected spectra variables that are converted by
default.
spectraVariableMapping()
With our Spectra
data converted to Python, we could directly call all routines
from the matchms
package using the reticulate
R package. Below we normalize
the intensities of the 3 spectra using the from normalize_intensities
functions from the matchms.filtering
Python package. We thus need to import
first the functionality from that package and can then directly call the
function on the objects.
library(reticulate) filters <- import("matchms.filtering") res <- vector("list", length(pysps)) for (i in (seq_along(pysps) - 1)) res[[i + 1]] <- filters$normalize_intensities(pysps[i]) res <- r_to_py(res) res
We can now convert the list of Python Spectrum objects back to R with
pyspec_to_rspec
:
sps_r <- pyspec_to_rspec(res) #' The normalized intensities intensity(sps_r) #' The original intensities intensity(sps)
Intensity values have now been normalized to values between 0 and 1.
Note however that, it would be much better (and likely more efficient) to directly use a Python function on the list of Spectrum objects instead of the mixed R and Python code used in the example above. Below we define a simple python script that iterates over the spectra in python and performs the normalization.
py_script <- paste0("from matchms.filtering import normalize_intensities\n", "for i in range(len(pysps)):\n", " pysps[i] = normalize_intensities(pysps[i])\n") py$pysps <- pysps py_run_string(py_script) tmp <- pyspec_to_rspec(pysps) intensity(tmp)
At very last we need also to stop the Python environment created by basilisk
.
basiliskStop(cl)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.