BiocStyle::markdown()
Compiled: r date()
Powerful software libraries for mass spectrometry (MS) data are available in both Python and R, covering specific and sometimes distinct aspects in the analysis of proteomics and metabolomics data. R's reticulate package converts basic data types between Python and R and enables a seamless interoperability of both programming languages. The SpectriPy package extends reticulate allowing to translate MS data data structures between Python and R. In addition, functionality from Python's matchms library is directly wrapped into R functions allowing a seamless integration into R based workflows. SpectriPy thus enables powerful proteomics or metabolomics analysis workflows combining the strengths of Python and R MS libraries.
If the BiocManager package is not already available, please install it with
install.packages("BiocManager")
. To install the development version of
SpectriPy from GitHub, please install in addition the remotes package with
`install.packages("remotes"). As a system dependency, the package requires a
Python environment to be available.
LLLL Update installation instructions.
and can be installed
with the BiocManager R package using the command
BiocManager::install("RforMassSpectrometry/SpectriPy")
. This will install the
latest version of the package from GitHub.
All required python libraries are installed automatically on demand.
Note that first installation or first invocation of specific functions might
take long because of this installation process.
The reticulate package enables a
seamless integration of Python with R by translating the core data structures
between the two programming languages and sharing a Python (respectively R)
runtime environment that can be accessed from the other programming languages
including shared variables. The SpectriPy package builds on that providing
functionality to translate data structures for mass spectrometry (MS) data
between the two languages. Specifically, the package translates between R's
Spectra
objects (from the r BiocStyle::Biocpkg("Spectra")
package) and
matchms.Spectrum
objects from the
matchms Python library.
In the first examples we translated MS data from R to Python. We first load data
from a mzML data file provided through the r BiocStyle::Biocpkg("msdata")
package.
#' Loading the data from a mzML file as a `Spectra` object. library(Spectra) fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML", package = "msdata") s_r <- Spectra(fl)
We next restrict the data to MS2 spectra and remove spectra with less than 3 peaks (fragments).
#' Restrict to MS level 2 spectra s_r <- filterMsLevel(s_r, 2) s_r <- s_r[lengths(s_r) < 3] s_r
library(AnnotationHub) ah <- AnnotationHub(localHub = TRUE) query(ah, "MassBank") mb <- ah[["AH107049"]] mb <- Spectra(mb)
register(SerialParam()) library(MetaboAnnotation) m <- matchSpectra(s_r, mb, CompareSpectraParam(ppm = 40)) set.seed(123) idx <- c(whichTarget(m), 5970, 5971, 5972, 5973, 10279, 15505, 15506, sample(seq_along(mb), 51)) mb <- mb[sort(idx)] tmp <- selectSpectraVariables( mb, c("msLevel", "rtime", "acquisitionNum", "precursorMz", "precursorIntensity", "precursorCharge", "inchi", "smiles", "name")) library(MsBackendMgf) sm <- c(spectraVariableMapping(MsBackendMgf()), inchi = "INCHI", smiles = "SMILES", name = "NAME") export(tmp, MsBackendMgf(), file = "/home/jo/tmp/test.mgf", mapping = sm) library(AnnotationHub) tmp <- system.file()
We next import MS data from a file in MGF format using the matchms Python library.
matchms
The SpectriPy package provides the compareSpectriPy()
function that allows
to perform spectra similarity calculations using the scoring functions from the
matchms Python package. Below all currently supported scoring functions are
listed along with the parameter class that allows selecting and configuring
the algorithm in the compareSpectriPy()
function. Additional functions will be
added in future.
CosineGreedyParam
.CosineHungarianParam
.ModifiedCosineParam
.We next create some simple example spectra and subsequently use the
compareSpectriPy()
function to calculate pairwise similarities between these.
library(Spectra) library(SpectriPy) ## Create a Spectra object with two MS2 spectra for Caffeine. caf <- DataFrame( msLevel = c(2L, 2L), name = "Caffeine", precursorMz = c(195.0877, 195.0877) ) caf$intensity <- list( c(340.0, 416, 2580, 412), c(388.0, 3270, 85, 54, 10111)) caf$mz <- list( c(135.0432, 138.0632, 163.0375, 195.0880), c(110.0710, 138.0655, 138.1057, 138.1742, 195.0864)) caf <- Spectra(caf) ## Create a Spectra object with two MS2 spectra for 1-Methylhistidine mhd <- DataFrame( msLevel = c(2L, 2L), precursorMz = c(170.0924, 170.0924), id = c("HMDB0000001", "HMDB0000001"), name = c("1-Methylhistidine", "1-Methylhistidine")) mhd$mz <- list( c(109.2, 124.2, 124.5, 170.16, 170.52), c(83.1, 96.12, 97.14, 109.14, 124.08, 125.1, 170.16)) mhd$intensity <- list( c(3.407, 47.494, 3.094, 100.0, 13.240), c(6.685, 4.381, 3.022, 16.708, 100.0, 4.565, 40.643)) mhd <- Spectra(mhd)
We first calculate pairwise similarities between all spectra defined above and
those of caffeine using Spectra's built-in compareSpectra()
function.
all <- c(caf, mhd) res_r <- compareSpectra(all, caf) res_r
Thus, compareSpectra()
returned the pairwise similarity scores (by default
calculated using the normalized dot-product function) between all spectra in
all
(rows) and all spectra in caf
(columns). compareSpectriPy()
works
similar, with the difference that we need to specify and configure the
similarity function (from matchms) using a dedicated parameter object. Below
we calculate the similarity using the CosineGreedy function changing the
tolerance
to a value of 0.05
(instead of the default 0.1
).
Executing the following step will automatically download and install conda mini forge to manage Python package dependencies.
res <- compareSpectriPy(all, caf, param = CosineGreedyParam(tolerance = 0.05)) res
As a result compareSpectriPy()
returns also a numeric matrix of similarities.
Note also that the first compareSpectriPy()
call takes usually a little longer
because the Python setup has to be initialized.
Next we use the ModifiedCosine algorithm that considers also differences between the spectra's precursor m/z in the calculation.
res <- compareSpectriPy(all, caf, param = ModifiedCosineParam()) res
A third example is the CosineHungarian algorithm, also a similarity function
from matchms. This algorithm calculates the cosine similarity score as with
CosineGreedyParam
, but using the Hungarian algorithm to find the best matching
peaks between the compared spectra. The algorithm can be configured with
parameters tolerance
, mzPower
and intensityPower
(see parameter description for more details).
res <- compareSpectriPy(all, caf, param = CosineHungarianParam()) res
A fourth example is the NeutralLossesCosine algorithm, also a similarity
function from matchms. The neutral losses cosine score aims at quantifying the
similarity between two mass spectra. The score is calculated by finding best
possible matches between peaks of two spectra. Two peaks are considered a
potential match if their m/z ratios lie within the given tolerance
once a
mass-shift is applied. The mass shift is the difference in precursor-m/z
between the two spectra.
res <- compareSpectriPy(all, caf, param = NeutralLossesCosineParam()) res
Note that for the abovementioned similarity calculations all spectra precursor
m/z values need to be available, otherwise an error will be thrown. Thus, we
should always ensure to remove spectra without precursor m/z values prior to
similarity scoring with this similarity method. Below we remove the precursor
m/z from one of our input spectra and then show how the Spectra
object could
be subsetted to valid spectra for this method.
## Remove precursor m/z from the 3rd spectrum all$precursorMz[3] <- NA ## Filter the input spectra removing those with missing precursor. all <- all[!is.na(precursorMz(all))] compareSpectriPy(all, caf, param = ModifiedCosineParam())
For advanced users or developers, the rspec_to_pyspec()
and
pyspec_to_rspec()
functions are available that enable conversion between MS
data representations in R and Python (i.e. between the Spectra::Spectra
object
and the Python matchms.Spectrum
object). To use these functions the
reticulate R package needs to be installed along with a Python environment and
the matchms Python package.
To illustrate their use we initialize below the Python environment that is
bundled using the r BiocStyle::Biocpkg("basilisk")
package within SpectriPy.
library(SpectriPy) library(basilisk) cl <- basiliskStart(SpectriPy:::matchms_env)
We next create a simple Spectra
object representing fragment spectra for some
small compounds.
library(Spectra) # create example spectra spd <- DataFrame( msLevel = c(2L, 2L, 2L), id = c("HMDB0000001", "HMDB0000001", "HMDB0001847"), name = c("1-Methylhistidine", "1-Methylhistidine", "Caffeine")) ## Assign m/z and intensity values. spd$mz <- list( c(109.2, 124.2, 124.5, 170.16, 170.52), c(83.1, 96.12, 97.14, 109.14, 124.08, 125.1, 170.16), c(56.0494, 69.0447, 83.0603, 109.0395, 110.0712, 111.0551, 123.0429, 138.0662, 195.0876)) spd$intensity <- list( c(3.407, 47.494, 3.094, 100.0, 13.240), c(6.685, 4.381, 3.022, 16.708, 100.0, 4.565, 40.643), c(0.459, 2.585, 2.446, 0.508, 8.968, 0.524, 0.974, 100.0, 40.994)) sps <- Spectra(spd)
We next convert the Spectra
to a matchms.Spectrum
object.
pysps <- rspec_to_pyspec(sps) pysps
As a result we got now a Python list with 3 Spectrum objects containing the peak
data of sps
as well as a reduced set of the available spectra variables in
sps
. Which spectra variables get copied to Python can be defined with the
mapping
parameter of rspec_to_pyspec
. The default is to convert all
variables defined by spectraVariableMapping()
, but additional variables along
with their respective names in the Python Spectrum object can be defined
too. Below we list the pre-selected spectra variables that are converted by
default.
spectraVariableMapping()
With our Spectra
data converted to Python, we could directly call all routines
from the matchms
package using the reticulate R package. Below we normalize
the intensities of the 3 spectra using the normalize_intensities()
function
from the matchms.filtering library. We thus need to first import the
functionality from that package and can then call the function directly on the
(Python) objects.
library(reticulate) filters <- import("matchms.filtering") res <- vector("list", length(pysps)) for (i in (seq_along(pysps) - 1)) res[[i + 1]] <- filters$normalize_intensities(pysps[i]) res <- r_to_py(res) res
We can now convert the list of Python matchms.Spectrum
objects back to R with
pyspec_to_rspec()
:
sps_r <- pyspec_to_rspec(res) #' The normalized intensities intensity(sps_r) #' The original intensities intensity(sps)
Intensity values have now been normalized to values between 0 and 1.
Note however that, it would be much better (and likely more efficient) to directly use a Python function on the list of Spectrum objects instead of the mixed R and Python code used in the example above. Below we define a simple python script that iterates over the spectra in python and performs the normalization.
py_script <- paste0("from matchms.filtering import normalize_intensities\n", "for i in range(len(pysps)):\n", " pysps[i] = normalize_intensities(pysps[i])\n") py$pysps <- pysps py_run_string(py_script) tmp <- pyspec_to_rspec(pysps) intensity(tmp)
At very last we need also to stop the Python environment enabled by basilisk.
basiliskStop(cl)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.