Introduction

The moleculaR R package provides a computational framework that introduces probabilistic mapping and point-for-point statistical testing of metabolites in tissue via Mass spectrometry imaging. It enables collective projections of metabolites and consequently spatially-resolved investigation of ion milieus, lipid pathways or user-defined biomolecular ensembles within the same image.

moleculaR comes pre-loaded with the SwissLipids database and with is capable of importing metabolite annotation results from the METASPACE platform to compute FDR-verified moleculaR probability maps (MPMs) and collective projection probability maps (CPPMs).

For more information about this package and its applications please refer to the associated preprint.

Installation & Loading Example Data

moleculaR can be installed via devtools, note that you to set build_vignettes=TRUE to build this vignette during package installation. Once installed, moleculaR could be loaded with library():

# not run
install.packages("devtools")
devtools::install_github("CeMOS-Mannheim/moleculaR", build_vignettes=TRUE)

library(moleculaR)

Importing & Processing MSI Data

Under the hood moleculaR is mainly based on MALDIquant, spatstat and Matrix packages and provides some additional tools to help ipmport and process any given MSI dataset. Please note that currently only centroided MSI data is supported. moleculaR expects two mandatory data inputs; a (processed) imzML file and an additional tsv file which contains a full continuous spectrum (with m/z and intensity columns) which either represents a random pixel within the imaging dataset or an average spectrum.

#-- not run --#

# read MSI data  
imzmlFile         <- "pathToFile.imzML" 
msData            <- readCentrData(path = imzmlFile) 

# single spectrum  
spectrFile        <- "pathToFile.tsv"                  
msSpectr          <- readSingleSpect(spectrFile) 

moleculaR also accepts one optional input which is the annotations results for the given dataset which the user could retrieve from METASPACE (https://metaspace2020.eu) provided, of course, that the same dataset has been previously uploaded there for annotation. When this input is provided, moleculaR takes these into consideration to filter out detections which were not verified by METASPACE at a certain FDR specified by the user.

#-- not run --#

# load the metaspace annotations file
pathToMtspc       <- "pathToFile.csv"                 
mtspc             <- read.csv(file = pathToMtspc, skip = 2, header = TRUE, colClasses = "character")

Finally, the SwissLipid database, which is provided with moleculaR ,must be loaded:

#-- not run --#

# load the processed swisslipids db 
pathTosldb        <- system.file("extdata", "swisslipids-speciesOnly-sep2020.tsv", package = "moleculaR", mustWork = TRUE)
sldb              <- loadSwissDB(pathTosldb)

For subsequent analysis, an S3 fwhm object must be created which will hold information about full width at half maximum (FWHM) as a function of m/z axis and will be used to estimate FWHM at any given m/z value (for more info see ?estimateFwhm):

#-- not run --#

# estimate fwhm from msSpectr
fwhmObj           <- estimateFwhm(s = msSpectr) 

Before applying any preprocessing methods, it is highly recommended to perform peak-binning and peak filtering on msData, subsequently the user may apply any preprocessing methods provided by MALDIquant (keeping in mind that this is a centroided dataset).

#-- not run --#

# bin peaks
msData            <-  MALDIquant::binPeaks(msData, 
                                           tolerance = getFwhm(fwhmObj, 400)/400, #focusing on lipids 
                                           method = "relaxed")


# filter out peaks which occur in less than 1% of the time 
# Note: use 'moleculaR::' namespace to distinguish it from MALDIquant::filterPeaks if MALDIquant is loaded.
msData            <- moleculaR::filterPeaks(x = msData, minFreq = 0.01)

Walkthrough - Example Data

moleculaR also comes pre-loaded with an example MALDI MSI dataset which is showcased in the associated preprint mentioned above. To load the example data (assuming that the package has been already loaded):

library(moleculaR)

data("processed-example-Data")

# to see the loaded objects
ls()
#> [1] "fwhmObj" "msData"  "mtspc"   "sldb"

msData represents a centroided MALDI MSI dataset stored as a list of MassPeaks objects (see ?MALDIquant::MassPeaks for more details), fwhmObj is an S3 object of type fwhm storing the calculated full width at half maximum (FWHM) information of msData (see ?moleculaR::fwhm for more details), mtspc is a data frame storing the associated annotations file (≤0.2 FDR) which is downloaded directly from METASPACE (https://metaspace2020.eu) and sldb is a data frame storing the SwissLipids database filtered to only include the identifications outlined in mtspc.

Since moleculaR relies internally on spatial data, a spatial window can be created to represent the tissue bounderies:

spwin     <- createSpatialWindow(pixelCoords = MALDIquant::coordinates(msData), 
                                 clean = TRUE,
                                 plot = TRUE)

plot of chunk spWindow

To investigate FWHM as a function of m/z axis one could simply plot fwhmObj:

plot(fwhmObj)

plot of chunk plotFwhm

Or, to find the estimated FWHM at any given of m/z value one could simply pass fwhmObj to GetFwhm method:

# FWHM at m/z 400
getFwhm(fwhmObj, 400)
#> [1] 0.004822831

To speed up downstream analysis, moleculaR relies on a sparse matrix representation of the MSI data. To this end, msData has to be first converted to an S3 object of type moleculaR::sparseIntensityMatrix:

#// create sparse matrix representation
spData <- createSparseMat(x = msData)

Molecular Probability Maps (MPMs)

moleculaR introduces the idea of molecular probability maps (MPMs) the main goal of which is to reduce the reliance on user's subjective opinion on the extent of spatial distribution of analytes within a given tissue section. Instead MPMs provide a user-unbiased statistical testing on the likelihood that a certain spatial intensity distribution has a significant relative abundance (i.e. analyte hotspot) or deficiency (i.e. analyte coldspot) within the tissue space. To illustrate this, consider as an example a peak-of-interest (POI) of 788.5447 m/z.

# input by m/z value
queryMass         <- 788.5447 

moleculaR provides searchAnalyte method to retrieve inensity values at a specific m/z with a mass-window dictated by the estimated FWHM at that same m/z value (see ?moleculaR::searchAnalyte for more info). Moreover, searchAnalyte gives the user the possibility to choose which weighting method to choose, for example, setting the argument wMethod="sum" will generate the so called "ion image" as all peaks appearing within the estimated mass-window will be summed up (i.e. uniform mass-window weighting):

# compute the regular ion image - returns an AnalytePiontPattern
sppIonImage      <- searchAnalyte(m = queryMass, 
                                  fwhm = getFwhm(fwhmObj, queryMass), 
                                  spData = spData, 
                                  spwin = spwin, 
                                  wMethod = "sum")


# compute a raster image of the sppIonImage 
ionImage        <- spp2im(sppIonImage)

# plot ion image
plotImg(ionImage)

plot of chunk ionImage

Now to compute a FWHM-dependent Gaussian weighted analyte point pattern representation, set wMethod="Gaussian":

# compute spatial point pattern of the analyte
sppMoi          <- searchAnalyte(m = queryMass, 
                            fwhm = getFwhm(fwhmObj, queryMass), 
                            spData = spData,
                            spwin = spwin, 
                            wMethod = "Gaussian")

# plot SPP
plotAnalyte(sppMoi, main = paste0("SPP of m/z ", round(queryMass, 4)))

plot of chunk sppmoi

# plot the corresponding raster image
plotImg(spp2im(sppMoi), main = paste0("Raster image of m/z ", round(queryMass, 4)))

plot of chunk sppmoi

The probMap could then be called with the default parameters to calculate the corresponding MPM of the above POI (see ?probMap for details) and the generic plot could be used on the result to plot a coposite detailed illustration (see ?plot.molProbMap for details):

#// compute MPM - default parameters
probImg         <- probMap(sppMoi)


#// plot everything together
par(cex.lab = 2, cex.main = 2, cex.axis = 1.5)
plot(probImg, what = "detailed",  ionImage = ionImage)

plot of chunk MPM

Collective Projections Probability Maps (CPPMs)

Another concept introduced by moleculaR is the collective projections probability maps (CPPMs) which, as the name implies, provides a framework for visualization of a set of analytes collectively in a single image space. This could be of interest when, for example, a user is interested in visualization a set of analaytes with a certain similarity (structure, functionality, etc.). For more information on how this is achieved, please refer to the associated preprint.

CPPMs - Lipid Classes

As moleculaR is lipidome-focused in its current implementation, the test MSI dataset is screened against the internal instance of the SwissLipids database and is optionally verified by an externally provided lipidome annotation file downloaded from METASPACE for the given MSI dataset (pre-loaded for the current example). The batchLipidSearch method then is used to do this screening (for more details see ?batchLipidSearch):

cat("Batch lipid search is ongoing - this will take several minutes - \n")

lipidHits <- batchLipidSearch(spData = spData, fwhmObj = fwhmObj, sldb = sldb,
                              spwin = spwin,
                               adduct = c("M+H", "M+Na", "M+K"),
                               numCores = 4, 
                               verifiedMasses = as.numeric(mtspc$mz),
                               confirmedOnly = TRUE, verbose = TRUE)
#> Warning in batchLipidSearch(spData = spData, fwhmObj = fwhmObj, sldb = sldb, : Only single-core operation is supported on windows.

lipidHits

The results is an S3 object of type spatstat::ppp and moleculaR::analytePointPattern which contains all lipid identifications (against SwissLipids database) according to the provided mtspc annotation file. The metaData of the detected analytes are found in the metaData slot:

# show metaData 
head(lipidHits$metaData)
#>             idx   mzVals mzConfirmed     mode adduct       lipidID sumformula    abbrev
#> M+H  1698643086 327.1567        TRUE positive    M+H SLM:000055220  C13H25O7P LPA(10:0)
#> M+H1 1698565198 299.1254        TRUE positive    M+H SLM:000055283  C11H21O7P  LPA(8:0)
#> M+H2 1698652090 496.3398        TRUE positive    M+H SLM:000055318 C24H50NO7P LPC(16:0)
#> M+Na 1698637970 518.3217        TRUE positive   M+Na SLM:000055318 C24H50NO7P LPC(16:0)
#> M+H3 1698393130 494.3241        TRUE positive    M+H SLM:000055319 C24H48NO7P LPC(16:1)
#> M+H4 1698161795 524.3711        TRUE positive    M+H SLM:000055322 C26H54NO7P LPC(18:0)
#>      numDoubleBonds lipidClass chainLength
#> M+H               0   LPA(x:x)          10
#> M+H1              0   LPA(x:x)           8
#> M+H2              0   LPC(x:x)          16
#> M+Na              0   LPC(x:x)          16
#> M+H3              1   LPC(x:x)          16
#> M+H4              0   LPC(x:x)          18

To list all lipid classes which were detected (and confirmed by METASPACE; if confirmedOnly=TRUE):

# whow all detected lipid classes
unique(lipidHits$metaData$lipidClass)
#>  [1] "LPA(x:x)"        "LPC(x:x)"        "LPC(O-x:x)"      "LPE(x:x)"        "LPS(x:x)"       
#>  [6] "PA(x:x)"         "PA(O-x:x)"       "PC(x:x)"         "PC(O-x:x)"       "PE(x:x)"        
#> [11] "PE(O-x:x)"       "PG(x:x)"         "PS(x:x)"         "PS(O-x:x)"       "TG(x:x)"        
#> [16] "HexCer(tx:x)"    "HexCer(dx:x)"    "Hex2Cer(dx:x)"   "SM(dx:x)"        "PE-Cer(dx:x)"   
#> [21] "PGP(x:x)"        "BMP(x:x) | LBPA" "SE(x:x)"         "PG(O-x:x)"       "Cer(tx:x)"      
#> [26] "TG(O-x:x)"       "LPG(O-x:x)"

To generate class-specific lipid maps, subsetAnalytes could be used to filter lipidHits according to lipid classes. Note that subsetAnalytes subsetting is always based on the column names of lipidHits$metaData. Afterwards one could directly apply probMap method and produce a CPPM for all hit instances for a given lipid class. Note that for CPPMs it is highly recommended to apply z-score transformation to account for differences in inonization effeciency of the MOIs constituting the CPPM:

# z-score transformation
lipidHits <- transformIntensity(lipidHits, method = "z-score")

# choose one lipid species
lipidClass <- "PA(x:x)"

# subset lipidHits
paHits <- subsetAnalytes(lipidHits, lipidClass == "PA(x:x)")


# compute MPM - default parameters
probImg    <- probMap(paHits)


#// plot everything together
par(cex.lab = 2, cex.main = 2, cex.axis = 1.5)
plot(probImg, what = "detailed")

plot of chunk CPPM.plot

CPPMs - Ion Milieus

Same as with lipid class maps, one could create representations of a custom set of analytes. Suppose that ion milieus distribution of all (lyso)glycerophospholipids '(lyso)GPLs' which are the mian constituent of biological membraes. One could first subset lipidHits to include PA, PS, PE, PC, PI, PG, LPA, LPS, LPE, LPC, LPI, and LPG:

# create a subset representing lysoGPLs
ofInterest <- c("LPA(x:x)", "LPC(x:x)", "LPE(x:x)", "LPG(x:x)","LPI(x:x)", "LPS(x:x)", 
                "PA(x:x)", "PC(x:x)", "PE(x:x)","PG(x:x)", "PI(x:x)", "PS(x:x)")

# subset lipidHits
lysoGPLs <- subsetAnalytes(lipidHits, lipidClass %in% ofInterest)

lysoGPLs
#> Marked planar point pattern: 5794229 points
#> Mark variables: idx, intensity 
#> window: polygonal boundary
#> enclosing rectangle: [108.45, 305.55] x [139.45, 314.55] units

# detected classes
cat("detected lipid classes: \n")
#> detected lipid classes:
unique(lysoGPLs$metaData$lipidClass)
#> [1] "LPA(x:x)" "LPC(x:x)" "LPE(x:x)" "LPS(x:x)" "PA(x:x)"  "PC(x:x)"  "PE(x:x)"  "PG(x:x)" 
#> [9] "PS(x:x)"

Then one could further use subsetAnalytes to subset lysoGPLs according to any analyte characterestic of the metaData table i.e.

# meta data table columns
colnames(lysoGPLs$metaData)
#>  [1] "idx"            "mzVals"         "mzConfirmed"    "mode"           "adduct"        
#>  [6] "lipidID"        "sumformula"     "abbrev"         "numDoubleBonds" "lipidClass"    
#> [11] "chainLength"

To this end, one could visualize for example all [M+K]+ adducts of detected (lyso)GPLs:

# detected adducts
cat("detected adducts: ")
#> detected adducts:
table(lysoGPLs$metaData$adduct)
#> 
#>  M+H  M+K M+Na 
#>  134   26   99


# subsetting
kHits           <- subsetAnalytes(lysoGPLs, adduct == "M+K")
# or NHits      <- subsetAnalytes(lysoGPLs, adduct == "M+Na")
# or HHits      <- subsetAnalytes(lysoGPLs, adduct == "M+H")

To apply CPPMs and plot the result for [M+K]+:

# compute CPPM
probImg    = probMap(kHits)

if(probImg$sppMoi$n > 50000) {
  cat("plotting ", format(probImg$sppMoi$n, big.mark = ","), " points - this takes time! \n")
}
par(cex.lab = 2, cex.main = 2, cex.axis = 1.5)

plot(probImg, what = "detailed")

plot of chunk ions.plot.K

CPPMs - Lipid Saturation

The same technique could applied to investigate lyso(GPLs) saturation (saturated, mono-unsaturated, di-unsaturated and poly-unsaturated) within one image space by subsetting based on the number of double-bonds of the lipid fatty acid chain.

detectedSaturation <- c("sat", "mono-unsat", "di-unsat", "poly-unsat")


# subsetting
satHits             <- subsetAnalytes(lysoGPLs, numDoubleBonds == 0)
# or monoHits       <- subsetAnalytes(lysoGPLs, numDoubleBonds == 1)
# or diHits         <- subsetAnalytes(lysoGPLs, numDoubleBonds == 2)
# or polyHits       <- subsetAnalytes(lysoGPLs, numDoubleBonds > 2)

input for the specific saturation

# compute CPPM
probImg    = probMap(satHits)
#> Attempting to generate 1408331 random points

if(probImg$sppMoi$n > 50000) {
  cat("plotting ", format(probImg$sppMoi$n, big.mark = ","), " points - this takes time! \n")
}
par(cex.lab = 2, cex.main = 2, cex.axis = 1.5)
plot(probImg, what = "detailed")

plot of chunk cppms.unsat.plot

Session Information

sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22621)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_Germany.65001  LC_CTYPE=English_Germany.1252    
#> [3] LC_MONETARY=English_Germany.65001 LC_NUMERIC=C                     
#> [5] LC_TIME=English_Germany.65001    
#> system code page: 65001
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices datasets  utils     methods   base     
#> 
#> other attached packages:
#> [1] moleculaR_0.9.0 rmarkdown_2.11  devtools_2.4.2  usethis_2.1.3  
#> 
#> loaded via a namespace (and not attached):
#>  [1] nlme_3.1-153             fs_1.5.0                 spatstat.sparse_2.0-0   
#>  [4] rprojroot_2.0.2          bslib_0.3.1              tools_4.0.2             
#>  [7] utf8_1.2.2               R6_2.5.1                 rpart_4.1-15            
#> [10] sm_2.2-5.7.1             mgcv_1.8-38              colorspace_2.0-2        
#> [13] raster_3.5-2             withr_2.4.2              sp_1.4-5                
#> [16] gridExtra_2.3            prettyunits_1.1.1        processx_3.5.2          
#> [19] compiler_4.0.2           cli_3.1.0                readBrukerFlexData_1.8.5
#> [22] desc_1.4.0               sass_0.4.0               scales_1.1.1            
#> [25] spatstat.data_2.1-0      callr_3.7.0              stringr_1.4.0           
#> [28] spatstat_2.3-0           goftest_1.2-3            digest_0.6.28           
#> [31] spatstat.utils_2.2-0     vioplot_0.4.0            base64enc_0.1-3         
#> [34] pkgconfig_2.0.3          htmltools_0.5.2          MALDIquantForeign_0.12  
#> [37] sessioninfo_1.2.1        highr_0.9                fastmap_1.1.0           
#> [40] import_1.2.0             rlang_0.4.12             rstudioapi_0.13         
#> [43] shiny_1.7.1              jquerylib_0.1.4          jsonlite_1.7.2          
#> [46] zoo_1.8-11               magrittr_2.0.1           spatstat.linnet_2.3-0   
#> [49] MALDIquant_1.20          Matrix_1.3-4             Rcpp_1.0.7              
#> [52] munsell_0.5.0            fansi_0.5.0              abind_1.4-5             
#> [55] viridis_0.6.2            lifecycle_1.0.1          terra_1.4-11            
#> [58] stringi_1.7.5            yaml_2.2.1               pkgbuild_1.2.0          
#> [61] readMzXmlData_2.8.1      grid_4.0.2               parallel_4.0.2          
#> [64] promises_1.2.0.1         crayon_1.4.2             deldir_1.0-6            
#> [67] lattice_0.20-45          splines_4.0.2            tensor_1.5              
#> [70] knitr_1.36               ps_1.6.0                 pillar_1.6.4            
#> [73] spatstat.geom_2.3-0      codetools_0.2-16         pkgload_1.2.3           
#> [76] XML_3.99-0.8             glue_1.5.0               evaluate_0.14           
#> [79] remotes_2.4.1            renv_0.13.0              vctrs_0.3.8             
#> [82] httpuv_1.6.3             testthat_3.1.0           gtable_0.3.0            
#> [85] spatstat.core_2.3-0      purrr_0.3.4              polyclip_1.10-0         
#> [88] cachem_1.0.6             ggplot2_3.3.5            xfun_0.28               
#> [91] mime_0.12                xtable_1.8-4             later_1.3.0             
#> [94] viridisLite_0.4.0        tibble_3.1.6             shinythemes_1.2.0       
#> [97] memoise_2.0.0            shinyWidgets_0.6.2       ellipsis_0.3.2

moleculaR Shiny Apps

moleculaR provides two R Shiny web apps with intuitive web-based GUIs; a example-app which comes pre-loaded with an examplary reduced MALDI MSI dataset (for more info refer to this (preprint)[https://doi.org/10.1101/2021.10.27.466114]) and is used for demonstration purposes only and a package-app which lets the user upload her own centroided imzML data and apply spatial probability mapping through Molecular Probability Maps (MPMs) and Collective Projection Probability Maps (CPPMs).

Loading Data

This step only applies to package-app. The first step using the shiny package-app, would be to upload you own data. To do this click on imzML & ibd File and navigate to the directory containing the files you want to upload. Then select both imzML- and ibd-Files and upload them. Via Spectrum .tsv File uPload a continuous spectrum in tsv format which would represent either a single random pixel or a mean spectrum of your imaging dataset. Finally, upload the METASPACE annotation results (optional) via Metaspace.csv File which should be in a csv format.

Loading the Data

After successfully loading the data the message "Upload complete" should appear.

Data Loaded

When all files are fully loaded you can proceed and use the Peaks FWHM Estimation. To do this press Initialize as shown in the figure below.

Peaks FWHM Estimation

A progress bar will appear at the bottom right of the app.

Progress Bar

When the data has been processed and FWHM has been estimated, a plot will appear showing the approximation.

FWHM Estimation Completed{width=70%}

MPMs

To compute molecular probability maps (MPM), one could enter an exact mass. By default the closest m/z-value found in the dataset will be chosen. If you are looking for an exact mass, deactivate the checkbox shown in the figure below.

![Closest m/z in Dataset](shiny-imgs/closest mz.PNG)

After entering the desired mass press Generate Plot and wait until the plots are generated.

Molecular Probability Maps (MPM){width=70%}

CPPMs

Currently three different Collective Projection probability Maps (CPPMs) are supported; Lipid Classes, Ion Milieus and Lipid Saturation.

Lipid Classes

To generate CPPMs of specific lipid classes one can choose a lipid from the drop-down menu before pressing "Generate Plot".

![Lipid Classes Generation](shiny-imgs/cpm lipid classes generate.PNG)

After the generation is completed, plot similar to the one below is displayed.

![Lipid Classes Output](shiny-imgs/cpm lipid classes generated.PNG){width=70%}

Ion Milieus

For Ion Milleu generation one can choose between all, M+K, M+Na and M+H from the drop-down menu before pressing "Generate Plot".

![Ion Milleu Generation](shiny-imgs/cpm ion generate.PNG)

After the generation is completed, plot similar to the one below is displayed.

![Ion Milleu Output](shiny-imgs/cpm ion generated.PNG){width=70%}

Lipid Saturation

For Lipid Saturation generation one can choose between all, sat., mono., di. and poly. from the dropdown menu before pressing "Generate Plot".

![Lipid Saturation Generation](shiny-imgs/cpm lipid sat generate.PNG)

After the generation is completed, plot similar to the one below is displayed.

![Lipid Saturation Output](shiny-imgs/cpm lipid sat generated.PNG){width=70%}



CeMOS-Mannheim/moleculaR documentation built on April 14, 2025, 8:27 a.m.