knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
mWISE (metabolomics Wise Inference of Speck Entities) is an R package that provides tools for context-based annotation of untargeted LC-MS data. Annotation is still considered a major bottleneck in untargeted LC-MS studies, and several computational strategies have been proposed to address it.
mWISE integrates several strategies to provide a fast annotation of peak-intensity tables. It consists of three main steps: i) matching mass-to-charge ratio values to the KEGG database, ii) clustering and filtering the potential KEGG candidates, and iii) building a final prioritized list using diffusion in networks.
The mWISE R package provides individual functions to perform each of the steps, as well as a wrapper function that runs the whole annotation pipeline. This document gives an overview of the options offered by mWISE.
mWISE uses knowledge of adducts and in-source fragments to perform a fast matching to the KEGG database. The default table of adducts and fragments is built using information from the CAMERA R package, H. Tong et al., and cliqueMS.
The table used in the matching stage (Cpd.Add) is built 
from the KEGG database. Below, a data frame containing KEGG 
identifiers and their exact masses is shown.
library(mWISE)
data("KeggDB")
head(KeggDB)
The Cpd.Add table is built from the Info.Add table, shown below. 
The column named quasi indicates which adducts or fragments are 
considered quasi-molecular, and the columns named log10freq and 
Freq give the observed frequencies of the adducts and fragments 
available in the cliqueMS R package. 
data("Info.Add")
head(Info.Add)
The function CpdaddPreparation computes the Cpd.Add table. 
New adducts or fragments not available in mWISE can be added 
by providing a table with at least the name, the number of molecules, 
the charge and the mass difference of each new adduct. 
Below, a reduced Cpd.Add table is built using only 2000 KEGG identifiers.
data("sample.keggDB")
Cpd.Add <- CpdaddPreparation(KeggDB = sample.keggDB, do.Par = FALSE)
head(Cpd.Add)
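To make the four required fields concrete, the sketch below builds a small adduct table in base R and derives the expected mass-to-charge ratio of each adduct from a neutral exact mass. The column names (name, nmol, charge, massdiff) and the helper adduct_mz are hypothetical, chosen for illustration only, and may not match the format CpdaddPreparation expects. The mass differences are standard values rounded to four decimals and are assumed to already include the electron mass correction.

```r
# Hypothetical adduct table: one row per adduct or fragment.
new.adducts <- data.frame(
  name     = c("M-H", "M+Cl", "2M-H"),
  nmol     = c(1, 1, 2),                    # number of molecules in the ion
  charge   = c(-1, -1, -1),                 # ion charge
  massdiff = c(-1.0073, 34.9694, -1.0073),  # mass shift in Da
  stringsAsFactors = FALSE
)

# Expected m/z of an adduct for a compound of neutral exact mass M:
#   mz = (nmol * M + massdiff) / |charge|
adduct_mz <- function(exact.mass, adducts) {
  (adducts$nmol * exact.mass + adducts$massdiff) / abs(adducts$charge)
}

glucose.mass <- 180.0634                    # exact mass of C6H12O6
mz <- adduct_mz(glucose.mass, new.adducts)
names(mz) <- new.adducts$name
mz
```

Precomputing one such value per compound and adduct is what makes the later matching a simple lookup against observed peaks.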
The different steps of mWISE are shown below.
An untargeted LC-MS example dataset is available in mWISE. 
The Trypanosoma dataset is organized as a list with two elements, 
one holding the positive acquisition mode objects and the other the 
negative acquisition mode objects. Each element contains an Input and 
an Output data frame. The Input data frame contains the peak-intensity 
matrix and the Output data frame contains the reference peaks identified.
data("sample.dataset")
Peak.List <- sample.dataset$Negative$Input
df.Ref <- sample.dataset$Negative$Output
df.Ref <- df.Ref[df.Ref$Peak.Id %in% Peak.List$Peak.Id,]
Once the example dataset is loaded, the matchingStage function can 
be applied. If the Cpd.Add argument is not specified, the default 
table will be used. In this case, the function is applied with the 
reduced Cpd.Add table built above and the remaining optional arguments 
left at their defaults. The result is a list with the input 
peak-intensity matrix (Peak.List) and the annotated table (Peak.Cpd). 
Annotated.List <- matchingStage(Peak.List = Peak.List, Cpd.Add = Cpd.Add,
                                polarity = "negative", do.Par = FALSE)
Annotated.Tab <- Annotated.List$Peak.Cpd
nrow(Annotated.Tab)
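As an illustration of what a mass-based matching involves, the following base R sketch pairs observed peaks with theoretical adduct masses within a ppm tolerance. It is a simplified stand-in and not the implementation of matchingStage: the helper match_ppm, the 10 ppm default, the column names and the toy values are all assumptions made for this example.

```r
# Toy inputs (hypothetical values, not taken from the mWISE datasets).
peaks <- data.frame(Peak.Id = c("P1", "P2", "P3"),
                    mz = c(179.0560, 215.0330, 300.0000),
                    stringsAsFactors = FALSE)
theoretical <- data.frame(Compound = c("C00031", "C00031"),
                          Add.name = c("M-H", "M+Cl"),
                          mz       = c(179.0561, 215.0328),
                          stringsAsFactors = FALSE)

# Return every (peak, candidate) pair whose mass error is within `ppm`.
match_ppm <- function(peaks, theoretical, ppm = 10) {
  hits <- lapply(seq_len(nrow(peaks)), function(i) {
    err <- abs(peaks$mz[i] - theoretical$mz) / theoretical$mz * 1e6
    keep <- err <= ppm
    if (!any(keep)) return(NULL)          # peak without candidates
    data.frame(Peak.Id = peaks$Peak.Id[i],
               theoretical[keep, c("Compound", "Add.name")],
               ppm.error = err[keep],
               stringsAsFactors = FALSE)
  })
  # Drop unmatched peaks before stacking the per-peak results.
  do.call(rbind, Filter(Negate(is.null), hits))
}

matches <- match_ppm(peaks, theoretical)
matches
```

In this toy case peaks P1 and P2 each receive one candidate, while P3 has no theoretical mass within the tolerance and is left unannotated.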
A subset of the adducts or fragments available in mWISE can be 
selected for the matching stage. This is strongly recommended, 
since the users' knowledge of the experimental settings 
of their studies may greatly improve the final annotation results.
The function printAdducts lists the available adducts and fragments, 
which makes this selection easier, as shown below. 
printAdducts(pol = "negative")
selectedAdds <- printAdducts(pol = "negative")[c(4:5,17,18,22:25,75,76)]
selectedAdds
The new selection of adducts can be easily introduced in the 
matchingStage function through the Add.List parameter.
Annotated.List <- matchingStage(Peak.List = Peak.List, Cpd.Add = Cpd.Add,
                                polarity = "negative", Add.List = selectedAdds,
                                do.Par = FALSE)
nrow(Annotated.List$Peak.Cpd)
The number of proposed candidates is substantially reduced, which can improve the accuracy of the following steps.
First, the features that may come from the same metabolite are 
clustered using the featuresClustering function. The 
Intensity.idx parameter takes a vector with the indices of the 
columns containing intensity information.
Then, the result is merged with the annotated table and the 
different clusters are indicated in a column named pcgroup. 
Intensity.idx <- seq(27,38)
clustered <- featuresClustering(Peak.List = Peak.List,
                                Intensity.idx = Intensity.idx, do.Par = FALSE)
Annotated.Tab <- merge(Annotated.Tab,
                       clustered$Peak.List[,c("Peak.Id", "pcgroup")],
                       by = "Peak.Id")
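Hard-coded column positions such as seq(27,38) only hold for this particular dataset. A minimal sketch of deriving the indices from the column names instead is given here, assuming the intensity columns share a common prefix; the "Sample_" prefix and the toy table are hypothetical and do not describe the layout of the Trypanosoma dataset.

```r
# Toy peak table; the "Sample_" prefix is a hypothetical naming convention.
toy.peaks <- data.frame(Peak.Id   = c("P1", "P2"),
                        mz        = c(179.0561, 215.0328),
                        rt        = c(65.2, 65.4),
                        Sample_01 = c(1200, 800),
                        Sample_02 = c(1500, 950),
                        Sample_03 = c(1100, 700),
                        stringsAsFactors = FALSE)

# Indices of the intensity columns, found by name rather than by position.
toy.Intensity.idx <- grep("^Sample_", colnames(toy.peaks))
toy.Intensity.idx

# Sanity check: every selected column should hold numeric intensities.
stopifnot(all(vapply(toy.peaks[toy.Intensity.idx], is.numeric, logical(1))))
```

Selecting by name keeps the index vector valid if metadata columns are added or reordered in the peak table.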
The object containing the potential candidates that result from 
the matching stage and the clustering of the peaks is passed 
to the clusterBased.filter function. A list of character strings naming 
the quasi-molecular adducts can be supplied through the Add.Id parameter. 
If it is not, the quasi-molecular adducts available in mWISE, together with 
the adducts with an observed frequency higher than 0.1, will be used 
for filtering. The user can modify the minimum observed frequency 
using the Freq argument. 
MH.Tab <- clusterBased.filter(df = Annotated.Tab, polarity = "negative")
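The idea behind a cluster-based filter can be sketched in base R as follows: within each cluster, a candidate compound is kept only if at least one of its annotations uses an accepted adduct, that is, a quasi-molecular adduct or one whose observed frequency exceeds the threshold. This is a simplified illustration under that assumption and not the algorithm implemented in clusterBased.filter; the helper filter_by_cluster and all toy values are hypothetical.

```r
# Toy annotated table (hypothetical values).
annot <- data.frame(
  Peak.Id  = c("P1", "P2", "P3", "P3"),
  pcgroup  = c(1, 1, 2, 2),
  Compound = c("C00031", "C00031", "C00095", "C00022"),
  Add.name = c("M-H", "M+Cl", "M+Cl", "M-H"),
  stringsAsFactors = FALSE
)
# Hypothetical adduct information: quasi-molecular flag and observed frequency.
add.info <- data.frame(Add.name = c("M-H", "M+Cl"),
                       quasi    = c(1, 0),
                       Freq     = c(0.60, 0.05),
                       stringsAsFactors = FALSE)

filter_by_cluster <- function(annot, add.info, min.freq = 0.1) {
  # Adducts accepted as evidence: quasi-molecular or frequent enough.
  accepted <- add.info$Add.name[add.info$quasi == 1 | add.info$Freq > min.freq]
  # (cluster, compound) pairs backed by at least one accepted adduct.
  key <- paste(annot$pcgroup, annot$Compound)
  supported <- unique(key[annot$Add.name %in% accepted])
  annot[key %in% supported, ]
}

filtered <- filter_by_cluster(annot, add.info)
filtered
```

Here C00095 is dropped from cluster 2 because its only annotation uses a rare, non quasi-molecular adduct, whereas both annotations of C00031 in cluster 1 survive thanks to the M-H evidence.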
For the diffusion stage, we will now use the sample graph provided by the FELLA R package.
data("sample.graph")
g.metab <- igraph::as.undirected(sample.graph)
The different diffusion inputs can be computed using the 
diffusion.input function. The input.type argument can be 
set to "probability" or "binary". If Unique.Annotation = TRUE, 
the diffusion input will be computed using only the peaks with 
a unique candidate. 
Input.diffusion <- diffusion.input(df = MH.Tab, input.type = "probability", Unique.Annotation = FALSE, do.Par = FALSE)
The set.diffusion function applies diffusion in graphs using 
the input previously built. With scores = "z", the diffusion 
scores are normalized by taking into account the topology of the graph. 
With scores = "raw", no normalization is applied. 
diff.Cpd <- set.diffusion(df = Input.diffusion, scores = "z",
                          graph = g.metab, do.Par = FALSE)
Diffusion.Results <- diff.Cpd$Diffusion.Results
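The difference between raw and z scores can be illustrated on a toy graph in base R. The sketch assumes a regularised Laplacian kernel, which is one common choice for graph diffusion, and standardises each raw score against its mean and variance under random permutations of the input over the nodes. Whether set.diffusion uses exactly this kernel and this normalization is an assumption here; the five-node path graph and the input vector are made up for the example.

```r
# Path graph 1-2-3-4-5 written as an adjacency matrix (toy example).
n <- 5
A <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  A[i, i + 1] <- 1
  A[i + 1, i] <- 1
}

# Regularised Laplacian kernel: K = (I + L)^-1, with L = D - A.
L <- diag(rowSums(A)) - A
K <- solve(diag(n) + L)

# Diffusion input: only node 1 carries evidence.
y <- c(1, 0, 0, 0, 0)

# Raw scores: the input propagated through the kernel.
raw <- as.vector(K %*% y)

# z scores: standardise each raw score against its mean and variance
# under random permutations of the input over the nodes.
perm.mean <- mean(y) * rowSums(K)
perm.var  <- apply(K, 1, function(k) sum((k - mean(k))^2)) *
  sum((y - mean(y))^2) / (n - 1)
z <- (raw - perm.mean) / sqrt(perm.var)

round(cbind(raw = raw, z = z), 3)
```

The raw scores decay with distance from the seeded node, while the z scores additionally account for how much score each node would receive from an arbitrary input simply because of its position in the graph.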
The recoveringPeaks function recovers the peaks that have been 
completely removed by the cluster-based filter.
MH.Tab <- recoveringPeaks(Annotated.Tab = Annotated.Tab, MH.Tab = MH.Tab)
Diff.Tab <- merge(x = MH.Tab, y = Diffusion.Results,
                  by = "Compound", all.x = TRUE)
Finally, the diffusion-prioritized table is built using the 
finalResults function. 
The modifiedTabs function prepares the tables that result from the 
matching stage and the filtering stage by grouping those peaks where the same
candidate has been proposed more than once. This can happen when a 
compound can result in the same mass-to-charge ratio through more 
than one adduct or fragment. 
Ranked.Tab <- finalResults(Diff.Tab = Diff.Tab, score = "z", do.Par = FALSE)
Annotated.Tab2 <- modifiedTabs(df = Annotated.Tab, do.Par = FALSE)
MH.Tab2 <- modifiedTabs(df = MH.Tab, do.Par = FALSE)
Annotated.dataset <- list(Annotated.Tab = Annotated.Tab2,
                          Clustered.Tab = clustered,
                          MH.Tab = MH.Tab2,
                          Diff.Tab = Diff.Tab,
                          Ranked.Tab = Ranked.Tab)
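The grouping of repeated candidates can be pictured with a small base R example. For instance, the adducts M-H and 2M-2H of the same compound yield the same mass-to-charge ratio, so both are proposed for the same peak. The sketch collapses such rows into one per peak and compound; it is an illustration with hypothetical column names and values, not the code of modifiedTabs.

```r
# Toy table: compound C00031 is proposed twice for peak P1 (hypothetical).
tab <- data.frame(Peak.Id  = c("P1", "P1", "P2"),
                  Compound = c("C00031", "C00031", "C00022"),
                  Add.name = c("M-H", "2M-2H", "M-H"),
                  stringsAsFactors = FALSE)

# One row per (peak, compound), with the adduct names concatenated.
grouped <- aggregate(Add.name ~ Peak.Id + Compound, data = tab,
                     FUN = paste, collapse = ";")
grouped
```

After grouping, each candidate counts once per peak, which avoids inflating the number of proposals when tables from different stages are compared.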
The wrapper function mWISE.annotation applies the whole mWISE 
pipeline at once. 
Annotated.List <- mWISE.annotation(Peak.List = Peak.List, polarity = "negative", diffusion.input.type = "binary", score = "raw", Cpd.Add = Cpd.Add, graph = g.metab, Unique.Annotation = TRUE, Intensity.idx = Intensity.idx, do.Par = FALSE)
Finally, the performanceEvaluation function can be used to compute 
the performance metrics, using the benchmark data frame df.Ref. 
The argument top.cmps defines the top K candidates considered 
for the evaluation. 
performanceEvaluation(Annotated.dataset = Annotated.dataset, df.Ref = df.Ref, top.cmps = 3)
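To clarify the role of a top K cutoff, the sketch below computes, in base R, the fraction of reference peaks whose true compound appears among the K best-ranked candidates. The metrics actually reported by performanceEvaluation are not described here, so this is only one plausible measure; the helper top_k_recovery, the Rank column and the toy tables are hypothetical.

```r
# Toy ranked candidates (rank 1 = best) and reference annotations.
ranked <- data.frame(
  Peak.Id  = c("P1", "P1", "P1", "P2", "P2"),
  Compound = c("C00031", "C00095", "C00022", "C00022", "C00031"),
  Rank     = c(1, 2, 3, 1, 2),
  stringsAsFactors = FALSE
)
reference <- data.frame(Peak.Id  = c("P1", "P2"),
                        Compound = c("C00095", "C00031"),
                        stringsAsFactors = FALSE)

# Fraction of reference peaks whose true compound is in the top k candidates.
top_k_recovery <- function(ranked, reference, k) {
  top <- ranked[ranked$Rank <= k, ]
  hit <- paste(reference$Peak.Id, reference$Compound) %in%
    paste(top$Peak.Id, top$Compound)
  mean(hit)
}

recovery <- sapply(c(1, 2, 3), function(k) top_k_recovery(ranked, reference, k))
recovery
```

In this toy case neither reference compound is ranked first, so the recovery is 0 for K = 1 and rises to 1 once the second-ranked candidates are included.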