knitr::opts_chunk$set(
  fig.width = 7,
  fig.height = 5,
  collapse = TRUE,
  comment = "#>"
)
The AlpsLCMS package was written with two purposes in mind: functions written for data analysts and LC-MS scientists are prefixed with lcms_, while higher-level functions written for IT pipeline builders are prefixed with pipe_. The main reason why all exported functions have a prefix is to make it easy for the user to discover the functions from the package. By typing lcms_, RStudio will return the list of exported functions. In the R terminal, lcms_ followed by the tab key (⇥) twice has the same effect. Other popular packages follow similar approaches (e.g. forcats: fct_*, stringr: str_*).
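Thanks to this naming convention, the functions intended for each audience can also be listed programmatically. A minimal sketch (it only assumes that AlpsLCMS is installed and attached):

library(AlpsLCMS)
# Exported functions intended for data analysts and LC-MS scientists:
ls("package:AlpsLCMS", pattern = "^lcms_")
# Exported functions intended for IT pipeline builders:
ls("package:AlpsLCMS", pattern = "^pipe_")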
This vignette is written for the first group. It assumes some prior basic knowledge of LC-MS and data analysis, as well as some basic R programming. In case you are interested in building pipelines with this package, you may want to open the example pipeline file shipped with the package (run this on your computer to locate it):
pipeline_example <- system.file("pipeline-rmd", "pipeline_example.R", package = "AlpsLCMS")
print(pipeline_example)
library(AlpsLCMS)
First, we need the spectra, ideally in ".mzXML" or ".cdf" format, to create the lcms_dataset. We can manually convert ".raw" files into ".mzXML" using RawConverter or ProteoWizard externally and save them within the same input directory. In ".cdf" files, we need to set the polarity manually (a sketch for doing so follows the loading step below).
input_dir <- "C:/Users/hgracia/Desktop/IBEC/data/positive"
path <- input_dir

# The format of LC-MS files to list (or list and convert).
file_format <- "mzXML"

# Directory where the RawConverter is.
# We recommend converting files outside R and setting this to NULL.
rawconverter <- "C:/Users/hgrac/AppData/Local/Apps/2.0/0W3LE4AA.3V9/HAK1HQ15.LLV/rawc..tion_0000000000000000_0001.0001_1563e82fa65e7a77/RawConverter.exe"
#rawconverter <- "C:/Users/fmadrid/Documents/IBEC/NESTLE/LCMS/RawConverter_x64/RawConverter.exe"
samples_mzxml <- lcms_list_mzxml_samples(path,
                                         file_format = file_format,
                                         rawconverter = rawconverter)
samples_mzxml <- as.character(samples_mzxml)
# Be careful setting the mode to "onDisk" when you apply this function.
dataset <- suppressWarnings(lcms_read_samples(samples_mzxml, mode = "onDisk"))
message("Is the dataset correctly loaded? \n")
cat("\n")
print(dataset)
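As mentioned above, ".cdf" files do not encode the polarity, so it has to be set by hand on the loaded object. This is only a sketch: it assumes the dataset is an MSnbase-style on-disk object whose feature data exposes a polarity column via fData() (1 = positive, 0 = negative), and that positive mode is the right choice for your data; check your own object before applying it.

# Only needed for ".cdf" files: set the polarity manually (1 = positive, 0 = negative).
# fData(dataset)$polarity <- 1L
# table(fData(dataset)$polarity)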
To merge the metadata, an Excel or CSV file is required. Its first column (called "sampleNames") must contain the same names as the LC-MS files, including the file extension (e.g. Sample1.mzXML). It also requires a column (called "treatment") with the class of each sample. Ensure you use this specific column name, "treatment". Be careful with the metadata content: characters such as "-", "/", " " (space), names starting with numbers, etc. lead to problems. Therefore, the function replaces the pattern [\\\"\\s/\\\\,;.:|#@$%&?!*%+-=><^'(){}\\[\\]]+ with _. Avoid other special characters and replace them with standard ASCII characters.
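As a reference, this is roughly what a minimal metadata table could look like. The file and group names here are hypothetical; you can build the table in Excel/CSV or directly in R:

# Hypothetical minimal metadata table
metadata_example <- data.frame(
  sampleNames = c("Sample1.mzXML", "Sample2.mzXML", "Sample3.mzXML"),
  treatment   = c("control", "control", "treated")
)
metadata_example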
# Create a metadata for this example dataset
metadata <- readxl::read_excel("C:/Users/hgracia/Desktop/IBEC/data/20200128_Sample_Info_adapted.xlsx")
dataset <- lcms_meta_add(dataset, metadata, by = "sampleNames")
phData(dataset)
tics <- lcms_tics(dataset, treatment = "treatment")
lcms_plot_tics(tics, treatment = treatment, plot_type = "spec")
lcms_plot_tics(tics, treatment = treatment, plot_type = "boxplot")
For coherence within the pipeline, time is measured in minutes. The XCMS and Autotuner packages work in seconds by default, while CAMERA works in minutes. Under the hood they still do so, but we present all our results in minutes.
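Before choosing a retention time range for filtering, it can help to check the range actually covered by the data. A small sketch, assuming the dataset is an MSnbase/xcms on-disk object, where rtime() returns retention times in seconds:

# Retention time range of the raw data, converted from seconds to minutes
range(MSnbase::rtime(dataset)) / 60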
# Range of the retention time (minutes) and m/z to include in further analyses
rt <- c(1.6, 15.5)
ms <- c(350, 900)
#rt <- c(50, 60)
#ms <- c(200, 500)
dataset_shorter <- lcms_filter_rt_min(dataset, rt = rt)
dataset_shorter <- lcms_filter_mz(dataset_shorter, mz = ms)
tics <- lcms_tics(dataset_shorter, treatment = "treatment")
lcms_plot_tics(tics, treatment = treatment, plot_type = "spec")
prep_parm_p <- NULL
prep_parm_p$peakwidth <- c(20, 80)
prep_parm_p$noise <- 5000
prep_parm_p$snthresh <- 3
prep_parm_p$prefilter <- c(6, 5000)
prep_parm_p$centerSample <- "wMean"
prep_parm_p$integrate <- 2
prep_parm_p$mzdiff <- -0.001
prep_parm_p$profStep <- 0.005
prep_parm_p$minFraction <- 0.2
prep_parm_p$ppm <- 25
prep_parm_p$mzCenterFun <- "wMean"
prep_parm_p$fitgauss <- FALSE
prep_parm_p$verbose.columns <- FALSE

classes <- dataset@phenoData@data[["treatment"]]
peakdet <- find_peaks_cwp(dataset, params = prep_parm_p)

message("Number of detected peaks")
peakdet@msFeatureData[["chromPeakData"]]@nrows
message("")
message("Parameters")
peakdet@.processHistory[[1]]@param

xcms::plotChromPeakImage(peakdet)
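For readers used to xcms: judging by its name and parameter list, find_peaks_cwp is assumed here to run CentWave peak detection. Purely as an illustration (not the package's own code), a rough direct-xcms sketch with the same settings would be:

# Hypothetical direct-xcms sketch of CentWave detection with the settings above
cwp <- xcms::CentWaveParam(
  ppm         = prep_parm_p$ppm,
  peakwidth   = prep_parm_p$peakwidth,
  snthresh    = prep_parm_p$snthresh,
  prefilter   = prep_parm_p$prefilter,
  mzCenterFun = prep_parm_p$mzCenterFun,
  integrate   = as.integer(prep_parm_p$integrate),
  mzdiff      = prep_parm_p$mzdiff,
  fitgauss    = prep_parm_p$fitgauss,
  noise       = prep_parm_p$noise
)
# peakdet_xcms <- xcms::findChromPeaks(dataset, param = cwp)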
Peak correspondence is carried out by the groupPeaks method, with parameters obtained from IPO. Peak correspondence consists of grouping peaks along the retention time axis in order to associate them with spectra on the mass/charge axis. After this stage we finally have a peak table. Note that the retention time alignment performed below requires a previous grouping.
new_params <- PeakDensityPar(sampleGroups = classes, binSize = 0.6)
peakgrouped <- groupPeaks(peakdet, param = new_params)
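For xcms users, PeakDensityPar and groupPeaks are assumed here to mirror xcms' density-based correspondence. The following is only an illustrative sketch with the same settings, not the package's own code:

# Hypothetical direct-xcms sketch of density-based peak correspondence
pdp <- xcms::PeakDensityParam(sampleGroups = classes, binSize = 0.6)
# peakgrouped_xcms <- xcms::groupChromPeaks(peakdet, param = pdp)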
message("Number of detected peaks") peakdet@msFeatureData[["chromPeakData"]]@nrows
pgp <- PeakGroupsPar(minFraction = 0.8,
                     extraPeaks = 1,
                     smooth = "loess",
                     span = 0.4,
                     family = "gaussian")

## Get the peak groups that would be used for alignment.
xdata_align <- adjustRT(peakgrouped, param = pgp)
rt_plot <- lcms_retention_time_alignment_plot(xdata_align)
rt_plot

## REGROUPING
new_params <- PeakDensityPar(sampleGroups = classes, bw = 30, minFraction = 0.4)
peakgrouped <- groupPeaks(xdata_align, param = new_params)
Image plot of the chromatographic peaks detected per sample, before and after retention time alignment and grouping:
lcms_plot_chrom_peak_image(peakdet,
                           binSize = 5,
                           xlim = NULL,
                           log = FALSE,
                           xlab = "retention time (min)",
                           yaxt = par("yaxt"),
                           main = "Detected Peaks (unprocessed)")

lcms_plot_chrom_peak_image(peakgrouped,
                           binSize = 5,
                           xlim = NULL,
                           log = FALSE,
                           xlab = "retention time (min)",
                           yaxt = par("yaxt"),
                           main = "Detected Peaks (processed)")
Finally, in the imputation stage, we integrate the areas of the peaks that are missing from the peak table because they were not detected in the previous steps of the signal preprocessing workflow. This stage is important to ease the subsequent statistical and machine learning stages.
message("Missing values found in the processed dataset: ", sum(is.na(featureValues(peakgrouped)))) peakgrouped_imp <- lcms_fill_chrom_peaks(peakgrouped) cat("Imputing values...\n") message("Missing values found after fill_chrom_peaks: ", sum(is.na(featureValues(peakgrouped_imp))))
The function feature_values creates an intensity matrix with all the features. However, since this is untargeted metabolomics, the column names are simply FT1, FT2, FT3, ... (FT = feature), and each feature needs to be looked up in the feature definitions (which provide its mass and retention time) once that feature turns out to be significantly different between groups.
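For example, once a feature of interest has been identified, its m/z and retention time can be retrieved from the feature definitions. A small sketch; the feature name "FT001" is just a placeholder, and rtmed is reported in seconds:

# Look up the m/z and retention time of one feature (placeholder name "FT001")
fd <- featureDefinitions(peakgrouped_imp)
fd["FT001", c("mzmed", "rtmed")]   # rtmed is in seconds
fd["FT001", "rtmed"] / 60          # converted to minutes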
xdata <- feature_values(peakgrouped_imp,
                        method = "maxint",
                        value = "into",
                        filled = TRUE,
                        missing = "rowmin_half")
xdata <- t(xdata)

feature <- featureDefinitions(peakgrouped_imp)
feature <- feature@listData
featNames <- paste0(feature$mzmed, "_", feature$rtmed)
colnames(xdata) <- featNames

message("Missing values in the feature table: ", sum(is.na(xdata)))
# Get mz and rt columns for the feature table
mz <- colnames(xdata) %>%
  stringr::str_split(., "\\_") %>%
  lapply(., function(x) x[1]) %>%
  unlist() %>%
  as.numeric()

# You can also get rt
rt <- colnames(xdata) %>%
  stringr::str_split(., "\\_") %>%
  lapply(., function(x) x[2]) %>%
  unlist() %>%
  as.numeric()
rt <- rt / 60
st <- getRamSt(peakgrouped_imp)
sr <- 0.6

# List of adducts for do.findmain
#adducts_list <- c("[M+H-H2O]+")
adducts_list <- c()

## Building the defineExperiment manually
## Change for your convenience (e.g. GC-MS)
value <- c(rep("fill", 4), "LC-MS")
design <- as.data.frame(value)
rownme <- c("Experiment", "Species", "Sample", "Contributer", "platform")
rownames(design) <- rownme

value <- c(rep("fill", 13), "1")
instrument <- as.data.frame(value)
rownm <- c("chrominst", "msinst", "column", "solvA", "solvB", "CE1", "CE2",
           "mstype", "msmode", "ionization", "colgas", "msscanrange",
           "conevol", "MSlevs")
rownames(instrument) <- rownm

Experiment <- list(design = design, instrument = instrument)
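If you prefer not to build the experiment description by hand, RAMClustR itself can generate an equivalent pre-filled skeleton. This is only a sketch and may need adjusting for your RAMClustR version:

# Hypothetical alternative: let RAMClustR create a pre-filled experiment description
# (force.skip = TRUE is meant to avoid the interactive prompts)
# Experiment <- RAMClustR::defineExperiment(force.skip = TRUE)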
RC <- clustering(xcmsObj = peakgrouped_imp,
                 featdelim = ".",
                 st = st,
                 sr = sr,
                 ExpDes = Experiment,
                 normalize = "TIC",
                 deepSplit = TRUE,
                 sampNameCol = 1,
                 mspout = FALSE,
                 fftempdir = getwd())

RC <- do_findmain(RC,
                  nls = c("[M+H-H2O]+"),
                  mode = "positive",
                  mzabs.error = 0.01,
                  ppm.error = 10,
                  plot.findmain = FALSE,
                  writeMat = FALSE,
                  writeMS = FALSE)
#labeled_adducts <- labelling(RC)
#representative_ions <- labeled_adducts$representative_ions
#xdata_reduced <- feature_reduction(xdata, representative_ions, RC)

reduced_object <- feature_reduction(RC, xdata, loaded = FALSE)
xdata_reduced <- reduced_object$xdata_reduced

dim(xdata)
dim(xdata_reduced)
y_column <- dataset_shorter@phenoData@data[["treatment"]]

# TODO: computing the model performances fails, but it is due to the very small number of samples
# TODO: try with the Nestle positive dataset
# TODO: try to make the performance estimation avoid folds

# y_column = dataset@phenoData@data[["treatment"]]
# dataset <- xdata_reduced
# train_index = c(1,2,3,7,8,9)
# ncomp = 3
# nbootstrap = 10

# bp_results <- bp_VIP_analysis(xdata_reduced,
#                               train_index = c(1,2,3,7,8,9),
#                               y_column,
#                               ncomp = 3,
#                               nbootstrap = 10)

# # Bootstrap and permutation method in k-fold cross validation
# bp_results <-
#   bp_kfold_VIP_analysis(xdata_reduced,       # Data to be analyzed
#                         y_column = y_column, # list of labels that indicates the class
#                         k = 4,
#                         ncomp = 2,
#                         nbootstrap = 30)
#
# #bp_results$vip_score_plot
# stat <- function(x){stats::wilcox.test(x ~ classes, xdata_reduced)$p.value}
# abcd <- data.frame(apply(FUN = stat,
#                          MARGIN = 2,
#                          X = xdata_reduced))
# colnames(abcd) <- c("p_Wilc")
# abcd$id <- row.names(abcd)
# result <- abcd
# summary(result$p_Wilc)
# message("\nNumber of features < 0.05 nominal p-value ",
#         sum(result$p_Wilc < 0.05))
# head(result[result$p_Wilc < 0.05, 1])
#
# # FDR
# fdr.wilcox <- stats::p.adjust(result$p_Wilc, method = "fdr")
# result <- cbind(result, fdr.wilcox)
# message("\nNumber of features fdr-corrected p value of < 0.05 is ",
#         sum(result$fdr.wilcox < 0.05))
# # We use the selected vips
# univ_feat <- result[result$fdr.wilcox < 0.05, "id"]
#
# # Untargeted assignation
# # Creating mz column
# mzr <- univ_feat %>%
#   stringr::str_split(., "\\_") %>%
#   lapply(., function(x) x[1]) %>%
#   unlist() %>%
#   as.numeric()
#
# all.equal(length(univ_feat), length(mzr))
#
# tdata_reduced_univ <- data.frame(mz = mzr)
# if (dim(tdata_reduced_univ)[1] > 0) {
#   result_POS_HMDB_univ <- assignation_pos_HMDB(tdata_reduced_univ)
#   head(result_POS_HMDB_univ)
# } else {
#   message("There are no significant features in the Wilcoxon test")
# }
# # Untargeted assignation
# # Creating mz column
#
# mzr <- colnames(as.matrix(xdata)) %>%
#   stringr::str_split(., "\\_") %>%
#   lapply(., function(x) x[1]) %>%
#   unlist() %>%
#   as.numeric()
#
# all.equal(dim(xdata)[2], length(mzr))
#
# tdata_reduced_all_features <- data.frame(mz = mzr)
# result_POS_HMDB_all_features <- assignation_pos_HMDB(tdata_reduced_all_features)
# head(result_POS_HMDB_all_features)
This vignette shows many of the features of the package. Some features have room for improvement, others are not fully described, and the reader will need to browse the documentation. Hopefully, it is a good starting point for using the package.