Purpose of the w4mclstrpeakpics package

w4mclstrpeakpics::pool_peak_assessment produces a figure comprising four graphs to assess the similarities and differences among peaks in a cluster of samples using XCMS-preprocessed data files as input.
The four graphs are described in the sections below.

The purpose of the w4mclstrpeakpics R package is to provide the computational back-end of a Galaxy tool for inclusion in Workflow4Metabolomics (W4M).

Galaxy tools are file-oriented; because of this, the w4mclstrpeakpics::pool_peak_assessment method reads from and writes to files. General-purpose R packages usually use data structures in memory for their input and output, which may mean that this R package is not generally useful outside of the context of Galaxy.

How to use the pool_peak_assessment function

A Galaxy tool wrapper invokes pool_peak_assessment. For exploratory or debugging purposes, the package may be installed and loaded into R; help may then be obtained with the following command:

?w4mclstrpeakpics::pool_peak_assessment

W4M uses the XCMS and CAMERA packages to preprocess GC-MS or LC-MS data, producing three files (which are documented in detail on the Workflow4Metabolomics (W4M) web site).
In summary:

  1. sampleMetadata.tsv: a tab-separated file with metadata for the samples, one line per sample

     - One column of this file indicates the class of the sample.
     - It is the class that is used by this function to determine whether to include the sample in, or exclude the sample from, further analysis.

  2. variableMetadata.tsv: a tab-separated file with metadata for the features detected, one line per feature

     - A feature is a location in the two-dimensional space defined by the GC-MS or LC-MS data set, which corresponds to a compound or a group of compounds.
       - One dimension is the mass-to-charge ratio, m/z.
       - The other dimension is the retention time, i.e., how long until the solvent gradient eluted the compound(s) from the column.

  3. dataMatrix.tsv: a tab-separated file with the MS intensities for each sample for each feature

     - There is one column per sample.
     - There is one row per feature.
     - If a feature is missing for a sample, the intensity value is NA.
     - For numerical reasons, intensities may be negative, but a negative intensity has no meaning in the real world.
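The layout of the three files can be illustrated with a small, self-contained sketch in base R. The data, feature names (M100T60, M200T120), sample names, and first-column headers below are made up for illustration; real W4M files may use different headers. The sketch writes three tiny files, reads them back, checks that the sample and feature identifiers line up, and selects samples by class in the way the sampleMetadata description suggests. It is not the package's own reading code.

```r
# Write three tiny W4M-style files to a temporary directory (made-up data).
tmp <- tempfile("w4m_example_")
dir.create(tmp)

sample_metadata <- data.frame(
  sampleMetadata = c("s1", "s2", "s3"),
  sampleType     = c("pool", "pool", "blank"),
  stringsAsFactors = FALSE
)
variable_metadata <- data.frame(
  variableMetadata = c("M100T60", "M200T120"),
  mz = c(100.05, 200.10),
  rt = c(60, 120),
  stringsAsFactors = FALSE
)
data_matrix <- data.frame(
  dataMatrix = c("M100T60", "M200T120"),
  s1 = c(1500, NA),   # NA: feature missing for this sample
  s2 = c(1700, 320),
  s3 = c(NA, -5),     # negative values can occur for numerical reasons
  stringsAsFactors = FALSE
)

# Hypothetical helper: write a data frame as a tab-separated file.
write_tsv <- function(x, name) {
  path <- file.path(tmp, name)
  write.table(x, path, sep = "\t", quote = FALSE, row.names = FALSE, na = "NA")
  path
}
sample_path   <- write_tsv(sample_metadata, "sampleMetadata.tsv")
variable_path <- write_tsv(variable_metadata, "variableMetadata.tsv")
matrix_path   <- write_tsv(data_matrix, "dataMatrix.tsv")

# Read the files back; the first column of each holds the identifiers.
read_tsv <- function(path) {
  read.delim(path, stringsAsFactors = FALSE, check.names = FALSE)
}
smd <- read_tsv(sample_path)
vmd <- read_tsv(variable_path)
dm  <- read_tsv(matrix_path)

# One row per feature, one column per sample.
intensities <- as.matrix(dm[, -1])
rownames(intensities) <- dm[[1]]

# The sample and feature identifiers should agree across the three files.
stopifnot(identical(colnames(intensities), smd[[1]]))
stopifnot(identical(rownames(intensities), vmd[[1]]))

# Keep only the samples whose class matches the selector value.
selected <- smd[[1]][smd$sampleType == "pool"]
pool_intensities <- intensities[, selected, drop = FALSE]
```

Here pool_intensities keeps the two "pool" columns and drops the "blank" sample, which mirrors the include/exclude role of the class column.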

The pool_peak_assessment function reads these files and produces four graphs. Its arguments are as follows:

  1. sample_selector_column_name - string input: column of W4M/XCMS sampleMetadata holding selector string values (default: "sampleType").

  2. sample_selector_value - string input: value within selector column to identify samples for analysis (default: "pool").

  3. sample_metadata_path - string input: path to W4M/XCMS sampleMetadata tab-separated values file.

  4. variable_metadata_path - string input: path to W4M/XCMS variableMetadata tab-separated values file.

  5. data_matrix_path - string input: path to W4M/XCMS dataMatrix tab-separated values file.

  6. output_pdf - string output: path to write assessment figure PDF.

  7. output_tsv - string output: path to write assessment summary tab-separated values file.

  8. output_rdata - string output: (optional) path to write RData containing all processing and plotting intermediates.
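A call with these arguments might look like the following sketch. The argument names are taken from the list above; the file paths are placeholders, and the run_assessment wrapper is a hypothetical helper that only guards against the package not being installed. The optional output_rdata argument is omitted here. Consult ?w4mclstrpeakpics::pool_peak_assessment to confirm the signature before relying on this.

```r
# Arguments named as in the list above; all paths are placeholders.
args <- list(
  sample_selector_column_name = "sampleType",
  sample_selector_value       = "pool",
  sample_metadata_path        = "sampleMetadata.tsv",
  variable_metadata_path      = "variableMetadata.tsv",
  data_matrix_path            = "dataMatrix.tsv",
  output_pdf                  = "pool_peak_assessment.pdf",
  output_tsv                  = "pool_peak_assessment.tsv"
)

# Hypothetical wrapper: call the function only if the package is available.
run_assessment <- function(args) {
  if (!requireNamespace("w4mclstrpeakpics", quietly = TRUE)) {
    message("w4mclstrpeakpics is not installed; skipping the call.")
    return(invisible(NULL))
  }
  do.call(w4mclstrpeakpics::pool_peak_assessment, args)
}
```

With the package installed and the three input files in the working directory, run_assessment(args) would pass the arguments through by name, so their order in the list does not matter.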

The Feature Number and Likelihood graph

The upper left graph in the output figure shows the following:

Ideally, there would be an upward trend from left to right; if not, XCMS peak-picking parameters may need to be adjusted to suppress low-intensity "noise" peaks or to address peak-splitting.

The Peak Intensity graph

The lower left graph in the output figure presents the data in the upper left graph without summarization, so that "the eye" can do the interpretation. It shows the following:

Ideally, there would be an upward trend from left to right, with more points on the right; if not, XCMS peak-picking parameters may need to be adjusted to suppress low-intensity "noise" peaks or to address peak-splitting.

The Symbol area/intensity reflect ion intensity graph

The upper right graph in the output figure shows the following:

Consequently, if the graph has a lot of large, dark, blueish symbols for repeated runs of a pooled sample, there is strong evidence that the XCMS peak-picking parameters need adjustment to make peak-picking more consistent.

The Symbol size/shape reflects prevalence graph

By contrast with the area/intensity graph, the lower right graph's primary purpose is to communicate the prevalence of a feature among the samples. It shows the following:

Consequently, if the graph has a lot of small, vivid symbols for repeated runs of a pooled sample, there is strong evidence that the XCMS peak-picking parameters need adjustment to make peak-picking more consistent.
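To make "prevalence" concrete, here is a minimal sketch in base R. It assumes that prevalence means the fraction of selected samples in which a feature has a non-NA intensity; the package's actual calculation may differ, and the intensities, feature names, and sample names are made up.

```r
# Made-up intensities for three repeated runs of a pooled sample.
intensities <- matrix(
  c(1500, 1700, 1600,
      NA,  320,   NA,
     900,   NA,  950),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("M100T60", "M200T120", "M300T180"),
    c("pool1", "pool2", "pool3")
  )
)

# Assumed definition: fraction of samples in which the feature was detected.
prevalence <- rowMeans(!is.na(intensities))

# Mean intensity over the samples in which the feature was detected.
mean_intensity <- rowMeans(intensities, na.rm = TRUE)

summary_table <- data.frame(
  feature = rownames(intensities),
  prevalence,
  mean_intensity,
  row.names = NULL
)
```

For repeated runs of the same pooled sample, a feature such as M200T120, which is detected in only one of three runs, is the kind of inconsistency that the two right-hand graphs are meant to expose.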



HegemanLab/w4mclstrpeakpics documentation built on May 23, 2019, 10:32 p.m.