Simulated Metabolomics Data

Description

This simulated dataset illustrates the format of the metabolomics data; its missing-data patterns are modeled roughly on a real metabolomics experiment. Rows represent metabolites and columns represent samples. The file contains 100 metabolites (rows) and 505 samples (480 biological sample columns and 25 pooled plasma columns), sorted by injection order. There are 20 biological samples between consecutive pooled plasma runs. Pooled plasma columns have the prefix ‘PPP’; biological sample columns are named with plain integers and have no prefix.
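As a quick check on the counts above, the sample column order can be rebuilt in a few lines of base R. This is only a sketch: it assumes the run starts and ends with a pooled plasma injection, which is consistent with 25 pooled columns framing 24 blocks of 20 biological samples. The variable names are illustrative and not part of the package.

```r
# Sketch: rebuild the sample column names in injection order.
# Assumption: the run starts and ends with a pooled plasma injection.
n_pooled  <- 25   # pooled plasma columns (from the description)
per_block <- 20   # biological samples between pooled plasma runs

cols <- character(0)
bio  <- 0         # number of biological samples placed so far
for (i in seq_len(n_pooled)) {
  cols <- c(cols, paste0("PPP", i))
  if (i < n_pooled) {
    cols <- c(cols, as.character(bio + seq_len(per_block)))
    bio  <- bio + per_block
  }
}

length(cols)               # 505 sample columns in total
sum(grepl("^PPP", cols))   # 25 pooled plasma columns
sum(!grepl("^PPP", cols))  # 480 biological sample columns
head(cols, 3)              # "PPP1" "1" "2"
```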

Usage

Format

The first row (Date) contains the date of processing. The second row (Inject) contains the injection number and is ordered from 1 to 505. The third row contains the column headers:

Metab is the metabolite ID.
Meth is the type of metabolite.
HMDB is the HMDB ID of the metabolite, if it exists.
m/z is the mass-to-charge ratio of the metabolite.
rt is the retention time.
Com contains any comments.
ProcID is the processing ID of the metabolite.

The remaining columns are either pooled plasma samples (prefix ‘PPP’) or biological samples (no prefix). The basic structure of the CSV file is as follows:

                                   Date    415   415  ..  415  415   415  ..
                                   Inject  1     2    ..  21   22    23   ..
Metab  Meth   HMDB  m/z  rt   Com  ProcID  PPP1  1    ..  20   PPP2  21   ..
M1     Lipid  H1    304  8.7       1       6.7   6.7  ..  5.0  6.7   4.6  ..
M2     Lipid  H2    309  7.6       2       1.0   1.1  ..  1.1  1.0   1.2  ..
..     ..     ..    ..   ..   ..   ..      ..    ..   ..  ..   ..    ..   ..
M100   Lipid  H100  249  6.2       100     2.4   1.9  ..  2.2  2.4   1.6  ..
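To make the three header rows concrete, here is a minimal base R sketch that parses a tiny made-up file in this layout. It is not the package's own reader (see read.met for that). The toy values, the variable names, and the placement of the Date and Inject labels in the seventh column are assumptions made for illustration. Sample columns are located relative to the ProcID header, so the sketch does not rely on where those labels sit.

```r
# Toy file in the layout described above. Values are made up, and putting
# the Date/Inject labels in the seventh column is an assumption.
toy <- c(
  ",,,,,,Date,415,415,415,415",
  ",,,,,,Inject,1,2,3,4",
  "Metab,Meth,HMDB,m/z,rt,Com,ProcID,PPP1,1,2,PPP2",
  "M1,Lipid,H1,304,8.7,,1,6.7,6.7,,6.7",
  "M2,Lipid,H2,309,7.6,,2,1.0,1.1,1.1,1.0"
)

# Read everything as text so the three header rows survive intact;
# empty cells become NA.
raw <- read.csv(text = toy, header = FALSE, stringsAsFactors = FALSE,
                colClasses = "character", na.strings = "")

header     <- unlist(raw[3, ], use.names = FALSE)   # third row: column headers
first      <- match("ProcID", header) + 1           # first sample column
sample_ids <- header[first:length(header)]
is_pooled  <- grepl("^PPP", sample_ids)             # pooled plasma columns

# Second row: injection order of each sample column.
inject <- as.integer(unlist(raw[2, first:ncol(raw)], use.names = FALSE))

# Rows 4 onward: one metabolite per row, converted to a numeric matrix.
values <- as.matrix(raw[-(1:3), first:ncol(raw)])
storage.mode(values) <- "double"
dimnames(values) <- list(raw[-(1:3), 1], sample_ids)

values                        # missing measurements appear as NA
colMeans(is.na(values))       # fraction missing per sample column
```

Reading every column as character first avoids type guessing across the Date, Inject, and header rows; only the measurement block is converted to numeric afterwards.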

See Also

See read.met for an example of reading this CSV file.
See MetProc-package for examples of running the full process.