Workflow concepts

In a non-target workflow both chromatographic and mass spectral data is automatically processed in order to provide a comprehensive chemical characterization of your samples. While the exact workflow is typically dependent on the type of study, it generally involves of the following steps:

plotGV("
digraph workflow {
  graph [ rankdir = LR, compound = true, style = invis ]
  node [ shape = box,
         fixedsize = true,
         width = 2.8,
         height = 1,
         fontsize = 20,
         fillcolor = darkseagreen1,
         style = filled ]

  subgraph cluster1 {
    'Data pre-treatment' -> 'Find features'
    'Find features' -> 'Group features'
  }

  subgraph cluster2 {
    'Suspect screening' -> 'Group features' [minlen=3, style=dashed]
    'Find features' -> 'Suspect screening' [style=invis]
    'Group features' -> 'Suspect screening' [style=dashed]
  }

  subgraph cluster3 {
    Componentization
  }

  subgraph cluster4 {
    graph [style = dashed ]
    color=blue
    'MS peak lists' 'Formula annotation'  'Compound Annotation'
  }

  'Group features' -> 'Componentization' [ltail=cluster1, lhead=cluster3, style=dashed, dir=both, minlen=3]
  'Group features' -> 'Formula annotation' [ltail=cluster1, lhead=cluster4, style=dashed, minlen=3]

}", height = 250, width = 750)

Note that patRoon supports flexible composition of workflows. In the scheme above you can recognize optional steps by a dashed line. The inclusion of each step is only necessary if a further steps depends on its data. For instance, annotation and componentization do not depend on each other and can therefore be executed in any order or simply be omitted. A brief description of all steps is given below.

During data pre-treatment raw MS data is prepared for further analysis. A common need for this step is to convert the data to an open format so that other tools are able to process it. Other pre-treatment steps may involve re-calibration of m/z data or performing advanced filtering operations.

The next step is to extract features from the data. While different terminologies are used, a feature in patRoon refers to a single chromatographic peak in an extracted ion chromatogram for a single m/z value (within a defined tolerance). Hence, a feature contains both chromatographic data (e.g. retention time and peak height) and mass spectral data (e.g. the accurate m/z). Note that with mass spectrometry multiple m/z values may be detected for a single compound as a result of adduct formation, natural isotopes and/or in-source fragments. Some algorithms may try to combine these different masses in a single feature. However, in patRoon we generally assume this is not the case (and may optionally be done afterwards during the componentization step described below). Features are sometimes simply referred to as 'peaks'.

Features are found per analysis. Hence, in order to compare a feature across analyses, the next step is to group them. This step is essential as it finds equal features even if their retention time or m/z values slightly differ due to analytical variability. The resulting feature groups are crucial input for subsequent workflow steps. Prior to grouping, retention time alignment between analyses may be performed to improve grouping of features, especially when processing multiple analysis batches at once. Outside patRoon feature groups may also be defined as profiles, aligned or grouped features or buckets.

Depending on the study type, suspect screening is then performed to limit the features that should be considered for further processing. As its name suggests, with suspect screening only those features which are suspected to be present are considered for further processing. These suspects are retrieved from a suspect list which contains the m/z and (optionally) retention times for each suspect. Typical suspect lists may be composed from databases with known pollutants or from predicted transformation products. Note that for a 'full' non-target analysis no suspect screening is performed, hence, this step is simply omitted and all features are to be considered.

The feature group data may then be subjected to componentization. A component is defined as a collection of multiple feature groups that are somehow related to each other. Typical examples are features that belong to the same chemical compound (i.e. with different m/z values but equal retention time), such as adducts, isotopes and in-source fragments. Other examples are homologous series and features that display a similar intensity trend across samples. If adducts or isotopes were annotated during componentization then this data may be used to prioritize the feature groups.

The last step in the workflow commonly involves annotation. During this step MS and MS/MS data are collected in so called MS peak lists, which are then used as input for formula and compound annotation. Formula annotation involves automatic calculation of possible formulae for each feature based on its m/z, isotopic pattern and MS/MS fragments, whereas compound annotation (or identification) involves the assignment of actual chemical structures to each feature. Note that during formula and compound annotation typically multiple candidates are assigned to a single feature. To assist interpretation of this data each candidate is therefore ranked on characteristics such as isotopic fit, number of explained MS/MS fragments and metadata from an online database such as number of scientific references or presence in common suspect lists.

To summarize:

The next chapters will discuss how to generate this data and process it. Afterwards, several advanced topics are discussed such as combining positive and negative ionization data, screening for transformation products and other advanced functionality.




rickhelmus/patRoon documentation built on April 25, 2024, 8:15 a.m.