Sets workflows {#setsWorkflow}

In LC-HRMS screening workflows the aim is typically to detect a broad range of chemicals. For this reason, the samples are often measured twice: with positive and with negative ionization. Most data processing steps are only suitable for data with the same polarity, for instance, because the m/z values in mass spectra are inherently different (e.g. [M+H]+ vs [M-H]-) and MS/MS fragmentation occurs differently. As a result, the screening workflow has to be done twice, which generally requires more time and complicates the comparison and interpretation of the complete (positive and negative) dataset.

The sets workflow was introduced in patRoon version 2.0. It allows you to perform a single non-target screening workflow from different sets of analysis files. Most commonly, each set represents a polarity, hence, there is a positive and a negative set. However, more than two sets are supported, and other distinctions between sets are also possible, for instance, samples that were measured with different MS/MS techniques. Another important advantage of the sets workflow is that MS/MS data from different sets can be combined to provide more comprehensive annotations of features. The most important limitation is that (currently) the chromatographic method that was used when analyzing the samples from each set needs to be equal, since retention times are used to group features among the sets.

Performing a sets workflow usually only requires small modifications compared to a 'regular' patRoon workflow. This chapter outlines how to perform such workflows and how to use their unique functionality for data processing. It is assumed that the reader is already familiar with performing 'regular' workflows, which were discussed in the previous chapters.

Initiating a sets workflow

A sets workflow is not much different from a 'regular' (or non-sets) workflow. For instance, consider the following workflow:

anaInfo <- patRoonData::exampleAnalysisInfo("positive")
fList <- findFeatures(anaInfo, "openms")
fGroups <- groupFeatures(fList, "openms")
fGroups <- filter(fGroups, absMinIntensity = 10000, relMinReplicateAbundance = 1, maxReplicateIntRSD = 0.75,
                  blankThreshold = 5, removeBlanks = TRUE)

mslists <- generateMSPeakLists(fGroups, "mzr")
formulas <- generateFormulas(fGroups, mslists, "genform", adduct = "[M+H]+")
compounds <- generateCompounds(fGroups, mslists, "metfrag", adduct = "[M+H]+")

report(fGroups, MSPeakLists = mslists, formulas = formulas, compounds = compounds)

This example uses the example data from [patRoonData] to obtain a feature group dataset, which is cleaned-up afterwards. Then, feature groups are annotated and all the results are reported.

Converting this to a sets workflow:

anaInfoPos <- patRoonData::exampleAnalysisInfo("positive")
anaInfoNeg <- patRoonData::exampleAnalysisInfo("negative")
fListPos <- findFeatures(anaInfoPos, "openms")
fListNeg <- findFeatures(anaInfoNeg, "openms")
fList <- makeSet(fListPos, fListNeg, adducts = c("[M+H]+", "[M-H]-"))

fGroups <- groupFeatures(fList, "openms")
fGroups <- filter(fGroups, absMinIntensity = 10000, relMinReplicateAbundance = 1, maxReplicateIntRSD = 0.75,
                  blankThreshold = 5, removeBlanks = TRUE)

mslists <- generateMSPeakLists(fGroups, "mzr")
formulas <- generateFormulas(fGroups, mslists, "genform")
compounds <- generateCompounds(fGroups, mslists, "metfrag")

report(fGroups, MSPeakLists = mslists, formulas = formulas, compounds = compounds)

This workflow will do all the steps for positive and negative data.

plotGV("
digraph Workflow {
  graph [ rankdir = TB, compound = true ]
  node [ shape = box,
         fixedsize = true,
         width = 2.2,
         height = 0.6,
         fontsize = 16,
         fillcolor = darkseagreen1,
         style = filled ]

    'Pre-treatment (+)' -> 'Find features (+)' -> 'makeSet'
    'Pre-treatment (-)' -> 'Find features (-)' -> 'makeSet'
    'makeSet' -> 'Group features' -> 'Annotation, ...'
}", height = 300, width = 250)

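Only a few modifications were necessary: the analysis information is obtained and features are found separately for each set, makeSet is used to combine the feature data, and the adduct is no longer specified in the annotation steps.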

NOTE The analysis names for the analysis information must be unique for each row, even among sets. Furthermore, replicate groups should not contain analyses from different sets.
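The two requirements in this note can be checked before the features are found. The following is a minimal base R sketch, assuming analysis information tables with analysis and group columns. The analysis and group names are made up for illustration, and checkSetAnaInfo is a hypothetical helper, not a patRoon function.

```r
# Hypothetical analysis information for two sets (names are illustrative only)
anaInfoPos <- data.frame(analysis = c("solvent-pos-1", "standard-pos-1"),
                         group = c("solvent-pos", "standard-pos"),
                         stringsAsFactors = FALSE)
anaInfoNeg <- data.frame(analysis = c("solvent-neg-1", "standard-neg-1"),
                         group = c("solvent-neg", "standard-neg"),
                         stringsAsFactors = FALSE)

# Hypothetical helper (not part of patRoon): TRUE if both requirements hold
checkSetAnaInfo <- function(...)
{
    anaInfos <- list(...)
    # analysis names must be unique for each row, even among sets
    allAnalyses <- unlist(lapply(anaInfos, function(ai) ai$analysis))
    uniqueNames <- anyDuplicated(allAnalyses) == 0
    # a replicate group may only occur in one set
    allGroups <- unlist(lapply(anaInfos, function(ai) unique(ai$group)))
    separateGroups <- anyDuplicated(allGroups) == 0
    uniqueNames && separateGroups
}

checkSetAnaInfo(anaInfoPos, anaInfoNeg)
```

A table that re-uses a replicate group name from another set would make this check return FALSE.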

The key step that makes sets workflows possible is performed by makeSet. This method takes different features objects (or featureGroups, discussed later) and combines the feature data across sets. During this step features are neutralized: the feature m/z data is converted to neutral feature masses. This ensures that the algorithms of groupFeatures are able to find the same feature among different sets, even when different MS ionization modes were used during acquisition. However, please note that (currently) no additional chromatographic alignment steps between sets are performed. For this reason, the chromatographic methodology that is used to acquire the data must be the same for all sets.

The feature neutralization step relies on adduct data. In the example above, it is simply assumed that all features measured with positive mode are protonated (M+H) species, and all negative features are deprotonated (M-H). It is also possible to use adduct annotations for neutralization; this is discussed later.
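The arithmetic behind this default neutralization can be illustrated with a small base R sketch. It only handles singly charged [M+H]+ and [M-H]- ions, uses an approximate proton mass and a made-up compound, and the neutralMass helper is hypothetical: it is not how patRoon implements this step internally.

```r
# Approximate mass of a proton in Da
protonMass <- 1.007276

# Hypothetical helper (not part of patRoon): convert m/z to a neutral mass,
# assuming singly charged [M+H]+ or [M-H]- ions
neutralMass <- function(mz, adduct)
{
    switch(adduct,
           "[M+H]+" = mz - protonMass, # remove the added proton
           "[M-H]-" = mz + protonMass, # restore the lost proton
           stop("adduct not handled in this sketch"))
}

# The same (made-up) compound with a neutral mass of 200 Da, measured in both modes
mzPos <- 200 + protonMass
mzNeg <- 200 - protonMass

neutralMass(mzPos, "[M+H]+")
neutralMass(mzNeg, "[M-H]-")
```

Both calls yield practically the same neutral mass, which is what allows the positive and negative features to end up in the same feature group.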

NOTE The newProject tool can be used to easily generate a sets workflow. Simply select "both" for the Ionization option.

Generating sets workflow data

As was shown in the previous section, the generation of workflow data with a sets workflow largely follows what was discussed in the previous chapters. The same generator functions are used:

Workflow step | Function | Output S4 class
--------------------- | ------------------------------------------------- | ----------------------------
Grouping features | groupFeatures() | featureGroupsSet
Suspect screening | screenSuspects() | featureGroupsScreeningSet
MS peak lists | generateMSPeakLists() | MSPeakListsSet
Formula annotation | generateFormulas() | formulasSet
Compound annotation | generateCompounds() | compoundsSet
Componentization | generateComponents() | algorithm dependent

(the data pre-treatment and feature finding steps have been omitted as they are not specific to sets workflows).

While the same function generics are used to generate data, the classes of the output objects differ (e.g. formulasSet instead of formulas). However, since all these classes inherit from their non-sets workflow counterparts, using the workflow data in a sets workflow is nearly identical to what was discussed in the previous chapters (further discussed in the next section).

As discussed before, an important step is the neutralization of features. Other workflow steps also have internal mechanics to deal with data from different sets:

Workflow step | Handling of set data
--------------------------- | ------------------------------------------------------
Finding/Grouping features | Neutralization of m/z values
Suspect screening | Merging results from screening performed for each set
Componentization | Algorithm dependent (discussed below)
MS peak lists | MS data is obtained and stored per set. The final peak lists are combined (not averaged)
Formula/Compound annotation | Annotation is performed for each set separately and used to generate a final consensus

In most cases the algorithms of the workflow steps are first performed for each set, and this data is then merged. To illustrate the importance of this, consider these examples

In all cases data is used that is highly dependent on the MS method (e.g. polarity) that was used to acquire the sample data. Nevertheless, all the steps needed to obtain and combine set data are performed automatically in the background, and are therefore largely invisible.

NOTE Because feature groups in sets workflows always have adduct annotations, it is never required to specify the adduct or ionization mode when generating annotations or components, or when performing suspect screening (i.e. the adduct/ionization arguments should not be specified).

Componentization

When the componentization algorithms related to adduct/isotope annotations (e.g. [CAMERA], [RAMClustR] and [cliqueMS]) and [nontarget] are used, then componentization occurs per set and the final object (a componentsSet or componentsNTSet) contains all the components together. Since these algorithms are highly dependent upon MS data polarity, no attempt is made to merge components from different sets.

The other componentization algorithms work on the complete data. For more details, see the reference manual (?generateComponents).

Formula and compound annotation

For formula and compound annotation, the data generated for each set is combined to generate a set consensus. The annotation tables are merged, scores are averaged and candidates are re-ranked. More details can be found in the reference manual (e.g. ?generateCompounds). In addition, it is possible to only keep candidates that exist in a minimum number of sets. For this, the setThreshold and setThresholdAnn arguments can be used:

# candidate must be present in all sets
formulas <- generateFormulas(fGroups, mslists, "genform", setThreshold = 1)
# candidate must be present in all sets with annotation data
compounds <- generateCompounds(fGroups, mslists, "metfrag", setThresholdAnn = 1)

In the first example, a formula candidate for a feature group is only kept if it was found for all of the sets. In the second example, a compound candidate is only kept if it was present in all of the sets with annotation data available. The following examples of a common positive/negative sets workflow illustrate the differences:

Candidate | Annotations | Candidate present | setThreshold=1 | setThresholdAnn=1
--------- | ----------- | ----------------- | ---------------- | ---------------------
#1 | +, - | +, - | Keep | Keep
#2 | +, - | + | Remove | Remove
#3 | + | + | Remove | Keep

For more information refer to the reference manual (e.g. ?generateCompounds).
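The decisions in the table can be reproduced with a small base R sketch. It assumes that both thresholds are relative (0-1) abundances: setThreshold is compared against all sets, setThresholdAnn only against the sets that have annotation data for the feature group. The keepCandidate helper and the data layout are hypothetical and only serve to illustrate the logic.

```r
# Candidates from the table above: which sets have annotation data for the
# feature group, and in which sets the candidate was found
candidates <- list(
    "#1" = list(annotated = c("positive", "negative"), present = c("positive", "negative")),
    "#2" = list(annotated = c("positive", "negative"), present = "positive"),
    "#3" = list(annotated = "positive", present = "positive")
)
allSets <- c("positive", "negative")

# Hypothetical helper (not part of patRoon): fraction based threshold check
keepCandidate <- function(cand, setThreshold = 0, setThresholdAnn = 0)
{
    fracAll <- length(cand$present) / length(allSets)        # relative to all sets
    fracAnn <- length(cand$present) / length(cand$annotated) # relative to annotated sets
    fracAll >= setThreshold && fracAnn >= setThresholdAnn
}

keepAll <- sapply(candidates, keepCandidate, setThreshold = 1)
keepAnn <- sapply(candidates, keepCandidate, setThresholdAnn = 1)
keepAll
keepAnn
```

Candidate #3 shows the difference: it is only found in half of all sets, but in every set for which annotation data is available.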

Selecting adducts to improve grouping {#setsAdducts}

The selectIons() and adducts() functions discussed before can also improve sets workflows. The adduct annotations can be used to improve feature neutralization, which in turn improves the grouping of features between positive and negative ionization data. Once adduct annotations are set, the features are re-neutralized and re-grouped.
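To see why adduct annotations matter for grouping, consider a made-up compound that is detected as a potassium adduct in positive mode. The base R sketch below uses approximate ion masses and plain arithmetic; it does not call any patRoon functionality.

```r
protonMass <- 1.007276        # approximate mass of a proton (Da)
potassiumIonMass <- 38.963158 # approximate mass of K+ (Da)

neutralMassTrue <- 200                      # made-up neutral mass
mzPos <- neutralMassTrue + potassiumIonMass # actually detected as [M+K]+
mzNeg <- neutralMassTrue - protonMass       # detected as [M-H]-

# default assumption: all positive features are [M+H]+
neutralPosAssumed <- mzPos - protonMass
# with the [M+K]+ annotation taken into account
neutralPosAnnotated <- mzPos - potassiumIonMass
neutralNeg <- mzNeg + protonMass

neutralPosAssumed - neutralNeg   # roughly 37.96 Da off: features will not be grouped
neutralPosAnnotated - neutralNeg # practically zero: features can be grouped
```

With the default [M+H]+ assumption the positive feature gets a wrong neutral mass and cannot be matched to its negative counterpart, while the annotated adduct resolves this.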

A typical workflow with selectIons looks like this:

plotGV("
digraph Workflow {
  graph [ rankdir = TB, compound = true ]
  node [ shape = box,
         fixedsize = true,
         width = 2.2,
         height = 0.6,
         fontsize = 16,
         fillcolor = darkseagreen1,
         style = filled ]

    'Pre-treatment (+)' -> 'Find features (+)' -> 'makeSet'
    'Pre-treatment (-)' -> 'Find features (-)' -> 'makeSet'
    'makeSet' -> 'Group features' -> Componentization -> selectIons -> 'Annotation, ...'
}", height = 350, width = 250)

# as before ...
anaInfoPos <- patRoonData::exampleAnalysisInfo("positive")
anaInfoNeg <- patRoonData::exampleAnalysisInfo("negative")
fListPos <- findFeatures(anaInfoPos, "openms")
fListNeg <- findFeatures(anaInfoNeg, "openms")

fList <- makeSet(fListPos, fListNeg, adducts = c("[M+H]+", "[M-H]-"))

fGroups <- groupFeatures(fList, "openms")
fGroups <- filter(fGroups, absMinIntensity = 10000, relMinReplicateAbundance = 1, maxReplicateIntRSD = 0.75,
                  blankThreshold = 5, removeBlanks = TRUE)

components <- generateComponents(fGroups, "openms")
fGroups <- selectIons(fGroups, components, c("[M+H]+", "[M-H]-"))

# do rest of the workflow...

The first part of the workflow is exactly the same as was introduced in the beginning of this chapter. Furthermore, note that for sets workflows, selectIons needs a preferential adduct for each set.

The adducts function can also be used to obtain and modify adduct annotations. In a sets workflow it operates per set:

adducts(fGroups, set = "positive")[1:5]
adducts(fGroups, set = "positive")[4] <- "[M+K]+"

If you want to modify annotations for multiple sets, it is best to delay the re-grouping step:

adducts(fGroups, set = "positive", reGroup = FALSE)[4] <- "[M+K]+"
adducts(fGroups, set = "negative", reGroup = TRUE)[10] <- "[M-H2O]-"

Setting reGroup=FALSE will not perform any re-neutralization and re-grouping, which preserves feature group names and saves processing time. However, it is crucial that the re-grouping step is eventually performed at the end.

Processing data

All data objects that are generated during a sets workflow inherit from the classes from a 'regular' workflow. This means that, with some minor exceptions, all of the data processing functionality discussed in the previous chapter (e.g. subsetting, inspection, filtering, plotting, reporting) is also applicable to a sets workflow. For instance, the as.data.table() method can be used for general inspection:

as.data.table(compounds)[1:5, c("group", "score-positive", "score-negative", "compoundName", "set")]

In addition, some of the data processing functions provide additional functionality for a sets workflow:

# only keep feature groups that have positive data
fGroupsPos <- fGroups[, sets = "positive"]
# only keep feature groups that have feature data for all sets
fGroupsF <- filter(fGroups, relMinSets = 1)

# only keep feature groups with features present in both polarities
fGroupsPosNeg <- overlap(fGroups, which = c("positive", "negative"), sets = TRUE)
# only keep feature groups with features that are present only in positive mode
fGroupsOnlyPos <- unique(fGroups, which = "positive", sets = TRUE)
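The selections made by these calls can be pictured with a small presence table. The base R sketch below uses made-up feature group names and only mimics the selection logic; it does not operate on patRoon objects.

```r
# Made-up presence table: does a feature group have features in each set?
presence <- data.frame(group = c("grpA", "grpB", "grpC"),
                       positive = c(TRUE, TRUE, FALSE),
                       negative = c(TRUE, FALSE, TRUE),
                       stringsAsFactors = FALSE)

# compare with fGroups[, sets = "positive"]: anything with positive data
withPos <- presence$group[presence$positive]
# compare with overlap(..., sets = TRUE) and filter(..., relMinSets = 1):
# present in both polarities
inBoth <- presence$group[presence$positive & presence$negative]
# compare with unique(..., which = "positive", sets = TRUE): only in positive mode
onlyPos <- presence$group[presence$positive & !presence$negative]

withPos
inBoth
onlyPos
```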

And plotting:

plotVenn(fGroups, sets = TRUE) # compare positive/negative features
plotSpectrum(compounds, index = 1, groupName = "M198_R317_273", MSPeakLists = mslists,
             plotStruct = TRUE)

The reference manual for the workflow objects contains specific notes applicable to sets workflows (?featureGroups, ?compounds etc).

Advanced

Initiating a sets workflow from feature groups

The makeSet function can also be used to initiate a sets workflow from feature groups:

# as before ...
anaInfoPos <- patRoonData::exampleAnalysisInfo("positive")
anaInfoNeg <- patRoonData::exampleAnalysisInfo("negative")
fListPos <- findFeatures(anaInfoPos, "openms")
fListNeg <- findFeatures(anaInfoNeg, "openms")

fGroupsPos <- groupFeatures(fListPos, "openms")
fGroupsNeg <- groupFeatures(fListNeg, "openms")

fGroups <- makeSet(fGroupsPos, fGroupsNeg, groupAlgo = "openms",
                   adducts = c("[M+H]+", "[M-H]-"))

# do rest of the workflow...

In this case makeSet takes the positive and negative features, neutralizes them and creates new feature groups by grouping the original set specific groups (with the algorithm specified by groupAlgo).

While this option involves some extra steps, an advantage is that it allows processing the feature data before they are combined, e.g.:

fGroupsPos <- groupFeatures(fListPos, "openms")
fGroupsNeg <- groupFeatures(fListNeg, "openms")

# apply intensity threshold filters. Lower threshold for negative.
fGroupsPos <- filter(fGroupsPos, absMinIntensity = 1E4)
fGroupsNeg <- filter(fGroupsNeg, absMinIntensity = 1E3)

fGroups <- makeSet(fGroupsPos, fGroupsNeg, groupAlgo = "openms",
                   adducts = c("[M+H]+", "[M-H]-"))

Visually, this workflow looks like this:

plotGV("
digraph Workflow {
  graph [ rankdir = TB, compound = true ]
  node [ shape = box,
         fixedsize = true,
         width = 2.2,
         height = 0.6,
         fontsize = 16,
         fillcolor = darkseagreen1,
         style = filled ]

    'Find features (+)' -> 'Group features (+)' -> 'filter (+)' -> 'makeSet'
    'Find features (-)' -> 'Group features (-)' -> 'filter (-)' -> 'makeSet'
    'makeSet' -> '...'
}", height = 300, width = 250)

Of course, any other processing steps on the feature groups data such as subsetting and visually checking features are also possible before the sets workflow is initiated. Furthermore, it is also possible to perform adduct annotations prior to grouping, which is an alternative way to improve neutralization to what was discussed before.

Inspecting and converting set objects

The following generic functions may be used to inspect or convert data from sets workflows:

Generic | Purpose | Notes
------------ | ------------------------------------- | ---------------------------------------
sets | Return the names of the sets in this object. |
setObjects | Obtain the raw data objects that were used to construct this object. | Not available for features and feature groups.
unset | Converts this object to a regular workflow object. | The set argument must be given to specify which of the set data is to be converted. This function will restore the original m/z values of features.

These methods are heavily used internally, but rarely needed otherwise. More details can be found in the reference manual.



rickhelmus/patRoon documentation built on April 25, 2024, 8:15 a.m.