Transformation product screening {#TPs}

This chapter describes the various functionality for screening of transformation products (TPs), which are introduced since patRoon 2.0. Screening for TPs, i.e. chemicals that are formed from a parent chemical by e.g. chemical or biological processes, has broad applications. For this reason, the TP screening related functionality is designed to be flexible, thus allowing one to use a workflow that is best suited for a particular study.

Regardless, the TP screening workflow in patRoon can be roughly summarized as follows:

plotGV("
digraph Workflow {
  graph [ rankdir = LR ]
  node [ shape = box,
         fixedsize = true,
         width = 2.2,
         height = 1,
         fontsize = 18,
         fillcolor = darkseagreen1,
         style = filled ]

    'Parent screening' -> 'Obtaining TPs' -> 'TP screening' -> 'Linking parent/TPs'
}", height = 90, width = 500)

The next sections will outline more details on these steps are performed and configured. The last section in this chapter outlines several example workflows.

NOTE The newProject tool can be used to easily generate a workflow with transformation product screening.

Obtaining transformation product data {#genTPs}

The generateTPs function is used to obtain TPs for a particular set of parents. Like other workflow generator functions (findFeatures, generateCompounds), several algorithms are available that do the actual work.

Algorithm | Usage | Remarks ---------------- | ------------------------------------------------- | --------------------------------- [BioTransformer] | generateTPs(algorithm = "biotransformer", ...) | Predicts TPs with full structural information [CTS] | generateTPs(algorithm = "cts", ...) | Predicts TPs with full structural information Library | generateTPs(algorithm = "library", ...) | Obtains transformation products from a library ([PubChem transformations][PubChemLiteTR] or custom) Formula library | generateTPs(algorithm = "library_formula", ...) | Obtains transformation products from a library (only formula data) Metabolic logic | generateTPs(algorithm = "logic", ...) | Uses pre-defined logic to predict TPs based on common elemental differences (e.g. hydroxylation, demethylation). Based on @Scholle2015.

The output of these algorithms can be distinguished in three categories:

  1. Structural TPs (biotransformer, cts and library) come with full structural information for the TPs (e.g. formula, SMILES, predicted Log P). As such, the corresponding algorithms also require the full chemical structure of the parent compound.
  2. Formula TPs (library_formula) are similar to structural TPs, but only involve formula and no other structural information.
  3. Calculated TPs (logic) are based solely on m/z differences and only require the feature masses.

Algorithms that fall into the first category are typically used when parents are known in advance, for instance, from a target or suspect screening. This is also true for the second category, however, here only formula data is used, which is useful when the complete structure of parents and/or TPs are not known. Calculated TPs allow TP prediction for all features, even when nothing is known about their structure. This is most suitable for full non-target analysis, however, extra care must be taken to rule out false positives. Finally, the logic used to calculate TPs can also be used to automatically to generate a library suitable for the library_formula algorithm, which allows a hybrid approach of the second and third categories.

An overview of common arguments for TP generation is listed below.

Argument | Algorithm(s) | Remarks ----------------------------- | ----------------- | -------------------------------------------------------- parents | biotransformer, cts, library | The input parents. See section below. fGroups | logic | The input feature groups to calculate TPs for. type | biotransformer | The prediction type: "env", "ecbased", "cyp450", "phaseII", "hgut", "superbio", "allHuman". See [BioTransformer] for more details. transLibrary | cts | The transformation library that should be used: "hydrolysis", "abiotic_reduction", "photolysis_unranked", "photolysis_ranked", "mammalian_metabolism", "combined_abioticreduction_hydrolysis", "combined_photolysis_abiotic_hydrolysis". See [cts] for more details. TPLibrary/transformations | library/logic | Custom TP library/transformation rules. generation | biotransformer, cts, library | The amount of transformation generations to predict. adduct | logic | The assumed adduct of the parents (e.g. "[M+H]+"). Not needed when adduct annotations are available. calcSims | biotransformer, cts, library | If TRUE then structural similarities between the parent and TPs is calculated, which can be useful for post-processing (discussed later).

Parent input

The input parent structures to generate structural/formula TPs (biotransformer, cts, library and library_formula algorithms) must be specified as one of the following:

In the former two cases the parent information is taken from the suspect list or from the hits in a suspect screening worklow, respectively. The last case is more suitable for when the parents are not completely known. In this case, the candidate structures from a compound annotation are used as input to obtain TPs. Since all the candidates are used, it is highly recommend to filter the object in advance, for instance, with the topMost filter. For library and library_formula, the parent input is optional: if no parents are specified then TP data for all parents in the database is used.

For the logic algorithm TPs are predicted directly for feature groups. Since this algorithm can only perform very basic validity checks, it is strongly recommended to first prioritize the feature group data.

Some typical examples:

# predict environmental TPs with BioTransformer for all parents in a suspect list
TPsBT <- generateTPs("biotransformer", parents = patRoonData::suspectsPos,
                     type = "env")
# obtain all TPs from the default library
TPsLib <- generateTPs("library")
# get TPs for the parents from a suspect screening
TPsLib <- generateTPs("library", parents = fGroupsScr)
# calculate TPs for all feature groups
TPsLogic <- generateTPs("logic", fGroups, adduct = "[M+H]+")

Processing data

Similar to other workflow data, several generic functions are available to inspect the data:

Generic | Remarks
-----------------------------------|---------------------------------------------------------------------- length() | Returns the total number of transformation products names() | Returns the names of the parents parents() | Returns a table with information about the parents products() | Returns a list with for each parent a table with TPs as.data.table(), as.data.frame | Convert all the object information into a data.table/data.frame "[[" / "$" operators | Extract TP information for a specified parent

Some examples:

# just show a few columns in this example, there are many more!
# note: the double dot syntax (..cols) is necessary since the data is stored as data.tables
cols <- c("name", "formula", "InChIKey")
parents(TPs)[1:5, ..cols]
TPs[["DEET"]][, ..cols]
TPs[[2]][, ..cols]
as.data.table(TPs)[1:5, 1:3]

In addition, the following generic functions are available to modify or convert the object data:

Generic | Classes | Remarks ------------------- | -------------------------- | -------------------------------------------------------- "[" operator | All | Subset this object on given parents filter | All | Filters this object convertToSuspects | All | Generates a suspect list of all TPs (and optionally parents) that is suitable for screenSuspects

TPs2 <- TPs[1:10] # only keep results for first ten parents

# only keep TPs with likely/probably likelihood (specific property for CTS algorithm)
TPsF <- filter(TPs, properties = list(likelihood = c("LIKELY", "PROBABLE")))

# do a suspect screening for all TPs and their parents
suspects <- convertToSuspects(TPs, includeParents = TRUE)
fGroupsScr <- screenSuspects(fGroups, suspects, onlyHits = TRUE)

The convertToSuspects function is always part of a workflow, and is discussed further in the next section.

Structural TPs specifics

For structural TPs several additional generic functions are available:

Generic | Remarks ------------------- | -------------------------------------------------------- filter | Filters this object (additional functionality for structural TPs) convertToMFDB | Generates a [MetFrag] database for all TPs (and optionally parents) plotGraph | Generates an interactive plot to explore transformation hierarchies plotVenn, plotUpSet | Compare results between different algorithms with Venn/UpSet diagrams

The convertToMFDB function is especially handy with predicted TPs, as it allows generating a compound database for TPs that may not be available in commonly used databases. This is further demonstrated in the first example.

# remove transformation products that are isomers to their parent or sibling TPs
# may simplify data as these are often difficult to identify
TPsF <- filter(TPs, removeParentIsomers = TRUE, removeTPIsomers = TRUE)

# remove duplicate transformation products from each parent
# these can occur if different pathways yield the same TPs
TPsF <- filter(TPs, removeDuplicates = TRUE)

# only keep TPs that have a structural similarity to their parent of >= 0.5
# (needs calcSims=TRUE when executing generateTPs())
TPsF <- filter(TPs, minSimilarity = 0.5)

# use the TP data for a specialized MetFrag database
convertToMFDB(TPs, "TP-database.csv", includeParents = FALSE)
compoundsTPs <- generateCompounds(fGroups, mslists, "metfrag", database = "csv",
                                  extraOpts = list(LocalDatabasePath = "TP-database.csv"))
plotVenn(TPsLib, TPsBT, labels = c("lib", "BT"))
plotGraph(TPsBT, which = "Triazophos") # hierarchy for Triazophos parent

Finally, results from different algorithms can be combined with the consensus generic function. This is further discussed in algorithm consensus.

(Custom) Libraries and transformations {#TPsCustom}

By default the library and logic algorithms use data that is installed with patRoon (based on [PubChem transformations][PubChemLiteTR] and @Scholle2015, respectively). However, it is also possible to use custom data. For the library_formula no default library is provided, however, these can easily be generated as is discussed at the end of the section.

To use a custom TP structure library a simple data.frame is needed with the names, SMILES and optionally log P values for the parents and TPs. The log P values are used for prediction of the retention time direction of a TP compared to its parent, as is discussed further in the next section. The following small library has two TPs for benzotriazole and one for DEET:

myTPLib <- data.frame(parent_name = c("1H-Benzotriazole", "1H-Benzotriazole", "DEET"),
                      parent_SMILES = c("C1=CC2=NNN=C2C=C1", "C1=CC2=NNN=C2C=C1", "CCN(CC)C(=O)C1=CC=CC(=C1)C"),
                      TP_name = c("1-Methylbenzotriazole", "1-Hydroxybenzotriazole", "N-ethyl-m-toluamide"),
                      TP_SMILES = c("CN1C2=CC=CC=C2N=N1", "C1=CC=C2C(=C1)N=NN2O", "CCNC(=O)C1=CC=CC(=C1)C"))
myTPLib

To use this library, simply pass it to the TPLibrary argument:

TPs <- generateTPs("library", TPLibrary = myTPLib)

For library_formula the library follows the same format. However, here the formula should be specified instead of the SMILES with the parent_formula and TP_formula columns (although it is still allowed to only specify SMILES, as in this case the formulae are automatically calculated).

For the logic algorithm a table with custom transformation rules can be specified for TP calculations:

myTrans <- data.frame(transformation = c("hydroxylation", "demethylation"),
                      add = c("O", ""),
                      sub = c("", "CH2"),
                      retDir = c(-1, -1))
myTrans

The add and sub columns are used to denote the elements that are added or subtracted by the reaction. These are used to calculate mass differences between parents and TPs. The retDir column is used to indicate the retention time direction of the parent compared to the TP: -1 (elutes before parent), 1 (elutes after parent) or 0 (similar or unknown). The next section describes how this data can be used to filter TPs. The custom rules can be used by passing them to the transformations argument:

TPs <- generateTPs("logic", fGroups, adduct = "[M+H]+", transformations = myTrans)

The genFormulaTPLibrary() utility function can be used to automatically generate TP libraries suitable for the library_formula algorithm. The transformation rules to calculate TPs are specified in the same format as used by the logic algorithm.

myTPFormLib <- genFormulaTPLibrary(parents = patRoonData::suspectsPos, transformations = myTrans)
# also calculate second generation TPs (TPs of TPs)
myTPFormLib2 <- genFormulaTPLibrary(parents = patRoonData::suspectsPos, transformations = myTrans,
                                    generations = 2)

# Use library
TPs <- generateTPs("library_formula", TPLibrary = myTPFormLib)

Compared to the logic algorithm, the library_formula algorithm is more (and only) suitable for suspect/target screening workflows, allows multiple transformation generations and allows better customization through manually adding/removing TPs from the library prior to passing it to generateTPs().

Linking parent and transformation product features

This section discusses one of the most important steps in a TP screening workflow, which is to link feature groups of parents with those of candidate transformation products. During this step, components are made, where each component consist of one or more feature groups of detected TPs for a particular parent. Note that componentization was already introduced before, but for very different algorithms. However, the data format for TP componentization is highly similar. After componentization, several filters are available to clean and prioritize the data. These can even allow workflows without obtaining potential TPs in advance, which is discussed in the last subsection.

Componentization

Like other algorithms, the generateComponents generic function is used to generate TP components, by setting the algorithm parameter to "tp".

The following arguments are of importance:

Argument | Remarks --------------- | -------------------------------------------------------------- fGroups | The input feature groups for the parents fGroupsTPs | The input feature groups for the TPs ignoreParents | Set to TRUE to ignore feature groups in fGroupsTPs that also occur in fGroups TPs | The input transformation products, ie as generated by generateTPs() MSPeakLists, formulas, compounds | Annotation objects used for similarity calculation between the parent and its TPs minRTDiff | The minimum retention time difference (seconds) of a TP for it to be considered to elute differently than its parent.

Feature group input {#TPsFGroups}

The fGroups, fGroupsTPs and ignoreParents arguments are used by the componentization algorithm to identify which feature groups can be considered as parents and which as TPs. Three scenarios are possible:

  1. fGroups=fGroupsTPs and ignoreParents=FALSE: in this case no distinction is made, and all feature groups are considered a parent or TP (default if fGroupsTPs is not specified).
  2. fGroups and fGroupsTPs contain different subsets of the same featureGroups object and ignoreParents=FALSE: only the feature groups in fGroups/fGroupsTPs are considered as parents/TPs.
  3. As above, but with ignoreParents=TRUE: the same distinction is made as above, but any feature groups in fGroupsTPs are ignored if also present in fGroups.

The first scenario is often used if it is unknown which feature groups may be parents or which are TPs. Furthermore, this scenario may also be used if the dataset is sufficiently simple, for instance, because a suspect screening with the results from convertToSuspects (discussed in the previous section) would reliably discriminate between parents and TPs. A workflow with the first scenario is demonstrated in the second example.

In all other cases it is recommended to use either the second or third scenario, since making a prior distinction between parent and TP feature groups greatly simplifies the dataset and reduces false positives. A relative simple example where this can be used is when there are two sample groups: before and after treatment.

componTP <- generateComponents(algorithm = "tp",
                               fGroups = fGroups[rGroups = "before"],
                               fGroupsTPs = fGroups[rGroups = "after"])

In this example, only those feature groups present in the "before" replicate group are considered as parents, and those in "after" may be considered as a TP. Since it is likely that there will be some overlap in feature groups between both sample groups, the ignoreParents flag can be used to not consider any of the overlap for TP assignments:

componTP <- generateComponents(algorithm = "tp",
                               fGroups = fGroups[rGroups = "before"],
                               fGroupsTPs = fGroups[rGroups = "after"],
                               ignoreParents = TRUE)

More sophisticates ways are of course possible to provide an upfront distinction between parent/TP feature groups. In the fourth example a workflow is demonstrated where fold changes are used.

NOTE The feature groups specified for fGroups/fGroupsTPs must always originate from the same featureGroups object.

For the library and biotransformer algorithms it is mandatory that a suspect screening of parents and TPs is performed prior to componentization. This is necessary for the componentization algorithm to map the feature groups that belong to a particular parent or TP. To do so, the convertToSuspects function is used to prepare the suspect list:

# set includeParents to TRUE since both the parents and TPs are needed
suspects <- convertToSuspects(TPs, includeParents = TRUE)
fGroupsScr <- screenSuspects(fGroups, suspects, onlyHits = TRUE)

# do the componentization
# a similar distinction between fGroups/fGroupsScr as discussed above can of course also be done
componTP <- generateComponents(fGroups = fGroupsScr, ...)

If a parent screening was already performed in advance, for instance when the input parents to generateTPs are screening results, the screening results for parents and TPs can also be combined. The second example demonstrates this.

Note that in the case a parent suspect is matched to multiple feature groups, a component is made for each match. Similarly, if multiple feature groups match to a TP suspect, all of them will be incorporated in the component.

When TPs were generated with the logic algorithm a suspect screening must also be carried out in advance. However, in this case it is not necessary to include the parents (since each parent equals a feature group no mapping is necessary). The onlyHits variable to screenSuspects must not be set in order to keep the parents.

# only screen for TPs
suspects <- convertToSuspects(TPs, includeParents = FALSE)
# but keep all other feature groups as these may be parents
fGroupsScr <- screenSuspects(fGroups, suspects, onlyHits = FALSE)

# do the componentization...

Annotation similarity calculation

If additional annotation data for parents and TPs is given to the componentization algorithm, it will be used to calculate various similarity properties. Often, the chemical structure for a transformation product is similar to that of its parent. Hence, there is a good chance that a parent and its TPs also share similar MS/MS data.

Firstly, if MS peak lists are provided, then the spectrum similarity is calculated between each parent and its potential TP candidates. This is performed with all the three different alignment shifts (see the spectrum similarity section for more details).

In case formulas and/or compounds objects are specified, then a parent/TP comparison is made by counting the number of fragments and neutral losses that they share (by using the formula annotations). This property is mainly used for non-target workflows where the identity for a parent and TP is not yet well established. For this reason, fragments and neutral losses reported for all candidates for the parent/TP feature group are considered. Hence, it is highly recommend to pre-treat the annotation objects, for instance, with the topMost filter. If both formulas and compounds are given the results are pooled. Note that each unique fragment/neutral loss is only counted once, thus multiple formula/compound candidates with the same annotations will not skew the results.

Processing data {#TPsProc}

The output of TP componentization is an object of the componentsTPs class. This derives from the 'regular' components class, therefore, all the data processing functionality described before (extraction, subsetting, filtering etc) are also valid for TP components.

Several additional filters are available to prioritize the data:

Filter | Remarks ------------- | ----------------------------------- retDirMatch | If TRUE only keep TPs with an expected chromatographic retention direction compared to the parent. minSpecSim, minSpecPrec, minSpecSimBoth | The minimum spectrum similarity between the parent and TP. Calculated with no, "precursor" and "both" alignment shifting (see spectrum similarity). minFragMatches, minNLMatches | Minimum number of formula fragment/neutral loss matches between parent and TP (discussed in previous section). formulas | A formulas object used to further verify candidate TPs that were generated by the logic algorithm.

The retDirMatch filter compares the expected and observed retention time direction of a TP in order to decide if it should be kept. The direction is a value of either -1 (TP elutes before parent), +1 (TP elutes after parent) or 0 (TP elutes very close to the parent or its direction is unknown). The directions are taken from the generated transformation products. For the library and biotransformer algorithms the log P values are compared of a TP and its parent. Here, it is assumed that lower log P values result in earlier elution (i.e. typical with reversed phase LC). For the logic algorithm the retention time direction is taken from the transformation rules table. Note that specifying a large enough value for the minRTDiff argument to generateComponents is important to ensure that some tolerance exists while comparing retention time directions of parent and TPs. This filter does nothing if either the observed or expected direction is zero.

When TPs data was generated with the logic algorithm it is recommended to use the formulas filter. This filter uses formula annotations to verify that (1) a parent feature group contains the elements that are subtracted during the transformation and (2) the TP feature group contains the elements that were added during the transformation. Since the 'right' candidate formula is most likely not yet known, this filter looks at all candidates. Therefore, it is recommended to filter the formulas object, for instance, with the topMost filter.

Finally, the plotGraph() method function that was introduced exploring transformation hierarchies for structure TPs, can also incorporate componentization results to simplify the plot and mark TP hits:

plotGraph(TPsBT, which = "Atrazine", components = componTP)

Omitting transformation product input

The TPs argument to generateComponents can also be omitted. In this case every feature group of fGroupTPs is considered to be a potential TP for the potential parents specified for fGroups. An advantage is that the screening workflow is not limited to any known TPs or transformations. However, such a workflow has high demands on prioritiation steps before and after the componentization to rule out the many false positives that may occur.

When no transformation data is supplied it is crucial to make a prior distinction between parent and TP feature groups. Afterwards, the MS/MS spectral and other annotation similarity filters mentioned in the previous section may be a powerful way to further prioritize data.

The fourth example demonstrates such a workflow.

Reporting TP components

The TP components can be reported with the report function. This is done by setting the components function argument (i.e. equally to all other component types). The results will be displayed with a customized format that allows easy exploring of each parent with its TPs. In addition, the TPs argument can be set to include additional data such as transformation pathways.

report(fGroups, components = componTP, TPs = TPs)

Example workflows {#TPsExamples}

The next subsections demonstrate several approaches to perform a TP screening workflow with patRoon. In all examples it is assumed that feature groups were already obtained (with the findFeatures and groupFeatures functions) and stored in the fGroups variable.

The workflows with patRoon are designed to be flexible, and the examples here are primarily meant to implement your own workflow. Furthermore, some of the techniques used in the examples can also be combined. For instance, the Fold change classification and MS/MS similarity filters applied in the fourth example could also be applied to any of the other examples.

Screen predicted TPs for targets {#TPsEx1}

The first example is a simple workflow where TPs are predicted for a set of given parents with [BioTransformer] and subsequently screened. A [MetFrag] compound database is generated and used for annotation.

# predict TPs for a fixed list of parents
TPs <- generateTPs("biotransformer", parents = patRoonData::suspectsPos)

# screen for the TPs
suspectsTPs <- convertToSuspects(TPs, includeParents = FALSE)
fGroupsTPs <- screenSuspects(fGroups, suspectsTPs, adduct = "[M+H]+", onlyHits = TRUE)

# perform annotation of TPs
mslistsTPs <- generateMSPeakLists(fGroupsTPs, "mzr")
convertToMFDB(TPs, "TP-database.csv", includeParents = FALSE) # generate MetFrag database
compoundsTPs <- generateCompounds(fGroupsTPs, mslistsTPs, "metfrag", adduct = "[M+H]+", database = "csv",
                                  extraOpts = list(LocalDatabasePath = "TP-database.csv"))

Screening TPs from a library for suspects {#TPsEx2}

In this example TPs of interest are obtained for the parents that surfaced from of a suspect screening. The steps of this workflow are:

  1. Suspect screening parents.
  2. Obtain TPs for the suspect hits from a library.
  3. A second suspect screening is performed for TPs and the original parent screening results are amended. Note that the parent data is needed for componentization.
  4. Both parents and TPs are annotated using a database generated from their chemical structures.
  5. Some prioritization is performed by a. Only keeping candidate structures for which in-silico fragmentation resulted in at least one annotated MS/MS peak. b. Only keeping suspect hits with an estimated identification level of 3 or better.
  6. The TP components are made and only feature groups with parent/TP assignments are kept.
  7. All results are reported.
# step 1
fGroupsScr <- screenSuspects(fGroups, patRoonData::suspectsPos, adduct = "[M+H]+")
# step 2
TPs <- generateTPs("library", parents = fGroupsScr)

# step 3
suspects <- convertToSuspects(TPs)
fGroupsScr <- screenSuspects(fGroupsScr, suspects, adduct = "[M+H]+", onlyHits = TRUE, amend = TRUE)

# step 4
mslistsScr <- generateMSPeakLists(fGroupsScr, "mzr")
convertToMFDB(TPs, "TP-database.csv", includeParents = TRUE)
compoundsScr <- generateCompounds(fGroupsScr, mslistsScr, "metfrag", adduct = "[M+H]+", database = "csv",
                                  extraOpts = list(LocalDatabasePath = "TP-database.csv"))

# step 5a
compoundsScr <- filter(compoundsScr, minExplainedPeaks = 1)

# step 5b
fGroupsScrAnn <- annotateSuspects(fGroupsScr, MSPeakLists = mslistsScr,
                                  compounds = compoundsScr)
fGroupsScrAnn <- filter(fGroupsScrAnn, maxLevel = 3, onlyHits = TRUE)

# step 6
componTP <- generateComponents(fGroupsScrAnn, "tp", TPs = TPs, MSPeakLists = mslistsScr,
                               compounds = compoundsScr)
fGroupsScrAnn <- fGroupsScrAnn[results = componTP]

# step 7
report(fGroupsScrAnn, MSPeakLists = mslistsScr, compounds = compoundsScr,
       components = componTP, TPs = TPs)

Non-target screening of predicted TPs {#TPsEx3}

This example uses metabolic logic to calculate possible TPs for all feature groups from a complete non-target screening. This example demonstrates how a workflow can be performed when little is known about the identity of the parents. The steps of this workflow are:

  1. Formula annotations are performed for all feature groups.
  2. These results are then limited to the top 5 candidates, and only feature groups with annotations are kept.
  3. The TPs are calculated for all remaining feature groups.
  4. A suspect screening is performed to find the TPs. Unlike the previous example feature groups without hits are kept (discussed here).
  5. The components are generated
  6. The components are filtered: a. The TPs must follow an expected retention time direction b. The parent/TPs should have at least one candidate formula that fits with the transformation.
  7. Only feature groups are kept with parent/TP assignments and all results are reported.
# steps 1-2
mslists <- generateMSPeakLists(fGroups, "mzr")
formulas <- generateFormulas(fGroups, mslists, "genform", adduct = "[M+H]+")
formulas <- filter(formulas, topMost = 5)
fGroups <- fGroups[results = formulas]

# step 3
TPs <- generateTPs("logic", fGroups = fGroups, adduct = "[M+H]+")

# step 4
suspects <- convertToSuspects(TPs)
fGroupsScr <- screenSuspects(fGroups, suspects, adduct = "[M+H]+", onlyHits = FALSE)

# step 5
componTP <- generateComponents(fGroupsScr, "tp", TPs = TPs, MSPeakLists = mslists, formulas = formulas)

# step 6
componTP <- filter(componTP, retDirMatch = TRUE, formulas = formulas)

# step 7
fGroupsScr <- fGroupsScr[results = componTP]
report(fGroupsScr, MSPeakLists = mslists, formulas = formulas, components = componTP)

Non-target screening of TPs by annotation similarities {#TPsEx4}

This example shows a workflow where no TP data from a prediction or library is used. Instead, this workflow relies on statistics and MS/MS data to find feature groups which may potentially have a parent - TP relationship. The workflow is similar to that of the previous example. The steps of this workflow are:

  1. Fold changes (FC) between two sample groups are calculated to classify which feature groups are decreasing (i.e. parents) or increasing (i.e. TPs).
  2. Feature groups without classification are removed.
  3. Formula annotations are performed like the previous example.
  4. The componentization is performed and the FC classifications are used to specify which feature groups are to be considered parents or TPs.
  5. Only TPs are kept that show a high MS/MS spectral similarity and share at least one fragment with their parent.
  6. Only feature groups are kept with parent/TP assignments and all results are reported.
# step 1
tab <- as.data.table(fGroups, FCParams = getFCParams(c("before", "after")))
groupsParents <- tab[classification == "decrease"]$group
groupsTPs <- tab[classification == "increase"]$group

# step 2
fGroups <- fGroups[, union(groupsParents, groupsTPs)]

# step 3
mslists <- generateMSPeakLists(fGroups, "mzr")
formulas <- generateFormulas(fGroups, mslists, "genform", adduct = "[M+H]+")
formulas <- filter(formulas, topMost = 5)
fGroups <- fGroups[results = formulas]

# step 4
componTP <- generateComponents(algorithm = "tp",
                               fGroups = fGroups[, groupsParents],
                               fGroupsTPs = fGroups[, groupsTPs],
                               MSPeakLists = mslists, formulas = formulas)

# step 5
componTP <- filter(componTP, minSpecSimBoth = 0.75, minFragMatches = 1)

# step 6
fGroups <- fGroups[results = componTP]
report(fGroups, MSPeakLists = mslists, formulas = formulas, components = componTP)


rickhelmus/patRoon documentation built on April 25, 2024, 8:15 a.m.