generateCompoundsMetFrag | R Documentation |
Uses the metfRag package or MetFrag CL for compound identification (see http://ipb-halle.github.io/MetFrag/).
generateCompoundsMetFrag(fGroups, ...)
## S4 method for signature 'featureGroups'
generateCompoundsMetFrag(
  fGroups,
  MSPeakLists,
  method = "CL",
  timeout = 300,
  timeoutRetries = 2,
  errorRetries = 2,
  topMost = 100,
  dbRelMzDev = 5,
  fragRelMzDev = 5,
  fragAbsMzDev = 0.002,
  adduct = NULL,
  database = "pubchem",
  extendedPubChem = "auto",
  chemSpiderToken = "",
  scoreTypes = compoundScorings("metfrag", database, onlyDefault = TRUE)$name,
  scoreWeights = 1,
  preProcessingFilters = c("UnconnectedCompoundFilter", "IsotopeFilter"),
  postProcessingFilters = c("InChIKeyFilter"),
  maxCandidatesToStop = 2500,
  identifiers = NULL,
  extraOpts = NULL
)
## S4 method for signature 'featureGroupsSet'
generateCompoundsMetFrag(
  fGroups,
  MSPeakLists,
  method = "CL",
  timeout = 300,
  timeoutRetries = 2,
  errorRetries = 2,
  topMost = 100,
  dbRelMzDev = 5,
  fragRelMzDev = 5,
  fragAbsMzDev = 0.002,
  adduct = NULL,
  ...,
  setThreshold = 0,
  setThresholdAnn = 0,
  setAvgSpecificScores = FALSE
)
fGroups |
|
... |
(sets workflow) Further arguments passed to the non-sets workflow method. |
MSPeakLists |
A |
method |
Which method should be used for MetFrag execution: |
timeout |
Maximum time (in seconds) before a MetFrag query for a feature group is stopped. Also see
|
timeoutRetries |
Maximum number of retries after reaching a timeout before completely skipping the MetFrag query
for a feature group. Also see |
errorRetries |
Maximum number of retries after an error occurred. This may be useful to handle e.g. connection errors. |
topMost |
Only keep this number of candidates (per feature group) with highest score. Set to |
dbRelMzDev |
Relative mass deviation (in ppm) for database search. Sets the DatabaseSearchRelativeMassDeviation option. |
fragRelMzDev |
Relative mass deviation (in ppm) for fragment matching. Sets the FragmentPeakMatchRelativeMassDeviation option. |
fragAbsMzDev |
Absolute mass deviation (in Da) for fragment matching. Sets the FragmentPeakMatchAbsoluteMassDeviation option. |
adduct |
An The |
database |
Compound database to use. Valid values are: |
extendedPubChem |
If |
chemSpiderToken |
A character string with the ChemSpider security token that should be set when the ChemSpider database is used. Sets the ChemSpiderToken option. |
scoreTypes |
A character vector defining the scoring types. See the |
scoreWeights |
Numeric vector containing weights of the used scoring types. Order is the same as set in
|
preProcessingFilters , postProcessingFilters |
A character vector defining pre/post filters applied before/after
fragmentation and scoring (e.g. |
maxCandidatesToStop |
If more than this number of candidate structures is found then processing will be aborted and no results for this feature group will be reported. Low values increase the chance of missing data, whereas too high values will use too many computer resources and significantly slow down the process. Sets the MaxCandidateLimitToStop option. |
identifiers |
A |
extraOpts |
A named |
setThreshold |
(sets workflow) Minimum abundance for a candidate among all sets (‘0-1’). For instance, a value of ‘1’ means that the candidate needs to be present in all the set data. |
setThresholdAnn |
(sets workflow) As |
setAvgSpecificScores |
(sets workflow) If |
This function uses MetFrag to generate compound candidates. This function is called when calling generateCompounds with algorithm="metfrag".
Several online compound databases such as PubChem and ChemSpider may be chosen for retrieval of candidate structures. This method requires the availability of MS/MS data, and feature groups without it will be ignored. Many options exist to score and filter resulting data, and it is highly suggested to optimize these to improve results. The MetFrag options PeakList, IonizedPrecursorMass and ExperimentalRetentionTimeValue (in minutes) are automatically set from feature data.
generateCompoundsMetFrag returns a compoundsMF object.
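A minimal sketch of a typical call is given below. It assumes that fGroups (a featureGroups object) and mslists (an MSPeakLists object) were created earlier in the workflow; both object names and the "[M+H]+" adduct are illustrative assumptions and not taken from this page. The call is guarded so that the snippet can be sourced even when patRoon or the input objects are unavailable.

```r
# Arguments mirror the defaults documented above, except for the adduct,
# which is an illustrative assumption.
mfArgs <- list(
    method = "CL",
    timeout = 300,
    topMost = 100,
    dbRelMzDev = 5,
    fragRelMzDev = 5,
    fragAbsMzDev = 0.002,
    adduct = "[M+H]+",   # assumption: positive mode, protonated ions
    database = "pubchem",
    maxCandidatesToStop = 2500
)

# fGroups and mslists are hypothetical objects from earlier workflow steps.
if (requireNamespace("patRoon", quietly = TRUE) &&
    exists("fGroups") && exists("mslists")) {
    compounds <- do.call(patRoon::generateCompoundsMetFrag,
                         c(list(fGroups, mslists), mfArgs))
    print(compounds)   # a compoundsMF object
}
```

Keeping the arguments in a list makes it easier to reuse the same settings for several feature group objects.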
MetFrag supports many different scorings to rank candidates. The compoundScorings function can be used to get an overview (some columns are omitted):
name | metfrag | database |
score | Score | |
fragScore | FragmenterScore | |
metFusionScore | OfflineMetFusionScore | |
individualMoNAScore | OfflineIndividualMoNAScore | |
numberPatents | PubChemNumberPatents | pubchem |
numberPatents | Patent_Count | pubchemlite |
pubMedReferences | PubChemNumberPubMedReferences | pubchem |
pubMedReferences | ChemSpiderNumberPubMedReferences | chemspider |
pubMedReferences | NUMBER_OF_PUBMED_ARTICLES | comptox |
pubMedReferences | PubMed_Count | pubchemlite |
extReferenceCount | ChemSpiderNumberExternalReferences | chemspider |
dataSourceCount | ChemSpiderDataSourceCount | chemspider |
referenceCount | ChemSpiderReferenceCount | chemspider |
RSCCount | ChemSpiderRSCCount | chemspider |
smartsInclusionScore | SmartsSubstructureInclusionScore | |
smartsExclusionScore | SmartsSubstructureExclusionScore | |
suspectListScore | SuspectListScore | |
retentionTimeScore | RetentionTimeScore | |
CPDATCount | CPDAT_COUNT | comptox |
TOXCASTActive | TOXCAST_PERCENT_ACTIVE | comptox |
dataSources | DATA_SOURCES | comptox |
pubChemDataSources | PUBCHEM_DATA_SOURCES | comptox |
EXPOCASTPredExpo | EXPOCAST_MEDIAN_EXPOSURE_PREDICTION_MG/KG-BW/DAY | comptox |
ECOTOX | ECOTOX | comptox |
NORMANSUSDAT | NORMANSUSDAT | comptox |
MASSBANKEU | MASSBANKEU | comptox |
TOX21SL | TOX21SL | comptox |
TOXCAST | TOXCAST | comptox |
KEMIMARKET | KEMIMARKET | comptox |
MZCLOUD | MZCLOUD | comptox |
pubMedNeuro | PubMedNeuro | comptox |
CIGARETTES | CIGARETTES | comptox |
INDOORCT16 | INDOORCT16 | comptox |
SRM2585DUST | SRM2585DUST | comptox |
SLTCHEMDB | SLTCHEMDB | comptox |
THSMOKE | THSMOKE | comptox |
ITNANTIBIOTIC | ITNANTIBIOTIC | comptox |
STOFFIDENT | STOFFIDENT | comptox |
KEMIMARKET_EXPO | KEMIMARKET_EXPO | comptox |
KEMIMARKET_HAZ | KEMIMARKET_HAZ | comptox |
REACH2017 | REACH2017 | comptox |
KEMIWW_WDUIndex | KEMIWW_WDUIndex | comptox |
KEMIWW_StpSE | KEMIWW_StpSE | comptox |
KEMIWW_SEHitsOverDL | KEMIWW_SEHitsOverDL | comptox |
ZINC15PHARMA | ZINC15PHARMA | comptox |
PFASMASTER | PFASMASTER | comptox |
peakFingerprintScore | AutomatedPeakFingerprintAnnotationScore | |
lossFingerprintScore | AutomatedLossFingerprintAnnotationScore | |
agroChemInfo | AgroChemInfo | pubchemlite |
bioPathway | BioPathway | pubchemlite |
drugMedicInfo | DrugMedicInfo | pubchemlite |
foodRelated | FoodRelated | pubchemlite |
pharmacoInfo | PharmacoInfo | pubchemlite |
safetyInfo | SafetyInfo | pubchemlite |
toxicityInfo | ToxicityInfo | pubchemlite |
knownUse | KnownUse | pubchemlite |
disorderDisease | DisorderDisease | pubchemlite |
identification | Identification | pubchemlite |
annoTypeCount | FPSum | pubchemlite |
annoTypeCount | AnnoTypeCount | pubchemlite |
annotHitCount | AnnotHitCount | pubchemlite |
In addition, the compoundScorings function is also useful to programmatically generate a set of scorings to be used for ranking with MetFrag. For instance, the following can be given to the scoreTypes argument to use all default scorings for PubChem: compoundScorings("metfrag", "pubchem", onlyDefault=TRUE)$name.
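Because scoreWeights follows the order of scoreTypes, it can help to define both from a single named vector. The sketch below uses only base R; the scoring names come from the table above, while the weights are purely illustrative and not recommended values.

```r
# Named weights keep each weight next to its scoring name.
# The weight values are illustrative assumptions.
weights <- c(fragScore = 1.0, numberPatents = 0.5, pubMedReferences = 0.5)

# Split into the two arguments; the order is identical by construction.
scoreArgs <- list(
    scoreTypes = names(weights),
    scoreWeights = unname(weights)
)
str(scoreArgs)
```

The resulting list elements could then be passed as the scoreTypes and scoreWeights arguments.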
For all MetFrag scoring types refer to the Candidate Scores section on the MetFragR homepage.
When database="chemspider", setting the chemSpiderToken argument is mandatory.
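One way to avoid hard-coding the token in scripts is to read it from an environment variable, as sketched below. The helper chemSpiderArgs and the variable name CHEMSPIDER_TOKEN are hypothetical and not part of patRoon.

```r
# CHEMSPIDER_TOKEN is a hypothetical environment variable name.
chemSpiderArgs <- function(token = Sys.getenv("CHEMSPIDER_TOKEN")) {
    # Fail early: the token is mandatory for the ChemSpider database.
    if (!nzchar(token))
        stop("chemSpiderToken must be set when database = \"chemspider\"")
    list(database = "chemspider", chemSpiderToken = token)
}
```

The returned list holds the database and chemSpiderToken arguments for the actual call.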
If a local database is chosen via sdf, psv, or csv then its file location should be set with the LocalDatabasePath value via the extraOpts argument. For example: extraOpts = list(LocalDatabasePath = "C:/myDB.csv").
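The sketch below derives both arguments from the file path. The helper localDBArgs is hypothetical, and it assumes that the file extension matches the intended database type.

```r
# Hypothetical helper: builds the 'database' and 'extraOpts' arguments
# for a local database file, assuming the extension reflects its type.
localDBArgs <- function(path) {
    type <- tolower(tools::file_ext(path))
    if (!type %in% c("sdf", "psv", "csv"))
        stop("Unsupported local database type: ", type)
    if (!file.exists(path))
        warning("Local database not found: ", path)
    list(database = type, extraOpts = list(LocalDatabasePath = path))
}
```

For instance, localDBArgs("C:/myDB.csv") yields database = "csv" together with the extraOpts list from the example above.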
If database="pubchemlite" or database="comptox" and patRoonExt is not installed then the file location must be specified as above or by setting the patRoon.path.MetFragPubChemLite/patRoon.path.MetFragCompTox option. See the installation section in the handbook for more details.
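Setting these options could look as follows. Both file paths are hypothetical placeholders and should point to the actual downloaded database files.

```r
# Both paths are hypothetical; replace them with the real file locations.
options(
    patRoon.path.MetFragPubChemLite = "~/databases/PubChemLite.csv",
    patRoon.path.MetFragCompTox = "~/databases/CompTox.csv"
)

# Verify what is currently configured.
getOption("patRoon.path.MetFragPubChemLite")
```

Placing such a call in a startup file such as .Rprofile would make the setting persist across sessions.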
generateCompoundsMetFrag uses multiprocessing to parallelize computations. Please see the parallelization section in the handbook for more details and the patRoon options for configuration.
When local database files are used with generateCompoundsMetFrag (e.g. when database is set to "pubchemlite", "csv" etc.) and patRoon.MP.method="future", then the database file must be present on all the nodes. When pubchemlite or comptox is used, the location for these databases can be configured on the host with the respective package options (patRoon.path.MetFragPubChemLite and patRoon.path.MetFragCompTox) or made available by installing the patRoonExt package. Note that these files must also be present on the local host computer, even if it is not participating in computations.
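Before starting a run with patRoon.MP.method="future", a quick check on the host can confirm that the configured database files exist. This is a base R sketch that only inspects the two package options named above; it does not check the remote nodes or a patRoonExt installation.

```r
dbOptions <- c("patRoon.path.MetFragPubChemLite", "patRoon.path.MetFragCompTox")

# TRUE for each option that points to an existing file on this machine.
dbAvailable <- vapply(dbOptions, function(opt) {
    path <- getOption(opt, default = "")
    is.character(path) && length(path) == 1 && nzchar(path) && file.exists(path)
}, logical(1))
print(dbAvailable)
```

A similar check would have to be run on every node that participates in the computations.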
Ruttkies2016patRoon
generateCompounds for more details and other algorithms.