MetFragConfig: Create MetFrag Configuration Files
In schymane/ReSOLUTION: SOLUTIONS for High ReSOLUTION Mass Spectrometry


This function sets up configuration files for running MetFrag Command Line (MetFrag CL) in batch mode. The minimum information required is the mass, adduct type and MS/MS peak list. MetFrag Command Line is available from http://c-ruttkies.github.io/MetFrag/projects/metfragcl/

MetFragConfig(mass, adduct_type, results_filename, peaklist_path, base_dir,
              DB = c("PubChem"), localDB_path = "", output = "XLS", token = "",
              neutralPrecursorMass = FALSE, ppm = 5, mzabs = 0.001, frag_ppm = 5,
              IsPosMode = TRUE, tree_depth = 2, num_threads = 1, add_refs = TRUE,
              minInt = 0, rt_file_path = "", rt_exp = 0, suspect_path = "",
              suspect_filter = FALSE, UDS_Category = "", UDS_Weights = "",
              DB_IDs = "", mol_form = "", useFormula = FALSE,
              useMoNAMetFusion = TRUE, useMonaIndiv = TRUE, MoNAoffline = TRUE,
              incl_el = "", excl_el = "", incl_exclusive = FALSE,
              incl_smarts_filter = "", incl_smarts_score = "",
              excl_smarts_filter = "", excl_smarts_score = "",
              filter_isotopes = TRUE, filter_by_InChIKey = TRUE)

`mass`	The mass with which to search the candidate database (`DB`). Use `neutralPrecursorMass` and `adduct_type` to set whether this is monoisotopic mass or an adduct species.
`adduct_type`	The adduct species used to define mass (if `neutralPrecursorMass=FALSE`) and fragmentation settings in the config file, entered as either `PrecursorIonType` (text) or `PrecursorIonmode` (a number). The available options are given in the system file `MetFrag_AdductTypes.csv` in the `extdata` folder. If `neutralPrecursorMass=TRUE`, set `adduct_type=0`. Recommended default values (if ion state is unclear) are `[M+H]+` (1) for positive and `[M-H]-` (-1) for negative mode.
`results_filename`	Enter a base filename for naming the results files - do not include file endings
`peaklist_path`	Enter the full path and file name to the peak list for this config file
`base_dir`	Enter the directory name to set up the subfolders for MetFrag batch results. If the folders don't exist, subfolders `config`, `log` and `results` are created; the output of this function is saved in `config`.
`DB`	Enter the query database name. Current options are `KEGG`, `PubChem`, `ExtendedPubChem`, `ChemSpider`, `FOR_IDENT`, `MetaCyc`, `LocalCSV`, `LocalPSV` or `LocalSDF`. For `HMDB`, `LipidMaps` and `KEGG-derivatised` use the `LocalCSV` option with the respective files downloaded from https://msbi.ipb-halle.de/~cruttkie/databases/.
`localDB_path`	Full path and file name to the local database for `LocalCSV`, `LocalPSV` or `LocalSDF`. Otherwise leave empty. If the file is not found, the config file defaults to `DB=PubChem`.
`output`	Select the desired output format(s). Current options include one or more of `SDF`, `XLS`, `CSV`, `ExtendedXLS`, `ExtendedFragmentsXLS`, entered as a string. The entry is not checked; incorrect entries will cause MetFrag CL to fail.
`token`	ChemSpider token, only required for `DB=ChemSpider`. See http://www.chemspider.com/MassSpecAPI.asmx for more details about which services require tokens and http://www.chemspider.com/help-create-a-chemspider-account.aspx for information on how to obtain your token. If an invalid token is provided (not 36 characters long), `DB` defaults to `PubChem`.
`neutralPrecursorMass`	Controls whether `mass` is treated as a neutral or charged mass. If `TRUE`, treated as neutral. If `FALSE` (default), this is entered as a charged mass, adjusted in MetFragCL with the `adduct_type` setting.
`mol_form`	A string containing the molecular formula (used in candidate retrieval).
`useFormula`	Default `FALSE` means an exact mass search is performed. If `TRUE`, `mol_form` must be given and candidate retrieval is based on this formula. Note that some databases are sensitive to the order of elements in the formula.
`DB_IDs`	Use this to select only certain candidates using (comma-separated) database identifiers consistent with `DB`.
`ppm`	The ppm error to perform the exact mass search for candidate retrieval (default 5 ppm)
`mzabs`	The absolute error (in Da/Th) used to match fragments to observed MS/MS peaks. Additive with `frag_ppm`. Default 0.001 Da (Th).
`frag_ppm`	The relative error (in ppm) used to match fragments to observed MS/MS peaks. Additive with `mzabs`. Default 5 ppm.
`IsPosMode`	Controls the mode for both candidate retrieval and fragmentation consistently. Default `TRUE` sets positive mode, switch to `FALSE` for negative mode data.
`tree_depth`	Sets the number of fragmentation steps. Default=2 is recommended. Higher values lead to long calculation times.
`num_threads`	Sets the number of threads used to run calculations. Default=1; set higher for faster results.
`add_refs`	If set to (default) `TRUE`, reference scoring terms will be added for `DB=PubChem` and `DB=ChemSpider`. Two terms (references, patents) are added for `PubChem`, weighted 0.5; four terms weighted 0.25 for `ChemSpider`. These settings can be overwritten by setting `add_refs=FALSE` and adding the desired terms to `UDS_Category` and `UDS_Weights`.
`minInt`	Minimum intensity value to consider peaks in the MS/MS file. Default 0. This is merely a convenience option to allow users to do a bare minimum of noise reduction if required.
`rt_file_path`	Full path to the CSV file containing InChIs and retention times (RTs) of standards to build the RT model. The file should contain two comma-separated columns with a header row containing the column names `InChI` and `RetentionTime`. The example system file `Eawag_rt_inchi.csv` in the `extdata` folder is the correct dataset for Eawag MassBank records measured on the XBridge C18 column.
`rt_exp`	The experimental retention time. The chromatography and RT unit must match with the file in `rt_file_path`.
`suspect_path`	Path to the suspect lists to be used as a filter or scoring term.
`suspect_filter`	Default `FALSE` means suspect lists in `suspect_path` are used to increase the score of candidates present in the suspect lists given (added as a scoring term). If `TRUE`, suspect lists are used as a filter instead (only candidates present in the suspect lists are processed).
`UDS_Category`	A string containing the exact column headers of additional User Defined Scores (UDS) to use, separated by a comma. These column headers must match exactly, cannot be repeated and must be present in the default database chosen or in the LocalCSV, PSV or SDF files used as a local database. This can also be used to overwrite the default reference information in `add_refs`.
`UDS_Weights`	A string containing comma-separated weight values corresponding to `UDS_Category`. This must match exactly or an exception is thrown during processing.
`useMoNAMetFusion`	Default `TRUE` means that the MoNA MetFusion Score is added by default. Use `FALSE` to exclude.
`useMonaIndiv`	Default `TRUE` means that the MoNA Individual Score is added by default. Use `FALSE` to exclude. This performs a direct lookup by InChIKey and returns the highest similarity value over all matches. A good match is a very good sign; a poor match means there is a spectrum in MoNA for that compound but this may have been recorded with vastly different settings, so a poor match does not necessarily indicate that the candidate is wrong.
`MoNAoffline`	Default `TRUE` means the local MoNA instance (in the jar file) is used to avoid server issues. Use `FALSE` to perform this live, however this may not work.
`incl_el`	A string containing comma-separated elements that must be present in candidates. This allows coupling of an exact mass search with the presence of elements containing distinct isotope patterns.
`excl_el`	A string containing comma-separated elements that must not be present in candidates. This allows coupling of an exact mass search with the absence of elements containing distinct isotope patterns.
`incl_exclusive`	Default `FALSE` indicates that the elements in `incl_el` must be present, but other elements could still be present. If `TRUE`, only these elements are considered (use this option with caution!)
`incl_smarts_filter`	A string containing SMARTS codes (comma-separated) used to define substructures present (candidates that do not contain these SMARTS are filtered out).
`incl_smarts_score`	A string containing SMARTS codes (comma-separated) used to increase the score of candidates with certain substructures present.
`excl_smarts_filter`	A string containing SMARTS codes to exclude candidates with these substructures present.
`excl_smarts_score`	A string containing SMARTS codes to penalize candidate scores with these substructures present.
`filter_isotopes`	Default `TRUE` removes all candidates containing non-standard isotopes.
`filter_by_InChIKey`	Default `TRUE` collapses the candidate result lists by the first block of the InChIKey, presenting only the candidate with the best score across all categories. If `FALSE`, all candidates are included in the results.
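
Two of the argument rules above can be sketched in plain R. The helpers below are hypothetical and are not part of ReSOLUTION or MetFrag. The first assumes that "additive" means the absolute and relative fragment errors are simply summed; the second checks that `UDS_Category` and `UDS_Weights` pair up before a config file is written. The score names `ScoreA` and `ScoreB` are placeholders.

```r
# Hypothetical helper: total fragment matching tolerance in Da (Th),
# assuming mzabs and frag_ppm are summed.
fragment_tolerance <- function(mz, mzabs = 0.001, frag_ppm = 5) {
  mzabs + mz * frag_ppm / 1e6
}

# Hypothetical helper: check that the comma-separated UDS strings pair up.
# Returns the weights as a numeric vector named by category.
check_uds <- function(UDS_Category, UDS_Weights) {
  categories <- trimws(strsplit(UDS_Category, ",", fixed = TRUE)[[1]])
  weights <- suppressWarnings(
    as.numeric(trimws(strsplit(UDS_Weights, ",", fixed = TRUE)[[1]]))
  )
  if (anyDuplicated(categories) > 0) {
    stop("UDS_Category entries must not be repeated")
  }
  if (length(categories) != length(weights)) {
    stop("UDS_Category and UDS_Weights must have the same number of entries")
  }
  if (anyNA(weights)) {
    stop("UDS_Weights must contain numeric values only")
  }
  setNames(weights, categories)
}

fragment_tolerance(200)              # 0.001 + 200 * 5 / 1e6 = 0.002
check_uds("ScoreA,ScoreB", "0.5,0.5")
```

Running such a check before calling `MetFragConfig` may surface a mismatch earlier than the exception that would otherwise only appear during processing.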

Creates a MetFrag config file matching the given parameters and returns the file name.
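
Because the function returns a file name, the written settings can be inspected before starting a run. The sketch below assumes the config file is plain text with one `Key = Value` pair per line; that layout, the helper `read_config`, and the keys in the stand-in file are all assumptions for illustration, not something this page specifies.

```r
# Hypothetical helper (not part of ReSOLUTION): read a "Key = Value" text
# file into a named list. The file layout is an assumption.
read_config <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  # keep only lines that contain "=" and are not comments
  lines <- lines[grepl("=", lines, fixed = TRUE) & !startsWith(lines, "#")]
  keys <- trimws(sub("=.*$", "", lines))       # text before the first "="
  values <- trimws(sub("^[^=]*=", "", lines))  # text after the first "="
  as.list(setNames(values, keys))
}

# Stand-in file so the sketch runs without MetFrag; the keys are made up.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("# demo settings",
             "ExampleKeyA = 5",
             "ExampleKeyB = some/path.txt"), demo_file)

settings <- read_config(demo_file)
settings$ExampleKeyA
```

With a real run, the value returned by `MetFragConfig` would be passed to `read_config` in place of `demo_file`.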

Emma Schymanski <emma.schymanski@uni.lu> in partnership with Christoph Ruttkies (MetFragCL author).

Use `runMetFrag` to run the config files.

# Do not run unless you have set test_dir to an existing directory
peaklist_path <- system.file("extdata","EA026206_Simazine_peaks.txt",package="ReSOLUTION")
# change this directory to an existing one, or this example won't work
test_dir <- "C:/DATA/Workflow/MetFrag22/metfrag_test_results"
testCSV <- system.file("extdata","dsstox_MS_Ready_MetFragTestCSV5.csv",package="ReSOLUTION")

# neutral mass search against PubChem
config_file <- MetFragConfig(201.0776,"[M+H]+","Simazine_neutralMass_PubChem",peaklist_path, test_dir, DB="PubChem",neutralPrecursorMass=TRUE)
# precursor (charged) mass search against a local CSV database
config_file2 <- MetFragConfig(202.0854,1,"Simazine_precMass_localCSV",peaklist_path,test_dir,DB="LocalCSV",localDB_path=testCSV)
# as above, with a wider 10 ppm candidate search window (config_file2 is overwritten)
config_file2 <- MetFragConfig(202.0854,1,"Simazine_precMass_10ppm",peaklist_path,test_dir,DB="LocalCSV",localDB_path=testCSV,ppm=10)
# as above, without collapsing candidates by InChIKey first block (config_file2 is overwritten again)
config_file2 <- MetFragConfig(202.0854,1,"Simazine_precMass_10ppm_InChIFilterOff",peaklist_path,test_dir,DB="LocalCSV",
                              localDB_path=testCSV,ppm=10,filter_by_InChIKey = FALSE)

#to find out the adduct states:
MetFragAdductTypes <- read.csv(system.file("extdata","MetFrag_AdductTypes.csv",package="ReSOLUTION"))

# to run the config files
metfrag_dir <- "C:/DATA/Workflow/MetFrag22/"
MetFragCL_name <- "MetFrag2.4.4-msready-CL.jar"
# warning: this first query takes a while, for quick testing run config_file2
runMetFrag(config_file, metfrag_dir, MetFragCL_name)
runMetFrag(config_file2, metfrag_dir, MetFragCL_name)
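
For batch use, the calls above could be driven from a table of compounds. The following is a minimal sketch, assuming one row per compound; `make_configs`, the `compounds` table, its placeholder values and the stub are hypothetical and not part of ReSOLUTION. The stub stands in for `MetFragConfig` so the sketch runs without the package, and its file naming is invented.

```r
# Hypothetical wrapper: call a config function once per row of `compounds`.
# config_fun defaults to MetFragConfig but is only evaluated if no other
# function is supplied, so a stub can be swapped in for testing.
make_configs <- function(compounds, base_dir,
                         config_fun = ReSOLUTION::MetFragConfig, ...) {
  vapply(seq_len(nrow(compounds)), function(i) {
    config_fun(compounds$mass[i], compounds$adduct_type[i],
               compounds$results_filename[i], compounds$peaklist_path[i],
               base_dir, ...)
  }, character(1))
}

# Placeholder compound table (values are for illustration only).
compounds <- data.frame(
  mass = c(202.0854, 300.0000),
  adduct_type = c(1, 1),
  results_filename = c("CompoundA", "CompoundB"),
  peaklist_path = c("peaks/CompoundA.txt", "peaks/CompoundB.txt"),
  stringsAsFactors = FALSE
)

# Stub with the same leading arguments as MetFragConfig; it writes nothing
# and only returns an invented file name.
stub <- function(mass, adduct_type, results_filename, peaklist_path,
                 base_dir, ...) {
  file.path(base_dir, "config", paste0(results_filename, "_config.txt"))
}

config_files <- make_configs(compounds, tempdir(), config_fun = stub)
```

With the package installed, dropping the `config_fun` argument would call `MetFragConfig` itself, and extra settings such as `ppm = 10` can be passed through `...`. Each returned file name could then be handed to `runMetFrag` in a loop.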