MetFragConfig: Create MetFrag Configuration Files


View source: R/MetFragConfigR.R

Description

This function sets up configuration files for running MetFrag Command Line (MetFragCL) in batch mode. The minimum information required is the mass, the adduct type and the MS/MS peak list. MetFrag Command Line is available from http://c-ruttkies.github.io/MetFrag/projects/metfragcl/

Usage

MetFragConfig(mass, adduct_type, results_filename, peaklist_path, base_dir,
              DB=c("PubChem"), localDB_path="", output="XLS", token="",
              neutralPrecursorMass=FALSE, ppm=5, mzabs=0.001, frag_ppm=5,
              IsPosMode=TRUE, tree_depth=2, num_threads=1, add_refs=TRUE,
              minInt=0, rt_file_path="", rt_exp=0, suspect_path="",
              suspect_filter=FALSE, UDS_Category="", UDS_Weights="",
              DB_IDs="", mol_form="", useFormula=FALSE,
              useMoNAMetFusion=TRUE, useMonaIndiv=TRUE, MoNAoffline=TRUE,
              incl_el="", excl_el="", incl_exclusive=FALSE,
              incl_smarts_filter="", incl_smarts_score="",
              excl_smarts_filter="", excl_smarts_score="",
              filter_isotopes=TRUE, filter_by_InChIKey=TRUE)

Arguments

mass

The mass used to search the candidate database (DB). Use neutralPrecursorMass and adduct_type to specify whether this is a neutral monoisotopic mass or the mass of an adduct species.

adduct_type

The adduct species used to define mass (if neutralPrecursorMass=FALSE) and fragmentation settings in the config file, entered as either PrecursorIonType (text) or PrecursorIonmode (a number). The available options are given in the system file MetFragAdductTypes.csv in the extdata folder. If neutralPrecursorMass=TRUE, set adduct_type=0. Recommended default values (if ion state is unclear) are [M+H]+ (1) for positive and [M-H]- (-1) for negative mode.

results_filename

Base filename used to name the results files. Do not include a file extension.

peaklist_path

Full path and file name of the peak list for this config file.

base_dir

Enter the directory name to set up the subfolders for MetFrag batch results. If the folders don't exist, subfolders config, log and results are created; the output of this function is saved in config.
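
The folder layout described above can be sketched in base R as follows. This is only an illustration of the expected structure, built in a temporary directory; the function creates these subfolders itself, and the location `metfrag_batch` is a hypothetical name.

```r
# Sketch of the batch folder layout, built in a temporary directory.
# "metfrag_batch" is a hypothetical name chosen for this illustration.
base_dir <- file.path(tempdir(), "metfrag_batch")

# Create the three subfolders named in the description if they are missing
for (sub in c("config", "log", "results")) {
  dir.create(file.path(base_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# List the subfolders that now exist under base_dir
list.dirs(base_dir, full.names = FALSE, recursive = FALSE)
```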

DB

Name of the query database. Current options are KEGG, PubChem, ExtendedPubChem, ChemSpider, FOR_IDENT, MetaCyc, LocalCSV, LocalPSV or LocalSDF. For HMDB, LipidMaps and KEGG-derivatised, use the LocalCSV option with the respective files downloaded from https://msbi.ipb-halle.de/~cruttkie/databases/.

localDB_path

Full path and file name of the local database for LocalCSV, LocalPSV or LocalSDF. Otherwise leave empty. If the file is not found, the config file defaults to DB=PubChem.

output

Select the desired output format(s). Current options include one or more of SDF, XLS, CSV, ExtendedXLS, ExtendedFragmentsXLS, entered as a string. Not tested; incorrect entries will cause MetFrag CL to fail.

token

ChemSpider token, only required for DB=ChemSpider. See http://www.chemspider.com/MassSpecAPI.asmx for more details about which services require tokens and http://www.chemspider.com/help-create-a-chemspider-account.aspx for information on how to obtain your token. If an invalid token is provided (not 36 characters long), DB defaults to PubChem.
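
Because an invalid token makes the config fall back to PubChem, it can help to check the token length before building a ChemSpider config. The snippet below is a minimal pre-check in base R; the token shown is a placeholder, and the variable `token_ok` is a hypothetical name not used by the package.

```r
# Placeholder only: 36 characters long, but not a real ChemSpider token
token <- "00000000-0000-0000-0000-000000000000"
DB <- "ChemSpider"

# The description treats a token that is not 36 characters long as invalid;
# checking this up front avoids an unnoticed switch to PubChem
token_ok <- is.character(token) && nchar(token) == 36
if (DB == "ChemSpider" && !token_ok) {
  warning("ChemSpider token does not have 36 characters")
}
token_ok
```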

neutralPrecursorMass

Controls whether mass is treated as a neutral or charged mass. If TRUE, treated as neutral. If FALSE (default), this is entered as a charged mass, adjusted in MetFragCL with the adduct_type setting.

mol_form

A string containing the molecular formula (used in candidate retrieval).

useFormula

Default FALSE means an exact mass search is performed. If TRUE, mol_form must be given and candidate retrieval is based on this formula. Note that some databases are sensitive to the order of elements in the formula.

DB_IDs

Use this to select only certain candidates using (comma-separated) database identifiers consistent with DB.

ppm

The ppm error used in the exact mass search for candidate retrieval (default 5 ppm).
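
To give a feel for what a ppm tolerance means in absolute terms, the sketch below computes the mass window implied by a ppm value around a query mass. The helper `ppm_window` is hypothetical and not part of the package; how MetFragCL computes its own search window internally is not documented here.

```r
# Hypothetical helper (not part of ReSOLUTION): mass window for a ppm tolerance
ppm_window <- function(mass, ppm = 5) {
  delta <- mass * ppm / 1e6          # absolute tolerance in Da
  c(lower = mass - delta, upper = mass + delta)
}

# Window around the precursor mass used in the Examples section
ppm_window(202.0854, ppm = 5)

# Doubling the ppm value doubles the width of the window
ppm_window(202.0854, ppm = 10)
```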

mzabs

The absolute error (in Da/Th) used to match fragments to observed MS/MS peaks. Additive with frag_ppm. Default 0.001 Da (Th).

frag_ppm

The relative error (in ppm) used to match fragments to observed MS/MS peaks. Additive with mzabs. Default 5 ppm.
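
Since mzabs and frag_ppm are described as additive, the total matching tolerance grows with m/z. The sketch below assumes that "additive" means the absolute and relative terms are simply summed; the helper `frag_tolerance` is a hypothetical name used only for this illustration.

```r
# Hypothetical helper: total fragment matching tolerance in Da,
# assuming the absolute and ppm-based terms are summed
frag_tolerance <- function(mz, mzabs = 0.001, frag_ppm = 5) {
  mzabs + mz * frag_ppm / 1e6
}

# With the defaults, the tolerance increases with the fragment m/z
frag_tolerance(c(100, 200))
```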

IsPosMode

Controls the mode for both candidate retrieval and fragmentation consistently. Default TRUE sets positive mode, switch to FALSE for negative mode data.

tree_depth

Sets the number of fragmentation steps. Default=2 is recommended. Higher values lead to long calculation times.

num_threads

Sets the number of threads used to run calculations. Default=1; set higher for faster results.

add_refs

If set to TRUE (default), reference scoring terms will be added for DB=PubChem and DB=ChemSpider. Two terms (references, patents) are added for PubChem, weighted 0.5; four terms weighted 0.25 are added for ChemSpider. These settings can be overwritten by setting add_refs=FALSE and adding the desired terms to UDS_Category and UDS_Weights.

minInt

Minimum intensity value for peaks in the MS/MS file to be considered. Default 0. This is merely a convenience option that allows users to do a bare minimum of noise reduction if required.

rt_file_path

Full path to the CSV file containing InChIs and retention times (RTs) of standards used to build the RT model. The file should contain two comma-separated columns with a header row containing the column names InChI and RetentionTime. The example system file Eawag_rt_inchi.csv in the extdata folder is the correct dataset for Eawag MassBank records measured on the XBridge C18 column.
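
A file with the layout described above could be written from R as sketched below. The retention times are made up for illustration and the file name `rt_standards.csv` is hypothetical; real values must come from standards measured on the same chromatographic system. Note that InChIs can contain commas, so the strings need to be quoted, which `write.csv` does by default.

```r
# Made-up retention times for illustration only; replace with measured values
rt_standards <- data.frame(
  InChI = c("InChI=1S/CH4O/c1-2/h2H,1H3",           # methanol
            "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),   # ethanol
  RetentionTime = c(1.2, 1.8),
  stringsAsFactors = FALSE
)

# Hypothetical file name, written to a temporary directory
rt_file_path <- file.path(tempdir(), "rt_standards.csv")

# write.csv quotes character columns, so commas inside InChIs are preserved
write.csv(rt_standards, rt_file_path, row.names = FALSE)

# Confirm the header row matches the expected column names
names(read.csv(rt_file_path))
```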

rt_exp

The experimental retention time. The chromatography and RT unit must match those of the file given in rt_file_path.

suspect_path

Path to the suspect lists to be used as a filter or scoring term.

suspect_filter

Default FALSE means suspect lists in suspect_path are used to increase the score of candidates present in the suspect lists given (added as a scoring term). If TRUE, suspect lists are used as a filter instead (only candidates present in the suspect lists are processed).

UDS_Category

A string containing the exact column headers of additional User Defined Scores (UDS) to use, separated by a comma. These column headers must match exactly, cannot be repeated and must be present in the default database chosen or in the LocalCSV, PSV or SDF files used as a local database. This can also be used to overwrite the default reference information in add_refs.

UDS_Weights

A string containing comma-separated weight values corresponding to UDS_Category. This must match exactly or an exception is thrown during processing.
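
Because a mismatch between the two strings only surfaces as an exception during processing, a quick consistency check beforehand may save a failed run. The sketch below uses hypothetical column headers (`ScoreA`, `ScoreB`) and weights; it only checks the constraints stated above and does not verify that the columns exist in the database.

```r
# Hypothetical column headers and weights, for illustration only
UDS_Category <- "ScoreA,ScoreB"
UDS_Weights  <- "1.0,0.5"

# Split both comma-separated strings into their individual entries
categories <- trimws(strsplit(UDS_Category, ",", fixed = TRUE)[[1]])
weights    <- as.numeric(strsplit(UDS_Weights, ",", fixed = TRUE)[[1]])

# One weight per category, no repeated headers, all weights numeric
uds_ok <- length(categories) == length(weights) &&
  !anyDuplicated(categories) &&
  !anyNA(weights)
uds_ok
```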

useMoNAMetFusion

Default TRUE means that the MoNA MetFusion Score is added by default. Use FALSE to exclude.

useMonaIndiv

Default TRUE means that the MoNA Individual Score is added by default. Use FALSE to exclude. This performs a direct lookup by InChIKey and returns the highest similarity value over all matches. A good match is a very good sign; a poor match means there is a spectrum in MoNA for that compound but this may have been recorded with vastly different settings, so a poor match does not necessarily indicate that the candidate is wrong.

MoNAoffline

Default TRUE means the local MoNA instance (in the jar file) is used to avoid server issues. Use FALSE to query MoNA live; however, this may not work.

incl_el

A string containing comma-separated elements that must be present in candidates. This allows coupling of an exact mass search with the presence of elements containing distinct isotope patterns.

excl_el

A string containing comma-separated elements that must not be present in candidates. This allows coupling of an exact mass search with the absence of elements containing distinct isotope patterns.

incl_exclusive

Default FALSE indicates that the elements in incl_el must be present, but other elements could still be present. If TRUE, only these elements are considered (use this option with caution!)

incl_smarts_filter

A string containing SMARTS codes (comma-separated) used to define substructures present (candidates that do not contain these SMARTS are filtered out).

incl_smarts_score

A string containing SMARTS codes (comma-separated) used to increase the score of candidates with certain substructures present.

excl_smarts_filter

A string containing SMARTS codes to exclude candidates with these substructures present.

excl_smarts_score

A string containing SMARTS codes to penalize candidate scores with these substructures present.

filter_isotopes

Default TRUE removes all candidates containing non-standard isotopes.

filter_by_InChIKey

Default TRUE collapses the candidate result lists by the first block of the InChIKey, presenting only the candidate with the best score across all categories. If FALSE, all candidates are included in the results.
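
The idea of collapsing by the first InChIKey block (the first 14 characters) can be illustrated in base R as below. This is a simplified sketch with a single score column and made-up InChIKeys and scores; it is not the MetFragCL implementation, which works across all score categories.

```r
# Hypothetical candidate table; the InChIKeys and scores are made up
candidates <- data.frame(
  InChIKey = c("AAAAAAAAAAAAAA-UHFFFAOYSA-N",
               "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
               "CCCCCCCCCCCCCC-UHFFFAOYSA-N"),
  Score = c(0.8, 0.9, 0.5),
  stringsAsFactors = FALSE
)

# The first block of an InChIKey is its first 14 characters
candidates$FirstBlock <- substr(candidates$InChIKey, 1, 14)

# Sort by descending score, then keep the first (best) row per first block
ranked    <- candidates[order(-candidates$Score), ]
collapsed <- ranked[!duplicated(ranked$FirstBlock), ]
collapsed
```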

Value

Creates a MetFrag config file matching the given parameters and returns the file name.

Author(s)

Emma Schymanski <emma.schymanski@uni.lu> in partnership with Christoph Ruttkies (MetFragCL author).

See Also

runMetFrag to run the config files.

Examples

# Do not run unless you adjusted test_dir to an existing file location
peaklist_path <- system.file("extdata","EA026206_Simazine_peaks.txt",package="ReSOLUTION")
# change this directory to an existing one, or this example won't work
test_dir <- "C:/DATA/Workflow/MetFrag22/metfrag_test_results"
testCSV <- system.file("extdata","dsstox_MS_Ready_MetFragTestCSV5.csv",package="ReSOLUTION")

config_file <- MetFragConfig(201.0776,"[M+H]+","Simazine_neutralMass_PubChem",peaklist_path, test_dir, DB="PubChem",neutralPrecursorMass=TRUE)
config_file2 <- MetFragConfig(202.0854,1,"Simazine_precMass_localCSV",peaklist_path,test_dir,DB="LocalCSV",localDB_path=testCSV)
config_file2 <- MetFragConfig(202.0854,1,"Simazine_precMass_10ppm",peaklist_path,test_dir,DB="LocalCSV",localDB_path=testCSV,ppm=10)
config_file2 <- MetFragConfig(202.0854,1,"Simazine_precMass_10ppm_InChIFilterOff",peaklist_path,test_dir,DB="LocalCSV",
                              localDB_path=testCSV,ppm=10,filter_by_InChIKey = FALSE)

#to find out the adduct states:
MetFragAdductTypes <- read.csv(system.file("extdata","MetFrag_AdductTypes.csv",package="ReSOLUTION"))

# to run the config files
metfrag_dir <- "C:/DATA/Workflow/MetFrag22/"
MetFragCL_name <- "MetFrag2.4.4-msready-CL.jar"
# warning: this first query takes a while, for quick testing run config_file2
runMetFrag(config_file, metfrag_dir, MetFragCL_name)
runMetFrag(config_file2, metfrag_dir, MetFragCL_name)

schymane/ReSOLUTION documentation built on May 22, 2021, 3:41 a.m.