multilevelannotation: multilevelannotation

View source: R/multilevelannotation.R

Description

The function uses a multi-level scoring algorithm to annotate features using HMDB, KEGG, T3DB, and LipidMaps.

Usage

multilevelannotation(dataA, max.mz.diff = 10, max.rt.diff = 10,
    cormethod = "pearson", num_nodes = 2, queryadductlist = c("all"),
    mode = "pos", outloc, db_name = "HMDB", adduct_weights = NA,
    num_sets = 30, allsteps = TRUE, corthresh = 0.7, NOPS_check = TRUE,
    customIDs = NA, missing.value = NA, hclustmethod = "complete",
    deepsplit = 2, networktype = "unsigned", minclustsize = 10,
    module.merge.dissimilarity = 0.2, filter.by = c("M+H"),
    redundancy_check = TRUE, min_ions_perchem = 1, biofluid.location = NA,
    origin = NA, status = "Detected", boostIDs = NA, max_isp = 5,
    MplusH.abundance.ratio.check = FALSE, customDB = NA,
    HMDBselect = "union", mass_defect_window = 0.01,
    mass_defect_mode = "pos", pathwaycheckmode = "pm")
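
As a minimal sketch of how a call might be assembled, the block below collects a few arguments in a list, using values taken from the defaults in the signature. The output folder under tempdir() and the name annotation_args are assumptions for illustration. The actual call is left commented out because it needs the package to be loaded and a prepared peak table.

```r
# Argument values copied from the Usage defaults; outloc has no default,
# so a hypothetical folder under tempdir() is used here.
annotation_args <- list(
  max.mz.diff = 10,
  max.rt.diff = 10,
  cormethod = "pearson",
  num_nodes = 2,
  queryadductlist = c("M+H", "M+Na", "M+ACN+H"),
  mode = "pos",
  outloc = file.path(tempdir(), "annotation_output"),
  db_name = "HMDB",
  num_sets = 30,
  corthresh = 0.7
)

# Create the output folder up front (whether the function creates it
# itself is not stated in this page, so this is a precaution).
dir.create(annotation_args$outloc, showWarnings = FALSE, recursive = TRUE)

# With the package loaded and a peak table `dataA` prepared, the call would be:
# result <- do.call(multilevelannotation,
#                   c(list(dataA = dataA), annotation_args))
```

Keeping the arguments in a list makes it easy to rerun the annotation with a single changed setting, for example a different db_name.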

Arguments

dataA

Peak intensity matrix. The first two columns should be "mz" and "time" followed by sample intensities.
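
The expected layout can be sketched with a toy table. All values and sample names below are invented for illustration; only the requirement that the first two columns are "mz" and "time" comes from the description above.

```r
# Hypothetical toy peak table: three features measured in four samples.
# The m/z, retention time, and intensity values are made up.
dataA <- data.frame(
  mz      = c(166.0863, 167.0896, 188.0682),
  time    = c(120.5, 120.7, 121.0),
  Sample1 = c(15200, 1400, 3100),
  Sample2 = c(18100, 1650, 3600),
  Sample3 = c(9800, 900, 2050),
  Sample4 = c(12500, 1150, 2500)
)

# Sanity check on the layout described above: "mz" and "time" come first.
stopifnot(identical(colnames(dataA)[1:2], c("mz", "time")))
```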

max.mz.diff

Mass tolerance in ppm for database matching. e.g.: 10
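
For readers unfamiliar with ppm tolerances, the following sketch shows the usual definition of a ppm mass error and how a 10 ppm cutoff would be applied. The helper names ppm_error and within_tolerance are hypothetical, and the package may compute the match differently internally.

```r
# Relative mass error in parts per million (standard definition).
ppm_error <- function(observed_mz, theoretical_mz) {
  1e6 * abs(observed_mz - theoretical_mz) / theoretical_mz
}

# TRUE when the observed m/z lies within max.mz.diff ppm of the theoretical m/z.
within_tolerance <- function(observed_mz, theoretical_mz, max.mz.diff = 10) {
  ppm_error(observed_mz, theoretical_mz) <= max.mz.diff
}

within_tolerance(166.0870, 166.0863)  # about 4 ppm off: accepted
within_tolerance(166.0900, 166.0863)  # about 22 ppm off: rejected
```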

max.rt.diff

Retention time (s) tolerance for finding peaks derived from the same parent metabolite. e.g.: 10

cormethod

Method for correlation. e.g.: "pearson" or "spearman". The "pearson" implementation is computationally faster.

num_nodes

Number of computing cores to be used for parallel processing. e.g.: 2

queryadductlist

Adduct list to be used for database matching. e.g. c("all") for all possible positive or negative adducts or c("M+H","M+Na","M+ACN+H") for specifying subset of adducts. Run data(adduct_table) for list of all adducts.

mode

Ionization mode. e.g.: "pos" or "neg"

num_sets

Into how many subsets should the metabolites in a database be divided to facilitate parallel processing or prevent memory overload? e.g.: 1000

outloc

Output folder location. e.g.: "C:/Documents/ProjectX/" (in R strings, use forward slashes or doubled backslashes in Windows paths)

db_name

Database to be used for annotation. e.g.: "HMDB", "KEGG", "T3DB", "LipidMaps"

adduct_weights

Adduct weight matrix. Run data(adduct_weights) to see an example adduct weight matrix.

allsteps

If FALSE, only step 1 (module and retention time based clustering) is performed. e.g.: TRUE

corthresh

Minimum correlation threshold between peaks to qualify as adducts/isotopes of the same metabolite.

NOPS_check

Should elemental ratio checks be performed as outlined in Fiehn 2007? e.g. TRUE

customIDs

Custom list of select database IDs (HMDB, KEGG, etc.) to be used for annotation. This should be a data frame. Run data(customIDs) to see an example.

missing.value

How are missing values represented in the input peak intensity matrix? e.g.: NA

hclustmethod

Linkage method for hierarchical clustering. e.g.: "complete", "average", "single", "ward", "median", "centroid", "mcquitty". Please see the flashClust package for reference.

deepsplit

How finely should the clusters be split? e.g.: 2. Please see WGCNA for reference.

networktype

Network type. e.g.: "unsigned" or "signed". Please see WGCNA for reference.

minclustsize

Minimum cluster size. e.g.: 10

module.merge.dissimilarity

Maximum dissimilarity measure (i.e., 1-correlation) to be used for merging modules in WGCNA. e.g.: 0.2
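
The 1-correlation measure can be illustrated with base R. This is only a sketch on invented intensity profiles; in WGCNA the comparison is made between module summary profiles rather than raw vectors, and the helper name dissimilarity is hypothetical.

```r
# Invented intensity profiles across five samples.
profile_a <- c(1, 2, 3, 4, 5)
profile_b <- c(2.1, 3.9, 6.2, 8.0, 9.9)  # tracks profile_a closely
profile_c <- c(5, 1, 4, 2, 3)            # unrelated to profile_a

# Dissimilarity as described above: 1 - correlation.
dissimilarity <- function(x, y, method = "pearson") {
  1 - cor(x, y, method = method)
}

# Profiles would be merged only when dissimilarity is at most the threshold.
merge_ab <- dissimilarity(profile_a, profile_b) <= 0.2
merge_ac <- dissimilarity(profile_a, profile_c) <= 0.2
```

With a threshold of 0.2, only profiles correlated at 0.8 or higher fall under the cutoff, so merge_ab holds while merge_ac does not.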

filter.by

Require the presence of certain adducts for a high confidence match. e.g.: c("M+H")

redundancy_check

Should stage 5 (redundancy-based filtering) be performed? e.g.: TRUE or FALSE

min_ions_perchem

Minimum number of adducts/isotopes to be present for a match to be considered high confidence. e.g.: 2

biofluid.location

Used only for HMDB. e.g.: NA, "Blood", "Urine", "Saliva". Set to NA to turn off this option. Please see http://www.hmdb.ca/metabolites or run data(hmdbAllinf); head(hmdbAllinf) for more details.

origin

Used only for HMDB. e.g.: NA, "Endogenous", "Exogenous", etc. Set to NA to turn off this option. Please see http://www.hmdb.ca/metabolites or run data(hmdbAllinf); head(hmdbAllinf) for more details.

status

Used for HMDB. e.g.: NA, "Detected", "Detected and Quantified", "Detected and Not Quantified", "Expected and Not Quantified". Set to NA to turn off this option. Please see http://www.hmdb.ca/metabolites or run data(hmdbAllinf) for more details.

boostIDs

Database IDs of previously validated metabolites. e.g.: c("HMDB00696"). Set to NA to turn off this option.

max_isp

Maximum number of expected isotopes. e.g.: 5

MplusH.abundance.ratio.check

Should M+H be required to be the most abundant adduct? e.g.: TRUE or FALSE

customDB

Custom database. Run data(custom_db); head(custom_db) to see more details on formatting. Set to NA to turn off this option.

HMDBselect

How should metabolites be selected based on the HMDB origin, biofluid, and status filters? e.g.: "all" to take the union, "intersect" to take the intersection. Note that the default shown in Usage is "union".

mass_defect_window

Mass defect window in daltons for finding isotopes. e.g.: 0.01
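
A rough sketch of a mass defect comparison follows. It assumes the mass defect is taken as the fractional part of the m/z value; the package's exact definition is not given in this page, and the helper names mass_defect and similar_mass_defect are hypothetical.

```r
# Assumed definition: mass defect = fractional part of the m/z value.
mass_defect <- function(mz) {
  mz - floor(mz)
}

# TRUE when two peaks have mass defects within the window (in daltons).
similar_mass_defect <- function(mz_a, mz_b, mass_defect_window = 0.01) {
  abs(mass_defect(mz_a) - mass_defect(mz_b)) <= mass_defect_window
}

similar_mass_defect(166.0863, 167.0896)  # defects differ by about 0.003
similar_mass_defect(166.0863, 167.1430)  # defects differ by about 0.057
```

Under this assumption, the first pair would be considered a candidate isotope pair at the default window of 0.01, and the second would not.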

mass_defect_mode

"pos" for finding positive isotopes; "neg" for finding unexpected losses/fragments; "both" for finding isotopes and unexpected losses/fragments

pathwaycheckmode

How to perform pathway based evaluation? "pm": boosts the scores if there are other metabolites from the same pathway that are also assigned to the same module. "p": boosts the scores if there are other metabolites from the same pathway without accounting for module membership.

Details

Multistage clustering algorithm based on intensity profiles, retention time characteristics, mass defect, isotope/adduct patterns and correlation with signals for metabolic precursors and products. The algorithm uses high-resolution mass spectrometry data for a series of samples with common properties and publicly available chemical, metabolic and environmental databases to assign confidence levels to annotation results.

Value

The function generates output at each stage:

Stage 1 includes modules and retention time based clustering of features without any annotation.

Stage 2 includes modules and retention time based clustering of features along with simple m/z based database matching.

Stage 3 includes scores for annotations assigned in stage 2.

Stages 4 and 5 include the confidence levels before and after redundancy (multiple matches) filtering, respectively.

Author(s)

Karan Uppal <kuppal2@emory.edu>

References

Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008 Dec 29;9:559.

Wishart DS et al. HMDB 3.0–The Human Metabolome Database in 2013. Nucleic Acids Res. 2013 Jan;41(Database issue):D801-7.

Kanehisa M. The KEGG database. Novartis Found Symp. 2002;247:91-101; discussion 101-3, 119-28, 244-52. Review.

Lim E, et al. T3DB: a comprehensively annotated database of common toxins and their targets. Nucleic Acids Res. 2010 Jan;38(Database issue):D781-6.

Sleno L. The use of mass defect in modern mass spectrometry. J Mass Spectrom. 2012 Feb;47(2):226-36.

Zhang H, Zhang D, Ray K, Zhu M. Mass defect filter technique and its applications to drug metabolite identification by high-resolution mass spectrometry. J Mass Spectrom. 2009 Jul;44(7):999-1016.


jaspershen/MSannotator documentation built on May 18, 2019, 5:55 p.m.