This function is designed to be used at the end of the combinatorial metabolite identification process. It evaluates the multiple layers of evidence accumulated in the CompMS2 class object, automatically ranks the possible annotations, and identifies the annotation with the greatest weight of evidence for every composite spectrum.
metID.buildConsensus(object, ...)
object | a "compMS2" class object.
include | character vector of up to 7 options used to build the consensus combinatorial metabolite identification (see Details below for a description of each). If specific options are not supplied as a character vector, the default is to consider all 7, i.e. c('massAccuracy', 'spectralDB', 'inSilico', 'rtPred', 'chemSim', 'pubMed', 'substructure').
metIDWeights | numeric vector equal in length to the include vector (see above). The default is NULL, in which case a simple arithmetic mean is calculated across all included metabolite identification options. If supplied, metIDWeights is used to calculate a weighted mean of the included options, which allows a custom metabolite identification setting that best annotates the unknown metabolites. N.B. The metIDWeights vector must sum to 1. For example, with include = c('massAccuracy', 'spectralDB', 'inSilico') and metIDWeights = c(0.2, 0.5, 0.3), massAccuracy is given a weight of 0.2 (20%), spectralDB matches a weight of 0.5 (50%) and the in silico fragmentation score a weight of 0.3 (30%); rtPred (predicted retention time), chemSim (nearest neighbour chemical similarity score), pubMed (number of PubMed citations) and substructure are not included.
autoPossId | logical. If TRUE, the function automatically adds the name of the top annotation, based on mean consensus annotation score, to the "metID comments" table (default = FALSE). Caution: if TRUE, this will overwrite any existing possible_identities in the "metID comments" table. This functionality is intended as an automatic metabolite annotation identification tool prior to thorough examination of the data in
minMeanBCscore | numeric. Minimum mean consensus score (between 0 and 1). If autoPossId is TRUE, any metabolite annotation scoring above this value is automatically added to the "metID comments" table. If not supplied, the default is the upper interquartile range of the mean BC score.
possContam | numeric. The number of times a possible annotation has to appear among the automatically generated annotations for it to be considered a contaminant and therefore not added to the "metID comments" table (default = 3, i.e. a database name that appears more than 3 times in the automatic annotation table is removed).
verbose | logical. If TRUE, display progress bars.
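The possContam rule described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the annotation names are made up, the variable names are hypothetical, and the "more than possContam times" reading of the threshold is an assumption taken from the default's description.

```r
# Hypothetical top annotation names chosen for six composite spectra.
topAnnotations <- c("caffeine", "PEG-400", "PEG-400", "glucose",
                    "PEG-400", "PEG-400")
possContam <- 3  # documented default

# Count how often each name occurs among the automatic annotations.
nameCounts <- table(topAnnotations)

# Assumption: names occurring MORE than possContam times are contaminants.
contaminants <- names(nameCounts)[as.vector(nameCounts) > possContam]

# Keep only the annotations that were not flagged.
kept <- topAnnotations[!topAnnotations %in% contaminants]
print(contaminants)
print(kept)
```

Here "PEG-400" occurs four times, so it is the only name flagged and only the two remaining annotations would be kept.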
Specifically, the function considers the following 7 pieces of evidence:
"massAccuracy" monoisotopic mass similarity. The absolute mass similarity, between 0 and the upper mass accuracy limit (default 10 ppm), is used to generate a ranking score between 0-1.
"spectralDB" spectral database match. If a match has been made to a spectral database using the function metID.matchSpectralDB, then a combination of the dot product score and the proportion of the composite spectrum explained is used to rank the annotations. A score between 0-1 is determined from the average of the dot product and the proportion of the composite spectrum explained, where 1 is perfect agreement and 0 is no agreement. If no spectral database match has been made, the value is set to NA and this score is not used in calculating the average ranking.
"inSilico" in silico fragmentation data. The results of both the metID.metFrag and metID.CFM functions are used. The total proportion of the composite spectrum explained by each in silico fragmentation method (a value between 0-1) is used to rank the annotations. If no in silico fragmentation match has been made, the value is set to NA and this score is not used in calculating the average ranking.
"rtPred" predicted retention time similarity. Annotations are ranked based on the retention time deviation from the predictive retention time model built using the function metID.rtPred. A relative score between 0-1 is calculated globally by taking the range of retention time deviation values.
"chemSim" chemical similarity score. The mean maximum 1st neighbour (connected by correlation, metID.corrNetwork, and/or spectral similarity, metID.specSimNetwork) Tanimoto chemical similarity scores calculated by metID.chemSim are used to rank annotations. A relative score between 0-1 is calculated globally by taking the range of mean maximum 1st neighbour chemical similarity scores.
"pubMed" crude literature-based plausibility. The number of PMIDs returned by searching the compound name in PubMed is used to generate a relative ranking score between 0-1. This aspect is highly reliant on the database name being the correct synonym with which to search the PubMed repository. In an effort to ensure phospholipids are correctly searched against PubMed, a set of regular expressions has been created to identify common phospholipid annotations and use the compound class name rather than an abbreviation with positional and fatty acid chain length information, which would otherwise obscure the number of PubMed abstract IDs returned (see lipidAbbrev).
This aspect is potentially time consuming (but only needs to be conducted once), as it complies closely with the NCBI recommendations in the section "Frequency, Timing and Registration of E-utility URL Requests" of the book "A General Introduction to the E-utilities" by Eric Sayers (http://www.ncbi.nlm.nih.gov/books/NBK25497/):
"In order not to overload the E-utility servers, NCBI recommends that users post no more than three URL requests per second and limit large jobs to either weekends or between 9:00 PM and 5:00 AM Eastern time during weekdays. Failure to comply with this policy may result in an IP address being blocked from accessing NCBI."
This aspect is optional and will only work during these recommended times. However, the function can optionally wait automatically until the recommended time.
"substructure" substructure score. Whether the substructure score generated by the dbProb function should be used to rank possible annotations.
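The "score between 0-1" and "relative score calculated globally by taking the range" descriptions in the list above could be realised in several ways. The sketch below shows one plausible reading only: a linear mapping for mass accuracy and min-max rescaling for the range-based scores. Both helper functions (massAccScore, rangeScore) are hypothetical and are not part of the package, and the exact formulas the function uses are not stated in this page.

```r
# Assumption: linear mapping, 0 ppm scores 1 and the limit (or worse) scores 0.
massAccScore <- function(ppmError, ppmLimit = 10) {
  pmax(0, 1 - abs(ppmError) / ppmLimit)
}

# Assumption: min-max rescaling over all annotations at once ("globally").
# higherIsBetter = FALSE inverts the scale, e.g. for retention time deviation.
rangeScore <- function(x, higherIsBetter = TRUE) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(ifelse(is.na(x), NA_real_, 1))
  scaled <- (x - rng[1]) / (rng[2] - rng[1])
  if (higherIsBetter) scaled else 1 - scaled
}

ppmScores <- massAccScore(c(0, 5, -10, 15))   # 1, 0.5, 0, 0
rtScores  <- rangeScore(c(0, 1, 4), FALSE)    # 1, 0.75, 0 (small deviation is best)
pmScores  <- rangeScore(c(0, 50, 200))        # 0, 0.25, 1 (more PMIDs is better)
```

Missing values pass through rangeScore as NA, which matches the convention above that unavailable evidence is excluded from the average ranking.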
Depending on the availability of each of these pieces of evidence, a mean annotation ranking score is calculated for every annotation, and the best annotations can be automatically added.
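As an illustration of how a mean score might be formed when some evidence is unavailable, the following base R sketch skips NA scores and optionally applies weights in the style of metIDWeights. The score matrix and the consensusScore helper are hypothetical; in particular, renormalising the remaining weights when a score is NA is an assumption, not documented behaviour.

```r
# Hypothetical per-annotation scores (0-1); NA marks unavailable evidence.
scores <- rbind(
  candidateA = c(massAccuracy = 0.9, spectralDB = 0.8, inSilico = 0.6),
  candidateB = c(massAccuracy = 0.7, spectralDB = NA,  inSilico = 0.9)
)

consensusScore <- function(scoreRow, weights = NULL) {
  available <- !is.na(scoreRow)
  if (is.null(weights)) {
    # Simple arithmetic mean of the evidence that is present.
    return(mean(scoreRow[available]))
  }
  stopifnot(length(weights) == length(scoreRow),
            isTRUE(all.equal(sum(weights), 1)))
  # Assumption: weights of missing evidence are dropped and the remaining
  # weights renormalised (weighted.mean divides by the sum of the weights).
  weighted.mean(scoreRow[available], weights[available])
}

meanScores     <- apply(scores, 1, consensusScore)
weightedScores <- apply(scores, 1, consensusScore, weights = c(0.2, 0.5, 0.3))

# Rank annotations from best to worst weighted score.
ranked <- names(sort(weightedScores, decreasing = TRUE))
```

With these made-up numbers candidateB ranks first in both schemes, because its missing spectralDB score is ignored rather than counted as zero.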
Sayers E. A General Introduction to the E-utilities. In: Entrez Programming Utilities Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010-. Available from: http://www.ncbi.nlm.nih.gov/books/NBK25497