View source: R/createCompDbPackage.R
compound_tbl_sdf extracts basic compound annotations from a file in SDF
format (structure-data file). The function currently supports SDF files from:
HMDB (Human Metabolome Database): http://www.hmdb.ca
ChEBI (Chemical Entities of Biological Interest): http://ebi.ac.uk/chebi
LMSD (LIPID MAPS Structure Database): http://www.lipidmaps.org
MoNa: http://mona.fiehnlab.ucdavis.edu/ (see notes below!)
compound_tbl_sdf(file, collapse, onlyValid = TRUE, nonStop = TRUE)
The "name" column reports, for HMDB files, the […]; for ChEBI, the "ChEBI Name"; for PubChem, the […]; and for LIPID MAPS, the "COMMON_NAME". If that is not available, the first of the compound's synonyms is used and, if that is also not […]
A tibble::tibble with general compound information (one row per compound):
compound_id: the ID of the compound.
name: the compound's name.
inchi: the InChI of the compound.
inchikey: the InChI key.
formula: the chemical formula.
exactmass: the compound's (monoisotopic exact) mass.
synonyms: the compound's synonyms (aliases). By default this column is a list, to support multiple aliases per compound. If collapse is provided, the synonyms of each compound are instead pasted into a single element, separated by the value of collapse.
smiles: the compound's SMILES (if provided).
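As a rough illustration of the two shapes the synonyms column can take, the following base R sketch mimics the effect of collapse on a toy list. The alias values and variable names are made up for illustration, and paste() is only assumed to approximate what the function does internally.

```r
## Toy stand-in for a list-type synonyms column (values are made up).
synonyms <- list(
    c("alias A1", "alias A2"),
    "alias B1"
)

## Default shape: one character vector per compound, of varying length.
lengths(synonyms)

## Approximate effect of collapse = "|": one string per compound.
collapsed <- vapply(synonyms, paste, character(1), collapse = "|")
collapsed
```

The list form keeps each alias as a separate value, while the collapsed form is easier to write to a flat text file.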
compound_tbl_sdf can also read and process gzipped files.
MoNa SDF files organize the data by individual spectra (i.e. each element is one spectrum), and individual compounds cannot easily and consistently be defined (i.e. not all entries have an InChI ID or other means to uniquely identify compounds). Thus, for MoNa files the function returns a highly redundant compound table. Feedback on how to reduce this redundancy would be highly welcome!
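One possible way to reduce the redundancy after import is to keep a single row per InChIKey and leave entries without a key untouched. The sketch below runs on a made-up data.frame that only imitates the returned column names. It is an assumption about how one might post-process the table, not something the package does.

```r
## Toy table imitating a redundant MoNa import (all values are made up).
cmps <- data.frame(
    compound_id = c("S1", "S2", "S3", "S4"),
    name = c("cmp A", "cmp A", "cmp B", "cmp C"),
    inchikey = c("KEY-A", "KEY-A", "KEY-B", NA),
    stringsAsFactors = FALSE
)

## Rows without an InChIKey cannot be matched safely, so they are all kept.
has_key <- !is.na(cmps$inchikey)

## Among rows with a key, keep only the first occurrence of each key.
## The has_key guard prevents several NA keys from being treated as duplicates.
keep <- !has_key | !duplicated(cmps$inchikey)
unique_cmps <- cmps[keep, ]
unique_cmps
```

Dropping rows this way also drops their compound_id values, so a mapping from removed to retained IDs would be needed if spectra still refer to them.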
LIPID MAPS support was tested in August 2020. Older SDF files might not work because the field names have since changed.
Johannes Rainer and Jan Stanstrup
createCompDb() for a function to create a SQLite-based compound
Other compound table creation functions:
## Read compound information from a subset of HMDB
fl <- system.file("sdf/HMDB_sub.sdf.gz", package = "CompoundDb")
cmps <- compound_tbl_sdf(fl)
cmps

## Column synonyms contains a list
cmps$synonyms

## If we provide the optional argument collapse, multiple entries will be
## collapsed.
cmps <- compound_tbl_sdf(fl, collapse = "|")
cmps
cmps$synonyms