compound_tbl_sdf: Extract compound data from a file in SDF format
In EuracBiomedicalResearch/CompoundDb: Creating and Using (Chemical) Compound Annotation Databases

Description

compound_tbl_sdf() extracts basic compound annotations from a file in SDF format (structure-data file). The function currently supports SDF files from:

HMDB (Human Metabolome Database): http://www.hmdb.ca
ChEBI (Chemical Entities of Biological Interest): http://ebi.ac.uk/chebi
LMSD (LIPID MAPS Structure Database): http://www.lipidmaps.org
PubChem: https://pubchem.ncbi.nlm.nih.gov/
MoNA (MassBank of North America): http://mona.fiehnlab.ucdavis.edu/ (see the Note section below)

Usage

compound_tbl_sdf(file, collapse, onlyValid = TRUE, nonStop = TRUE)

Arguments

`file`	`character(1)` with the name of the SDF file.
`collapse`	optional `character(1)` used to collapse multiple values in the column `"synonyms"`. See examples for details.
`onlyValid`	`logical(1)` whether to import only valid elements or all elements (defaults to `onlyValid = TRUE`).
`nonStop`	`logical(1)` whether errors specific to the file content should only be reported as warnings instead of aborting the full import process. The value of this parameter is passed to parameter `skipErrors` of the `ChemmineR::read.SDFset()` function.

Details

Column "name" reports the "GENERIC_NAME" for HMDB files, the "ChEBI Name" for ChEBI, the "PUBCHEM_IUPAC_TRADITIONAL_NAME" for PubChem, and the "COMMON_NAME" for LIPID MAPS. If the LIPID MAPS "COMMON_NAME" is not available, the first of the compound's synonyms is used and, if that is also not provided, the "SYSTEMATIC_NAME".
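The fallback order for the name can be illustrated with a small base R sketch. It reads the description above as applying the fallback to LIPID MAPS entries. `pick_name()` is a hypothetical helper written for this illustration and is not part of CompoundDb; the package's internal implementation may differ.

```r
## Hypothetical helper (not part of CompoundDb): choose a compound name in
## the order COMMON_NAME, first synonym, SYSTEMATIC_NAME.
pick_name <- function(common_name, synonyms, systematic_name) {
    ## A value counts as missing if it is empty, NA or an empty string.
    is_missing <- function(x)
        length(x) == 0L || is.na(x[1L]) || !nzchar(x[1L])
    if (!is_missing(common_name)) return(common_name[1L])
    if (!is_missing(synonyms)) return(synonyms[1L])
    if (!is_missing(systematic_name)) return(systematic_name[1L])
    NA_character_
}

## Made-up example values
pick_name("Palmitic acid", c("C16:0", "Hexadecanoate"), "hexadecanoic acid")
pick_name(NA_character_, c("C16:0", "Hexadecanoate"), "hexadecanoic acid")
pick_name("", character(0), "hexadecanoic acid")
```

The three calls return the common name, the first synonym and the systematic name, respectively.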

Value

A tibble::tibble with general compound information (one row per compound):

compound_id: the ID of the compound.
name: the compound's name.
inchi: the InChI of the compound.
inchikey: the InChI key.
formula: the chemical formula.
exactmass: the compound's (monoisotopic exact) mass.
synonyms: the compound's synonyms (aliases). The type of this column is by default a list, to support multiple aliases per compound. If argument collapse is provided, multiple synonyms are instead pasted into a single element, separated by the value of collapse.
smiles: the compound's SMILES (if provided).
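
The difference between the list form and the collapsed form of the synonyms column can be sketched in base R without reading an SDF file. The synonym values below are made up, and the sketch only mimics the effect of `collapse = "|"`; it does not call `compound_tbl_sdf()`.

```r
## Toy data standing in for the "synonyms" list column (made-up values).
synonyms <- list(
    c("Glucose", "Dextrose", "D-Glucose"),
    "Caffeine"
)

## Paste each compound's synonyms into a single string, separated by "|".
collapsed <- vapply(synonyms, paste, character(1), collapse = "|")
collapsed

## Split the collapsed strings again. fixed = TRUE treats "|" as a literal
## character and not as the regular expression alternation operator.
restored <- strsplit(collapsed, "|", fixed = TRUE)
restored
```

A separator that does not occur in any synonym keeps the collapsed column reversible; a comma or a space would be a poor choice for chemical names.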

Note

compound_tbl_sdf() can also read gzipped SDF files.

MoNA SDF files organize the data by individual spectra (i.e. each element is one spectrum), and individual compounds cannot easily and consistently be defined (i.e. not all entries have an InChI or another means to uniquely identify compounds). Thus, the function returns a highly redundant compound table for these files. Feedback on how to reduce this redundancy would be highly welcome!

Support for LIPID MAPS was tested in August 2020. Older SDF files might not work because the field names were changed.

Author(s)

Johannes Rainer and Jan Stanstrup

Examples


## Read compound information from a subset of HMDB
fl <- system.file("sdf/HMDB_sub.sdf.gz", package = "CompoundDb")
cmps <- compound_tbl_sdf(fl)
cmps

## Column synonyms contains a list
cmps$synonyms

## If we provide the optional argument collapse, multiple entries will be
## collapsed.
cmps <- compound_tbl_sdf(fl, collapse = "|")
cmps
cmps$synonyms
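
The examples above use the default values of onlyValid and nonStop. The following sketch sets both arguments explicitly, using only the signature shown in the Usage section. It assumes that CompoundDb and its dependencies are installed and therefore guards the call with `requireNamespace()`; the variable name `cmps_all` is chosen for illustration.

```r
## Import all elements, not only the valid ones, and report errors that
## are specific to the file content as warnings instead of stopping.
if (requireNamespace("CompoundDb", quietly = TRUE)) {
    fl <- system.file("sdf/HMDB_sub.sdf.gz", package = "CompoundDb")
    cmps_all <- CompoundDb::compound_tbl_sdf(fl, onlyValid = FALSE,
                                             nonStop = TRUE)
    ## Number of imported compounds and the available columns
    nrow(cmps_all)
    colnames(cmps_all)
}
```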

EuracBiomedicalResearch/CompoundDb documentation built on Jan. 19, 2025, 8:34 a.m.
