compound_tbl_sdf: Extract compound data from a file in SDF format


View source: R/createCompDbPackage.R

Description

compound_tbl_sdf extracts basic compound annotations from a file in SDF (structure-data file) format. The function currently supports SDF files from HMDB, ChEBI, PubChem, Lipid Maps and MoNa.

Usage

compound_tbl_sdf(file, collapse)

Arguments

file

character(1) with the name of the SDF file.

collapse

optional character(1) used to collapse multiple values in the column "synonyms" into a single string. See examples for details.
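The effect of collapse can be sketched in base R without the package. This is an illustration only, not the package's implementation: the synonyms list and the helper collapse_synonyms are hypothetical, and the synonym values are made up for the example.

```r
# Hypothetical list column: one character vector of synonyms per compound.
synonyms <- list(
    c("1-Methylhistidine", "Pi-methylhistidine"),
    c("1,3-Diaminopropane"),
    character(0)
)

# Hypothetical helper: join each compound's synonyms into a single string.
collapse_synonyms <- function(x, collapse = "|") {
    vapply(x, function(z) paste(z, collapse = collapse), character(1))
}

collapsed <- collapse_synonyms(synonyms, collapse = "|")
collapsed
```

Without collapsing, each element keeps all of its synonyms as a character vector; after collapsing, the column is a plain character vector with one string per compound.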

Details

Column "compound_name" reports the "GENERIC_NAME" for HMDB files, the "ChEBI Name" for ChEBI, the "PUBCHEM_IUPAC_TRADITIONAL_NAME" for PubChem, and the "COMMON_NAME" for Lipid Maps. For Lipid Maps, if the "COMMON_NAME" is not available, the first of the compound's synonyms is used and, if that is also not provided, the "SYSTEMATIC_NAME".
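The fallback order described for Lipid Maps can be sketched as follows. This is a minimal illustration under assumptions, not the package's code: the helper pick_name and the field name SYNONYMS are hypothetical, and each entry is represented as a simple named list.

```r
# Hypothetical helper mirroring the fallback order described above.
# `x` is a named list holding the data fields of one SDF entry. The field
# name SYNONYMS is an assumption made for this sketch.
pick_name <- function(x) {
    is_set <- function(v) length(v) > 0 && !is.na(v[1]) && nzchar(v[1])
    if (is_set(x$COMMON_NAME)) return(x$COMMON_NAME[1])
    if (is_set(x$SYNONYMS)) return(x$SYNONYMS[1])
    if (is_set(x$SYSTEMATIC_NAME)) return(x$SYSTEMATIC_NAME[1])
    NA_character_
}

# Made-up entries covering each branch of the fallback.
with_common <- pick_name(list(COMMON_NAME = "Palmitic acid",
                              SYSTEMATIC_NAME = "hexadecanoic acid"))
with_synonym <- pick_name(list(SYNONYMS = c("C16:0", "Cetylic acid"),
                               SYSTEMATIC_NAME = "hexadecanoic acid"))
with_systematic <- pick_name(list(SYSTEMATIC_NAME = "hexadecanoic acid"))
with_nothing <- pick_name(list())
```

Returning NA_character_ when no field is set is a design choice of this sketch; how the package handles that case is not stated here.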

Value

A tibble::tibble with general compound information (one row per compound):

Note

compound_tbl_sdf can also read and process gzipped files.
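As a rough illustration of why gzipped input needs no extra step in R, the base R sketch below writes a tiny SDF-like text to a gzipped temporary file and reads it back. It does not use the package and is not its implementation; the record content is made up, and real SDF entries also contain a molfile block.

```r
# Write a tiny two-record, SDF-like text to a gzipped temporary file.
tmp <- tempfile(fileext = ".sdf.gz")
con <- gzfile(tmp, open = "wt")
writeLines(c("> <GENERIC_NAME>", "Compound A", "", "$$$$",
             "> <GENERIC_NAME>", "Compound B", "", "$$$$"), con)
close(con)

# readLines() on a file path detects gzip compression automatically.
lines <- readLines(tmp)

# Records in an SDF file are terminated by a line containing "$$$$".
n_records <- sum(lines == "$$$$")
n_records

# Clean up the temporary file.
unlink(tmp)
```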

MoNa SDF files organize the data by individual spectra (i.e. each element is one spectrum), and individual compounds cannot easily and consistently be defined (i.e. not all entries have an InChI ID or other means to uniquely identify compounds). Thus, for MoNa files the function returns a highly redundant compound table. Feedback on how to reduce this redundancy would be highly welcome!

Author(s)

Johannes Rainer and Jan Stanstrup

See Also

createCompDb() for a function to create a SQLite-based compound database.

Other compound table creation functions: compound_tbl_lipidblast()

Examples

library(CompoundDb)

## Read compound information from a subset of HMDB
fl <- system.file("sdf/HMDB_sub.sdf.gz", package = "CompoundDb")
cmps <- compound_tbl_sdf(fl)
cmps

## Column synonyms contains a list
cmps$synonyms

## If we provide the optional argument collapse, the multiple synonyms of
## each compound are collapsed into a single string.
cmps <- compound_tbl_sdf(fl, collapse = "|")
cmps
cmps$synonyms

michaelwitting/CompoundDb documentation built on April 29, 2020, 8:42 p.m.