customCMPdb-package: Customize and Query Compound Annotation Database

This package serves as the query and customization interface for compound annotations from the DrugAge, DrugBank, CMAP02 and LINCS databases. It also provides SDF datasets with the structures of the compounds in these four databases.

Specifically, this package generates 5 datasets. The first is an SQLite annotation database containing 5 tables: 4 compound annotation tables, one each from the DrugAge, DrugBank, CMAP02 and LINCS databases, and an ID mapping table from ChEMBL IDs to the IDs used by the individual databases. The other 4 datasets store the structures of the compounds in the DrugAge, DrugBank, CMAP02 and LINCS databases as SDF files. For a detailed description of the 5 datasets, consult the package vignette by running browseVignettes("customCMPdb"). The datasets themselves are hosted in AnnotationHub.

This package also provides functions to customize and query the compound annotation SQLite database. Users can add their own compound annotation tables to the SQLite database and query both the default (DrugAge, DrugBank, CMAP02, LINCS) and the custom annotations by providing the ChEMBL IDs of the query compounds. The customization and query functions are customAnnot and queryAnnotDB, respectively.
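Conceptually, a query by ChEMBL ID goes through the ID mapping table: the ChEMBL IDs are first translated into the internal IDs of a source database, and those IDs are then used to pull rows from that source's annotation table. The base-R sketch below illustrates this join with small hypothetical tables (`id_map`, `drugage_annot`, the helper `query_by_chembl`, and all column names and IDs are invented for illustration). It is not the implementation of queryAnnotDB and does not reflect the real database schema.

```r
## Hypothetical illustration only: these tables, column names and IDs are
## invented and do NOT reflect the real compoundCollection schema.
id_map <- data.frame(
  chembl_id  = c("CHEMBL_A", "CHEMBL_B", "CHEMBL_C"),
  drugage_id = c("DA_1", NA, "DA_3"),
  stringsAsFactors = FALSE
)
drugage_annot <- data.frame(
  drugage_id = c("DA_1", "DA_2", "DA_3"),
  compound   = c("compound one", "compound two", "compound three"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: translate ChEMBL IDs to DrugAge IDs via the
## mapping table, then join the matching annotation rows.
query_by_chembl <- function(chembl_ids, id_map, annot) {
  hits <- id_map[id_map$chembl_id %in% chembl_ids &
                   !is.na(id_map$drugage_id), ]
  merge(hits, annot, by = "drugage_id")
}

res <- query_by_chembl(c("CHEMBL_A", "CHEMBL_B"), id_map, drugage_annot)
print(res)
```

Here CHEMBL_B is dropped because it has no DrugAge mapping; querying several sources would repeat the lookup per annotation table.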

The 5 datasets provided by this package are described below.

Annotation SQLite database:

This SQLite database ('compoundCollection') stores one compound annotation table each for DrugAge, DrugBank, CMAP02 and LINCS, plus an ID mapping table from ChEMBL IDs to the IDs used by the individual databases.

DrugAge SDF:

This SDF (Structure-Data File) stores the molecular structures of the DrugAge compounds. The source DrugAge annotation file was downloaded from the DrugAge website. The extracted csv file contains only drug names, without ID mappings to external resources such as PubChem or ChEMBL. The extracted 'drugage.csv' file was further processed by the processDrugage function in this package. The resulting DrugAge annotation table and the ID mapping table (DrugAge internal IDs to ChEMBL IDs) were then stored in the SQLite annotation database named 'compoundCollection'. The drug structures were obtained from PubChem CIDs with the getIds function from the ChemmineR package, and the resulting SDFset object was written to the drugage_build2.sdf file.

DrugBank SDF:

This SDF file stores the structures of the compounds in the DrugBank database. The full DrugBank XML file was downloaded from https://www.drugbank.ca/releases/latest; the release used is version 5.1.5, the most recent at the time of writing. The extracted XML file was processed by the dbxml2df function in this package, and the resulting DrugBank annotation table was stored in the compoundCollection SQLite database. The DrugBank to ChEMBL ID mappings were obtained from UniChem. The DrugBank SDF file was downloaded from https://www.drugbank.ca/releases/latest#structures. Some validity checks and modifications were made with utilities in the ChemmineR package, and the results were written to the drugbank_5.1.5.sdf file.

CMAP SDF:

The CMAP compound instance table was downloaded from the CMAP02 website and processed by the buildCMAPdb function in this package. The resulting 'cmap.db' database contains both compound annotation and structure information. Since the annotation table contains only PubChem CIDs, ChEMBL IDs were added using the PubChem CID to ChEMBL ID mappings from UniChem. CMAP internal IDs were then created to build the ChEMBL ID to CMAP ID mappings. The structures were written to the cmap02.sdf file.

LINCS SDF:

The LINCS compound annotation table was downloaded from GEO, and only entries of the compound type were selected. The LINCS IDs were mapped to ChEMBL IDs via InChIKeys. The structures of the LINCS compounds were obtained from their PubChem CIDs with the getIds function from the ChemmineR package and written to the lincs_pilot1.sdf file.
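Mapping LINCS IDs to ChEMBL IDs via InChIKeys amounts to matching two tables on a shared InChIKey column. A minimal base-R sketch, assuming hypothetical tables `lincs_annot` and `chembl_keys` whose column names, IDs and keys are placeholders rather than real LINCS or ChEMBL records:

```r
## Hypothetical placeholder data; real InChIKeys are 27-character strings.
lincs_annot <- data.frame(
  lincs_id  = c("LINCS_1", "LINCS_2", "LINCS_3"),
  inchi_key = c("KEY_AAA", "KEY_BBB", "KEY_CCC"),
  stringsAsFactors = FALSE
)
chembl_keys <- data.frame(
  chembl_id = c("CHEMBL_X", "CHEMBL_Y"),
  inchi_key = c("KEY_CCC", "KEY_AAA"),
  stringsAsFactors = FALSE
)

## For each LINCS compound, look up the ChEMBL ID sharing its InChIKey
## (NA when no ChEMBL entry has the same key).
lincs_annot$chembl_id <- chembl_keys$chembl_id[
  match(lincs_annot$inchi_key, chembl_keys$inchi_key)
]
print(lincs_annot)
```

Note that match() returns only the first hit, so an InChIKey shared by several ChEMBL records would need explicit handling.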

The R script that generates the above 5 datasets is available in the 'inst/scripts/make-data.R' file of this package. Its installed location can be found by running system.file("scripts/make-data.R", package="customCMPdb") in an R session; the script is also available in the package's GitHub repository.
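The lookup can be wrapped in a small snippet that prints the beginning of the script when the package is installed. This is a sketch; the 20-line preview length is an arbitrary choice.

```r
## Locate the installed copy of the data-generation script.
## system.file() returns "" when the package or the file is not installed.
script <- system.file("scripts/make-data.R", package = "customCMPdb")
if (nzchar(script)) {
  # Preview the first 20 lines of the script
  cat(head(readLines(script), 20), sep = "\n")
} else {
  message("customCMPdb (or its make-data.R script) is not installed")
}
```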

Author(s):

Yuzhu Duan (yduan004@ucr.edu)
Thomas Girke (thomas.girke@ucr.edu)

See Also:

customAnnot, queryAnnotDB

Examples:

library(AnnotationHub)
## Not run: 
    ah <- AnnotationHub()
    
    ## Load compoundCollection annotation SQLite database
    query(ah, c("customCMPdb", "annot_0.1"))
    annot_path <- ah[["AH79563"]]
    library(RSQLite)
    conn <- dbConnect(SQLite(), annot_path)
    dbListTables(conn)
    drugAgeAnnot <- dbReadTable(conn, "drugAgeAnnot")
    head(drugAgeAnnot)
    dbDisconnect(conn)
    
    ## Load DrugAge SDF file
    query(ah, c("customCMPdb", "drugage_build2"))
    da_path <- ah[["AH79564"]]
    da_sdfset <- ChemmineR::read.SDFset(da_path)
    
    ## Load DrugBank SDF file
    query(ah, c("customCMPdb", "drugbank_5.1.5"))
    db_path <- ah[["AH79565"]]
    db_sdfset <- ChemmineR::read.SDFset(db_path)
    
    ## Load CMAP SDF file
    query(ah, c("customCMPdb", "cmap02"))
    cmap_path <- ah[["AH79566"]]
    cmap_sdfset <- ChemmineR::read.SDFset(cmap_path)
    
    ## Load LINCS SDF file
    query(ah, c("customCMPdb", "lincs_pilot1"))
    lincs_path <- ah[["AH79567"]]
    lincs_sdfset <- ChemmineR::read.SDFset(lincs_path)

## End(Not run)