This package serves as the query and customization interface for compound annotations from the DrugAge, DrugBank, CMAP02 and LINCS databases. It also stores the structures of the compounds in these four databases as SDF datasets.
Specifically, the annotation database created by this package is an SQLite database
containing 5 tables: 4 compound annotation tables, one each for the DrugAge,
DrugBank, CMAP02 and LINCS databases, and an ID mapping table from ChEMBL IDs
to the IDs of the individual databases. The other 4 datasets store the
structures of the compounds in the DrugAge, DrugBank, CMAP02 and LINCS
databases as SDF files. For a detailed description of the 5 datasets generated
by this package, please consult the package vignette by running
browseVignettes("customCMPdb"). The actual datasets are hosted in AnnotationHub.
This package also provides functionality to customize and query the compound
annotation SQLite database. Users can add their own compound annotation
tables to the SQLite database and query both the default (DrugAge, DrugBank, CMAP02,
LINCS) and custom annotations by providing the ChEMBL IDs of the query compounds.
The customization and query functions are documented at customAnnot
and queryAnnotDB, respectively.
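The idea of adding a custom annotation table and querying it by ChEMBL ID can be illustrated with a small in-memory SQLite database. This is only a minimal sketch using DBI and RSQLite directly. It does not use or reproduce the package's own customization and query functions, and every table name, column name and value in it (id_mapping, myCustomAnnot, custom_id, note, the CHEMBL placeholders) is a hypothetical stand-in rather than the real schema.

```r
library(DBI)
library(RSQLite)

# In-memory stand-in for the annotation database. All table and column
# names below are hypothetical, not the package's real schema.
conn <- dbConnect(RSQLite::SQLite(), ":memory:")

# ID mapping table: ChEMBL ID -> internal ID of one (custom) resource
id_mapping <- data.frame(
  chembl_id = c("CHEMBL1", "CHEMBL2", "CHEMBL3"),
  custom_id = c("ida001", "ida002", NA),
  stringsAsFactors = FALSE
)
dbWriteTable(conn, "id_mapping", id_mapping)

# Custom annotation table keyed by the internal ID
my_annot <- data.frame(
  custom_id = c("ida001", "ida002"),
  note      = c("example note A", "example note B"),
  stringsAsFactors = FALSE
)
dbWriteTable(conn, "myCustomAnnot", my_annot)

# Query by ChEMBL IDs: one '?' placeholder per query ID, then a left join
# so that compounds without a custom annotation are still returned (as NA)
query_ids <- c("CHEMBL1", "CHEMBL3")
placeholders <- paste(rep("?", length(query_ids)), collapse = ", ")
sql <- paste0(
  "SELECT m.chembl_id, a.note ",
  "FROM id_mapping AS m ",
  "LEFT JOIN myCustomAnnot AS a ON m.custom_id = a.custom_id ",
  "WHERE m.chembl_id IN (", placeholders, ") ",
  "ORDER BY m.chembl_id"
)
res <- dbGetQuery(conn, sql, params = as.list(query_ids))
dbDisconnect(conn)
res
```

The left join is a deliberate choice in this sketch: a query compound that has a ChEMBL ID but no entry in the custom table still appears in the result, which makes missing annotations visible instead of silently dropping rows.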
The description of the 5 datasets in this package is as follows.
Annotation SQLite database:
It is an SQLite database storing one compound annotation table each for DrugAge, DrugBank, CMAP02 and LINCS. It also contains an ID mapping table from ChEMBL IDs to the IDs of the individual databases.
DrugAge SDF:
It is an SDF (Structure-Data File) file storing the molecular structures of
DrugAge compounds. The source DrugAge annotation file was downloaded from
here. The extracted csv
file only contains drug names, without ID mappings to external resources
such as PubChem or ChEMBL. The extracted 'drugage.csv' file was further processed by the
processDrugage function in this package. The resulting DrugAge annotation table
and the ID mapping table (DrugAge internal ID to ChEMBL ID) were then
stored in the SQLite annotation database named 'compoundCollection'.
The drug structures were obtained from PubChem CIDs with the getIds
function from the ChemmineR package. The SDFset object was then
written to the drugage_build2.sdf file.
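Writing an SDFset object to an SDF file and reading it back can be sketched as follows. The sketch assumes the ChemmineR package is installed and uses its bundled sdfsample dataset as a stand-in for structures retrieved from PubChem. The output file name is hypothetical, and this is not the script that produced drugage_build2.sdf.

```r
library(ChemmineR)

# Sample SDFset bundled with ChemmineR, used here as a stand-in for the
# structures that would be retrieved from PubChem
data(sdfsample)
sdfset <- sdfsample[1:5]

# Use the ID stored in each molecule's header block as the compound ID
cid(sdfset) <- sdfid(sdfset)

# Write the structures to a temporary SDF file (hypothetical file name),
# storing the compound IDs in the first line of each record
sdf_path <- file.path(tempdir(), "example_structures.sdf")
write.SDF(sdfset, file = sdf_path, cid = TRUE)

# Read the file back to confirm that all structures were written
sdfset_back <- read.SDFset(sdf_path)
length(sdfset_back)
```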
DrugBank SDF:
This SDF file stores the structures of compounds in the
DrugBank database. The full DrugBank xml
file was downloaded from https://www.drugbank.ca/releases/latest.
The most recent release at the time of writing this document is 5.1.5.
The extracted xml file was processed by the dbxml2df function in this package.
The resulting DrugBank annotation table was then stored in the compoundCollection
SQLite database. The DrugBank to ChEMBL ID mappings were obtained from
UniChem.
The DrugBank SDF file was downloaded from
https://www.drugbank.ca/releases/latest#structures.
Some validity checks and modifications were made with utilities in the
ChemmineR package. The results were written to the drugbank_5.1.5.sdf file.
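The text does not say which validity checks were applied. As a hedged illustration of the kind of check ChemmineR offers, the sketch below filters an SDFset with validSDF and makes the compound IDs unique with makeUnique. It assumes ChemmineR is installed, uses the bundled sdfsample data instead of the DrugBank file, and should not be read as the authors' actual procedure.

```r
library(ChemmineR)

# Bundled sample structures as a stand-in for a downloaded SDF file
data(sdfsample)
sdfset <- sdfsample[1:10]

# validSDF() returns one logical value per molecule; keep only the
# entries that pass ChemmineR's basic validity check
valid <- validSDF(sdfset)
sdfset_valid <- sdfset[valid]

# Ensure compound IDs are unique before writing the set to a file
cid(sdfset_valid) <- makeUnique(sdfid(sdfset_valid))

# Number of molecules kept out of the original number
c(kept = length(sdfset_valid), total = length(sdfset))
```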
CMAP SDF:
The CMAP compound instance table was downloaded from the
CMAP02
website and processed by the buildCMAPdb function
in this package. The resulting 'cmap.db' contains both compound annotation and
structure information.
Since the annotation table only contains PubChem CIDs, the ChEMBL IDs were added
via PubChem CID to ChEMBL ID mappings from
UniChem.
CMAP internal IDs were created for the ChEMBL ID to CMAP ID mappings. The
structures were written to the cmap02.sdf file.
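Adding ChEMBL IDs through a PubChem CID mapping amounts to a left join of the annotation table with the mapping table. The base R sketch below shows that step on toy data. The data frames, column names, placeholder IDs and the internal ID format are all hypothetical and are not taken from the package or from UniChem.

```r
# Hypothetical CMAP-like annotation table that only carries PubChem CIDs
cmap_annot <- data.frame(
  drug_name   = c("drug_a", "drug_b", "drug_c"),
  pubchem_cid = c("101", "202", "303"),
  stringsAsFactors = FALSE
)

# Hypothetical PubChem CID -> ChEMBL ID mapping (e.g. exported from a
# cross-reference resource); CID 202 has no ChEMBL counterpart here
cid2chembl <- data.frame(
  pubchem_cid = c("101", "303"),
  chembl_id   = c("CHEMBL_A", "CHEMBL_C"),
  stringsAsFactors = FALSE
)

# Left join: keep every annotated compound, with NA where no mapping exists
cmap_mapped <- merge(cmap_annot, cid2chembl, by = "pubchem_cid", all.x = TRUE)

# Assign internal IDs (hypothetical format) so that every compound can be
# referenced from an ID mapping table, even without a ChEMBL ID
cmap_mapped$cmap_id <- sprintf("CMAP%05d", seq_len(nrow(cmap_mapped)))
cmap_mapped
```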
LINCS SDF:
The LINCS compound annotation table was downloaded from
GEO, where only entries of the compound type were selected.
The LINCS IDs were mapped to ChEMBL IDs via InChIKey. The structures of the
LINCS compounds were obtained from PubChem CIDs with the getIds function from the
ChemmineR package. The structures were written to the lincs_pilot1.sdf file.
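The two steps described here, selecting compound-type entries and mapping them to ChEMBL IDs by InChIKey, can be sketched in base R on toy data. The column names (pert_id, pert_type, inchi_key), the "trt_cp" label for compounds and all values below are assumptions made for illustration and may not match the downloaded annotation table.

```r
# Hypothetical LINCS-like perturbagen table; only "trt_cp" rows are
# assumed to be compounds in this sketch
lincs_pert <- data.frame(
  pert_id   = c("BRD-0001", "BRD-0002", "BRD-0003"),
  pert_type = c("trt_cp", "trt_sh", "trt_cp"),
  inchi_key = c("AAAAAAAAAAAAAA-BBBBBBBBBB-N", NA,
                "CCCCCCCCCCCCCC-DDDDDDDDDD-N"),
  stringsAsFactors = FALSE
)

# Step 1: keep only the compound-type entries
lincs_cmp <- lincs_pert[lincs_pert$pert_type == "trt_cp", ]

# Hypothetical ChEMBL lookup table keyed by InChIKey
chembl_keys <- data.frame(
  chembl_id = c("CHEMBL_X"),
  inchi_key = c("AAAAAAAAAAAAAA-BBBBBBBBBB-N"),
  stringsAsFactors = FALSE
)

# Step 2: look up each compound's InChIKey; unmatched keys give NA
idx <- match(lincs_cmp$inchi_key, chembl_keys$inchi_key)
lincs_cmp$chembl_id <- chembl_keys$chembl_id[idx]
lincs_cmp
```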
The R script for generating the above 5 datasets is available in the
'inst/scripts/make-data.R' file of this package. The file location can
be found by running system.file("scripts/make-data.R",package="customCMPdb")
in the user's R session, or the file can be viewed in the
GitHub repository
of this package.
Yuzhu Duan (yduan004@ucr.edu)
Thomas Girke (thomas.girke@ucr.edu)
library(AnnotationHub)
## Not run:
ah <- AnnotationHub()
## Load compoundCollection annotation SQLite database
query(ah, c("customCMPdb", "annot_0.1"))
annot_path <- ah[["AH79563"]]
library(RSQLite)
conn <- dbConnect(SQLite(), annot_path)
dbListTables(conn)
drugAgeAnnot <- dbReadTable(conn, "drugAgeAnnot")
head(drugAgeAnnot)
dbDisconnect(conn)
## Load DrugAge SDF file
query(ah, c("customCMPdb", "drugage_build2"))
da_path <- ah[["AH79564"]]
da_sdfset <- ChemmineR::read.SDFset(da_path)
## Load DrugBank SDF file
query(ah, c("customCMPdb", "drugbank_5.1.5"))
db_path <- ah[["AH79565"]]
db_sdfset <- ChemmineR::read.SDFset(db_path)
## Load CMAP SDF file
query(ah, c("customCMPdb", "cmap02"))
cmap_path <- ah[["AH79566"]]
cmap_sdfset <- ChemmineR::read.SDFset(cmap_path)
## Load LINCS SDF file
query(ah, c("customCMPdb", "lincs_pilot1"))
lincs_path <- ah[["AH79567"]]
lincs_sdfset <- ChemmineR::read.SDFset(lincs_path)
## End(Not run)