knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
PubChem is a database of chemical molecules and their biological activities.
It is a part of the National Center for Biotechnology Information (NCBI),
which is a part of the National Institutes of Health (NIH). PubChem provides
a set of APIs to query its database. The `AnnotationGx` package provides a set
of functions to query PubChem using these APIs.
The first of these APIs is the PubChem PUG REST API,
which is designed to:
- make specific queries based on an input identifier and return data that
PubChem has labelled or computed internally [1].
- This API is useful for querying information about a specific chemical compound,
such as getting the standardized PubChem compound identifier (CID) for a given
chemical name or SMILES string, or getting the chemical structure for a given CID.
- It provides access to a wide range of data, including chemical properties,
bioassay data, and chemical classification data, given a specific identifier.
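
To make the shape of such a query concrete, the following is a minimal sketch of how a PUG REST URL can be assembled for a name-to-CID lookup or a property lookup. The helper `build_pug_rest_url` is a hypothetical name used only for illustration; it is not part of `AnnotationGx`, whose functions build and send these requests for you.

```r
# Hypothetical helper (not part of AnnotationGx): assemble a PUG REST URL
# of the form <base>/<domain>/<namespace>/<identifier>/<operation>/<output>.
build_pug_rest_url <- function(identifier,
                               domain = "compound",
                               namespace = "name",
                               operation = "cids",
                               output = "JSON") {
  base <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
  # Percent-encode the identifier so names with spaces stay valid in a URL
  encoded <- utils::URLencode(as.character(identifier), reserved = TRUE)
  paste(base, domain, namespace, encoded, operation, output, sep = "/")
}

# Name -> CID lookup
aspirin_url <- build_pug_rest_url("aspirin")
print(aspirin_url)

# Property lookup for a known CID (2244 is the CID for aspirin)
property_url <- build_pug_rest_url(
  2244,
  namespace = "cid",
  operation = "property/MolecularFormula,MolecularWeight"
)
print(property_url)
```

The sketch only builds the URL string and does not contact PubChem, so it can be run offline to inspect what a request would look like.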
The second API is the PubChem PUG VIEW API,
which is designed to:
- give access to aggregated annotations for a given chemical compound [3]. These
annotations are mapped to PubChem's data but are not curated by PubChem itself.
- That is, it provides access to annotations from external sources such as
UniProt, ChEBI, and ChEMBL, given a specific identifier.
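
As a rough illustration of how a PUG VIEW request differs from a PUG REST one, the sketch below assembles a PUG VIEW URL for a CID, optionally restricted to one annotation heading. The helper `build_pug_view_url` is a hypothetical name and is not part of `AnnotationGx`; the package handles these requests internally.

```r
# Hypothetical helper (not part of AnnotationGx): assemble a PUG VIEW URL
# of the form <base>/<record type>/<cid>/<output>[?heading=<heading>].
build_pug_view_url <- function(cid,
                               heading = NULL,
                               record_type = "compound",
                               output = "JSON") {
  base <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data"
  url <- paste(base, record_type, cid, output, sep = "/")
  if (!is.null(heading)) {
    # Headings may contain spaces, so percent-encode them for the query string
    url <- paste0(url, "?heading=", utils::URLencode(heading, reserved = TRUE))
  }
  url
}

# Full annotation record for one CID
full_record_url <- build_pug_view_url(2244)
print(full_record_url)

# Only the section under a single heading
heading_url <- build_pug_view_url(2244, heading = "ChEMBL ID")
print(heading_url)
```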
``` {r}
library(AnnotationGx)
```

The main function provided by the package is `mapCompound2CID`.

``` {r map aspirin to cid}
mapCompound2CID("aspirin")
```

You can pass in a vector of compound names to get the CIDs for all of them at once.

``` {r map multiple compounds to cid}
drugs <- c(
  "Aspirin", "Erlotinib", "Acadesine", "Camptothecin",
  "Vincaleukoblastine", "Cisplatin"
)
mapCompound2CID(drugs)
```
A single name can map to more than one CID. This is the case for
'Vincaleukoblastine' in the above query. When a name maps to multiple CIDs,
the first entry usually has the highest similarity to the requested drug. To
keep only the first occurrence of each drug name, use the `first = TRUE`
argument:
``` {r map multiple compounds to cid and subset to first}
mapCompound2CID(drugs, first = TRUE)
```
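
To show what keeping the first occurrence of each name means, here is a small base R sketch on a mock mapping table. The column names and CID values are placeholders invented for illustration, and this is not necessarily how `mapCompound2CID` implements `first = TRUE` internally.

```r
# Mock mapping table: names and CIDs are placeholders, not real PubChem output.
# "drugB" maps to two CIDs, mimicking a name that multimaps.
mapping <- data.frame(
  name = c("drugA", "drugB", "drugB", "drugC"),
  cids = c(101L, 202L, 203L, 304L),
  stringsAsFactors = FALSE
)

# duplicated() flags every repeat of a name after its first appearance,
# so negating it keeps exactly one row (the first) per name.
first_only <- mapping[!duplicated(mapping$name), ]
print(first_only)
```

Because the result depends on row order, this kind of subsetting is only meaningful when the rows are already ordered by relevance.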
In the case of a compound that can't be mapped, `NA` will be returned and a warning will be issued.

``` {r map non existent compound to cid}
(result <- mapCompound2CID(c(drugs, "non existent compound", "another bad compound"), first = TRUE))

failed <- attributes(result)$failed

# get the list of failed inputs
names(failed)

# get the error message for the failed input
print(failed[1])
```
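
The pattern above, where failures travel with the result as an attribute, can be sketched with a mock object. Everything in this example is an assumption made for illustration: the `name` column, the placeholder CIDs, and the structure of the `failed` attribute (a named list of messages) are stand-ins and may differ from what `mapCompound2CID` actually returns.

```r
# Mock result: one mapped name and one unmapped name (placeholder values)
result_mock <- data.frame(
  name = c("drugA", "bad name"),
  cids = c(101L, NA),
  stringsAsFactors = FALSE
)

# Assumed structure: a named list with one message per failed input
attr(result_mock, "failed") <- list("bad name" = "placeholder error message")

# Inputs that failed to map
failed <- attributes(result_mock)$failed
unmapped <- names(failed)
print(unmapped)

# Rows that are safe to pass on to downstream queries
mapped <- result_mock[!is.na(result_mock$cids), ]
print(mapped)
```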
Once CIDs are obtained, they can be used to query the properties of the compounds.
To view the properties available from PubChem, use the `getPubchemProperties`
function.
``` {r get pubchem properties}
getPubchemProperties()
```

After deciding which properties to query, you can use the `mapCID2Properties` function to get the properties for a specific CID.

``` {r get properties for a single cid}
properties <- c("Title", "MolecularFormula", "InChIKey", "MolecularWeight")

# Need to remove NA values from the query as they will cause an error
result[!is.na(cids), mapCID2Properties(ids = cids, properties = properties)]
```
PubChem's PUG VIEW API provides access to annotations from external sources such as UniProt, ChEBI, and ChEMBL, given a specific identifier. Before querying annotations, we need to know the exact heading we want to query.
You can use the `getPubchemAnnotationHeadings` function to get the available
annotation headings and types.
``` {r get annotation headings}
getPubchemAnnotationHeadings()
```

### Get annotation headings for a specific type:

``` {r get annotation headings for a specific type}
getPubchemAnnotationHeadings(type = "Compound")
```

### Get annotation headings for a specific heading:

``` {r get annotation headings for a specific heading}
getPubchemAnnotationHeadings(heading = "ChEMBL ID")
```

### Get annotation headings for a specific type **and** heading:

``` {r get annotation headings for a specific type and heading}
getPubchemAnnotationHeadings(type = "Compound", heading = "CAS")
```

We can then use the heading to query the annotations for a specific CID.

``` {r get annotations for a single cid}
result[!is.na(cids), CAS := annotatePubchemCompound(cids, "CAS")]
result
```
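
The `result[!is.na(cids), CAS := ...]` call relies on `data.table` semantics: the first argument selects rows and `:=` adds a column by reference, leaving `NA` in rows that were filtered out. The sketch below reproduces that behaviour on a mock table, assuming (as the syntax suggests) that `result` is a `data.table`. The function `lookup_stub` and all values are placeholders standing in for a real annotation query.

```r
library(data.table)

# Mock table with one unmapped compound (placeholder names and CIDs)
dt <- data.table(
  name = c("drugA", "drugB", "bad name"),
  cids = c(101L, 202L, NA)
)

# Stand-in for a per-CID annotation lookup; returns one value per input ID
lookup_stub <- function(ids) paste0("annot-", ids)

# Only rows with a non-missing CID are passed to the lookup.
# Rows excluded by the filter get NA in the new column.
dt[!is.na(cids), annotation := lookup_stub(cids)]
print(dt)
```

Filtering in the same call keeps the table's row count unchanged, so unmapped compounds remain visible with a missing annotation instead of being dropped.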