knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction to PubChem APIs

PubChem is a database of chemical molecules and their biological activities. It is a part of the National Center for Biotechnology Information (NCBI), which is a part of the National Institutes of Health (NIH). PubChem provides a set of APIs to query its database. The AnnotationGx package provides a set of functions to query PubChem using these APIs.

The first of these APIs is the PubChem PUG REST API which is designed to - make specific queries based on some input identifier and return data which PubChem has labelled or computed internally [1]. - This API is useful for querying information about a specific chemical compound such as getting the standardized PubChem identifier (CID) for a given chemical name or smiles string, or getting the chemical structure for a given CID. - It provides access to a wide range of data including chemical properties, bioassay data, and chemical classification data, given a specific identifier.

The second API is the PubChem PUG VIEW API which is designed to: - give accesse to aggregated annotations for a given chemical compound [3] that is mapped to their data, but not curated by PubChem itself. - i.e it provides access to annotations from external sources such as UniProt, ChEBI, and ChEMBL, given a specific identifier.

Setup

library(AnnotationGx)

Mapping from chemical name to PubChem CID

The main function that is provided by the package is mapCompound2CID.

``` {r map aspirin to cid} mapCompound2CID("aspirin")

You can pass in a list of compound names to get the CIDs for all of them at once.

``` {r map multiple compounds to cid}
drugs <- c(
  "Aspirin", "Erlotinib", "Acadesine", "Camptothecin", "Vincaleukoblastine",
  "Cisplatin"
)

mapCompound2CID(drugs)

It is possible for names to multimap to CIDs. This is the case for 'Vincaleukoblastine' in the above query. In cases of multimapping, usually the first entry has the highest similarity to the requested drug. To subset to only the first occurrence of each of drug name use the first = TRUE argument:

``` {r map multiple compounds to cid and subset to first} mapCompound2CID(drugs, first = TRUE)

In the case of a compound that can't be mapped, `NA` will be returned and a warning will be issued.

``` {r map non existent compound to cid}
(result <- mapCompound2CID(c(drugs, "non existent compound", "another bad compound"), first = TRUE))

failed <- attributes(result)$failed

# get the list of failed inputs
names(failed)

# get the error message for the failed input
print(failed[1])

Mapping from PubChem CID to Properties

Once CIDs are obtained, they can be used to query the properties of the compound. To view the available properties from Pubchem, use the getPubchemProperties function.

``` {r get pubchem properties} getPubchemProperties()

After deciding which properties to query, you can use the `mapCID2Properties` function to get the properties for a specific CID.

``` {r get properties for a single cid}
properties <- c("Title", "MolecularFormula", "InChIKey", "MolecularWeight")

# Need to remove NA values from the query as they will cause an error
result[!is.na(cids), mapCID2Properties(ids = cids, properties = properties)]

Mapping from PubChem CID to Annotations

Pubchem's VIEW API provides access to annotations from external sources such as UniProt, ChEBI, and ChEMBL, given a specific identifier. Before querying annotations, we need to use the exact heading we want to query.

You can use the getPubchemAnnotationHeadings function to get the available annotation headings and types.

Get ALL available annotation headings:

``` {r get annotation headings} getPubchemAnnotationHeadings()

### Get annotation headings for a specific type:
``` {r get annotation headings for a specific type}
getPubchemAnnotationHeadings(type = "Compound")

Get annotation headings for a specific heading:

``` {r get annotation headings for a specific heading} getPubchemAnnotationHeadings(heading = "ChEMBL ID")

### Get annotation headings for a specific type **and** heading:
``` {r get annotation headings for a specific type and heading}
getPubchemAnnotationHeadings(type = "Compound", heading = "CAS")

Query annotations for a specific CID and heading

We can then use the heading to query the annotations for a specific CID.

{r get annotations for a single cid} result[!is.na(cids), CAS := annotatePubchemCompound(cids, "CAS")] result

References

  1. PUG REST. PubChem Docs [website]. Retrieved from https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest.
  2. Kim S, Thiessen PA, Cheng T, Yu B, Bolton EE. An update on PUG-REST: RESTful interface for programmatic access to PubChem. Nucleic Acids Res. 2018 July 2; 46(W1):W563-570. doi:10.1093/nar/gky294.
  3. PUG VIEW. PubChem Docs [webiste]. Retrieved from https://pubchemdocs.ncbi.nlm.nih.gov/pug-view.
  4. Kim S, Thiessen PA, Cheng T, Zhang J, Gindulyte A, Bolton EE. PUG-View: programmatic access to chemical annotations integrated in PubChem. J Cheminform. 2019 Aug 9; 11:56. doi:10.1186/s13321-019-0375-2.


bhklab/AnnotationGx documentation built on April 3, 2025, 4:27 p.m.