knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
)

Introduction to the Unichem API

The UniChem database provides a publicly available REST API for programmatic retrieval of mappings from standardized structural compound identifiers to unique compound IDs across a range of large online cheminformatic databases such as PubChem, ChEMBL, DrugBank and many more. The service accepts POST requests to two different end-points: /compound and /connectivity. Both endpoints accept query parameters via the POST body in JSON format. The /compound API returns exact matches for the queried compound, while the /connectivity API uses layers of the International Chemical Identifier (InChI) of the query compound to return exact matches as well as structurally related compounds such as isomers, salts, ionizations and more. [@UniChemBeta; @chambersUniChemUnifiedChemical2013]

The functions in AnnotationGx have been designed to allow package users to easily query UniChem resources without any pre-existing knowledge of HTTP requests or the API specifications. In doing so we hope to provide an R native interface for mapping between various cheminformatic databases, accessible to anyone familar with using R functions!

library(AnnotationGx)

Available Databases

To see a table of database identifiers available via UniChem, you can call the getUniChemSources function. By default, just the database shortname ("Name") and UniChem's ID for it ("SourceID") columns are returned. To return all columns, pass the all_columns = TRUE argument

getUnichemSources()

When mapping using the queryUnichemCompound function, these are the sources that can be used from, and the databases to which the compound mappings will be returned.

Querying UniChem Compound API

The queryUnichemCompound function allows you to query the UniChem Compound API to retrieve mappings for a given compound identifier. The function takes two mandatory arguments. The first is the compound argument which is the compound identifier to be queried. The second is the type argument which is the type of compound identifier to search for. Options are "uci", "inchi", "inchikey", and "sourceID". The sourceID argument is optional and is only required if the type argument is "sourceID".

The function returns a list of:

  1. "External_Mappings" data.table containing the mapping to other Databases with the following headings:
  2. "compoundID" character The compound identifier
  3. "Name" character The name of the database
  4. "NameLong" character The long name of the database
  5. "SourceID" character The UniChem Source ID
  6. "sourceURL" character The URL of the source
  7. "UniChem_Mappings" list of the following six mappings:
  8. "UCI" character The UniChem Identifier
  9. "InchiKey" character The InChIKey
  10. "Inchi" character The InChI
  11. "formula" character The molecular formula
  12. "connections" character connection representation "1-6(10)13-8-5-3-2-4-7(8)9(11)12"
  13. "hAtoms" character hydrogen atom connections "2-5H,1H3,(H,11,12)"

Example Searching using uci (UniChem Identifier)

Note: This type of query requires you to know the UniChem Identifier for the compound.

queryUnichemCompound(compound = "161671", type = "uci")


bhklab/AnnotationGx documentation built on April 3, 2025, 4:27 p.m.