In bhklab/AnnotationGx: AnnotationGx: A package for building, updating and querying an annotation database for pharmaco-genomic data

## Introduction

This vignette compares two ways of annotating CTRP-provided treatment IDs: mapping them to PubChem CIDs, and annotating them with CTD information.

Although the PubChem CID uniquely identifies a compound, the PubChem API does not easily map treatment names to CIDs, at least not for treatments that are commonly misnamed. Specifically, the PubChem API does not correctly map all of the 545 CTRP treatment names to PubChem CIDs.

<!-- NOTE: As of March 27, 2025, the CTD2 database and its API are not available. The CTD2 database is the central database where CTRP data is hosted, and it exposes an API.

Developer Note: The API calls described in their API documentation are useful, but there is also an undocumented endpoint, GET /compound/{compoundId}, that is useful for mapping compound names as their data (i.e., CTRP) names them to PubChem CIDs.

The functionality for this is implemented in the mapCompound2CTD function. -->

The goal is to find out which of these methods maps more compounds.

``` {r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

``` {r setup}
library(AnnotationGx)

data(CTRP_treatmentMetadata)
```

``` {r test_both}
# Take the first treatment id from CTRP_treatmentMetadata
treatment <- CTRP_treatmentMetadata[1, CTRP.treatmentid]
sprintf("CTRP treatment id : %s", treatment)

# Map the treatment to a CID using PubChem
mapCompound2CID(treatment)
```

## Annotating using PubChem

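``` {r run_CTRP_Pubchem, eval = FALSE}
# Map the first 10 CTRP treatment ids to PubChem CIDs
(compounds_to_cids <- CTRP_treatmentMetadata[
  1:10,
  AnnotationGx::mapCompound2CID(
    names = CTRP.treatmentid,
    first = TRUE
  )
])

# Names that could not be mapped are recorded in the "failed" attribute
failed <- attributes(compounds_to_cids)$failed |>
  names()
```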
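The chunk above keeps the successful matches in the returned table and reads the unresolved queries from its `failed` attribute, so they can be retried later. The sketch below mimics that pattern with a hypothetical stand-in, `toy_mapCompound2CID()`, and a small hard-coded lookup table. It does not call PubChem and is not AnnotationGx's implementation; it only assumes, as the chunk above does, that the failed queries are stored as the names of the `failed` attribute.

```r
# Hypothetical stand-in for a name -> CID lookup; NOT AnnotationGx's code.
# Small hard-coded table: aspirin and caffeine with their PubChem CIDs.
toy_lookup <- c(aspirin = 2244L, caffeine = 2519L)

toy_mapCompound2CID <- function(queries) {
  found <- queries %in% names(toy_lookup)
  hits <- queries[found]
  misses <- queries[!found]
  result <- data.frame(name = hits, cids = unname(toy_lookup[hits]))
  # Store unresolved queries as a list named by the query string,
  # so names(attributes(result)$failed) lists what to retry
  attr(result, "failed") <- setNames(as.list(rep("no match", length(misses))), misses)
  result
}

res <- toy_mapCompound2CID(c("aspirin", "Caffeine", "caffeine"))
res
failed <- attributes(res)$failed |> names()
failed
#> [1] "Caffeine"
```

Here `"Caffeine"` fails only because of its capitalisation, which is the kind of near-miss that a retry with cleaned names is meant to recover.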

``` {r pubchem_failed, eval = FALSE}
# Metadata rows for the treatments that failed the first pass
failed <- unique(CTRP_treatmentMetadata[CTRP.treatmentid %in% failed, ])

# Clean the failed names and retry the PubChem lookup
failed[, CTRP.treatmentid_CLEANED := cleanCharacterStrings(CTRP.treatmentid)]

(failed_to_cids <- failed[, AnnotationGx::mapCompound2CID(
  names = CTRP.treatmentid_CLEANED, first = TRUE
)])
failed_again <- attributes(failed_to_cids)$failed |> names()
```
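The retry above relies on `cleanCharacterStrings()` to turn awkward CTRP names into strings that PubChem is more likely to recognize, and it sends only the names that failed the first pass. The exact cleaning rules live in AnnotationGx. The sketch below uses a hypothetical `toy_clean()` (lower-casing, dropping parenthesised annotations, collapsing whitespace) purely to illustrate why a second, cleaned pass can recover names the first pass missed; its rules are assumptions, not the package's.

```r
# Hypothetical cleaner; these rules are illustrative and are NOT
# the rules used by AnnotationGx::cleanCharacterStrings().
toy_clean <- function(x) {
  x <- tolower(x)
  x <- gsub("\\s*\\([^)]*\\)", "", x)  # drop parenthesised annotations
  x <- gsub("\\s+", " ", x)            # collapse runs of whitespace
  trimws(x)
}

toy_lookup <- c(aspirin = 2244L, caffeine = 2519L)
queries <- c("aspirin", "Caffeine (anhydrous)", "unknownium")

# First pass: exact matches on the raw names
missed <- queries[!queries %in% names(toy_lookup)]

# Second pass: retry only the misses, using the cleaned names
retry <- data.frame(original = missed, cleaned = toy_clean(missed))
retry$cids <- unname(toy_lookup[retry$cleaned])
retry
```

Keeping the original name next to the cleaned one matters: the cleaned string is only a lookup key, and the result still has to be joined back to the original treatment id.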

``` {r pubchem_failed_again, eval = FALSE}
# Retry hits: join on the cleaned name, then drop it so the key is CTRP.treatmentid
failed_dt <- merge(failed_to_cids[!is.na(cids), ], failed,
                   by.x = "name", by.y = "CTRP.treatmentid_CLEANED", all.x = FALSE)
failed_dt$name <- NULL

# First-pass hits: join on the original treatment id
successful_dt <- merge(CTRP_treatmentMetadata, compounds_to_cids[!is.na(cids), ],
                       by.x = "CTRP.treatmentid", by.y = "name", all.x = FALSE)

# Stack both passes into one table
mapped_PubChem <- data.table::rbindlist(list(successful_dt, failed_dt),
                                        use.names = TRUE, fill = TRUE)
```
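The two merges above join on different keys: the first pass was queried with the original `CTRP.treatmentid`, and the retry with the cleaned name, so each result is joined back to the metadata on the key it was queried with before the rows are stacked. The sketch below reproduces that bookkeeping with base R data frames instead of data.table. The column names follow the chunk above; the object names and values are made up for illustration.

```r
# Toy inputs; column names follow the chunk above, values are made up.
meta <- data.frame(
  CTRP.treatmentid = c("aspirin", "Caffeine (anhydrous)", "unknownium"),
  row_id = 1:3
)
first_pass <- data.frame(name = meta$CTRP.treatmentid, cids = c(2244L, NA, NA))

# Rows that failed the first pass, with a cleaned key, and the retry's results
failed_rows <- meta[is.na(first_pass$cids), ]
failed_rows$CTRP.treatmentid_CLEANED <- c("caffeine", "unknownium")
retry_pass <- data.frame(name = c("caffeine", "unknownium"), cids = c(2519L, NA))

# Join each pass on the key it was queried with, keeping matched rows only
successful <- merge(meta, first_pass[!is.na(first_pass$cids), ],
                    by.x = "CTRP.treatmentid", by.y = "name")
recovered <- merge(retry_pass[!is.na(retry_pass$cids), ], failed_rows,
                   by.x = "name", by.y = "CTRP.treatmentid_CLEANED")
recovered$name <- NULL  # drop the cleaned key; CTRP.treatmentid remains

# Base rbind() needs matching columns: pad missing ones with NA,
# roughly what rbindlist(fill = TRUE) does for data.tables
cols <- union(names(successful), names(recovered))
for (col in setdiff(cols, names(successful))) successful[[col]] <- rep(NA, nrow(successful))
for (col in setdiff(cols, names(recovered))) recovered[[col]] <- rep(NA, nrow(recovered))
mapped <- rbind(successful[cols], recovered[cols])
mapped
```

In this toy run `unknownium` fails both passes, so the inner joins leave it out of `mapped`; such leftovers correspond to `failed_again` in the vignette.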

bhklab/AnnotationGx documentation built on April 3, 2025, 4:27 p.m.
