## Introduction

This vignette compares two ways of annotating CTRP-provided treatment IDs: mapping them to PubChem CIDs and retrieving CTD information.

Although the PubChem CID uniquely identifies a compound, the PubChem API does not easily map treatment names to CIDs, at least not for treatments that are commonly misnamed. In particular, the PubChem API does not correctly map all 545 CTRP treatment names to PubChem CIDs. <!-- NOTE: As of March 27, 2025, the CTD2 database and its API are unavailable. CTD2 is the central database that hosts the CTRP data, and it exposes an API for that database.

Developer Note: The API calls described in the CTD2 API documentation are useful, but there is also an undocumented endpoint, GET /compound/{compoundId}, that maps compound names as they appear in CTD2 data (i.e., CTRP) to PubChem CIDs.

The functionality for this is implemented in the mapCompound2CTD function. -->

The goal is to determine which of the methods maps more compounds.
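One simple way to compare methods by how many compounds they map is the fraction of treatment IDs that receive a non-missing identifier. The sketch below assumes each method's output has been reduced to a named character vector (names are treatment IDs, values are mapped identifiers, `NA` for failures). The helper `mapping_rate()` and the toy IDs and values are hypothetical placeholders, not CTRP or PubChem data.

```r
# Hypothetical helper: fraction of inputs that a method mapped successfully.
# `mapped` is a named character vector; NA marks a failed mapping.
mapping_rate <- function(mapped) {
  mean(!is.na(mapped))
}

# Toy results for two methods over the same four made-up treatment IDs
ids <- c("drug_a", "drug_b", "drug_c", "drug_d")
method_1 <- setNames(c("111", NA, "333", NA), ids)
method_2 <- setNames(c("111", "222", NA, "444"), ids)

rates <- c(method_1 = mapping_rate(method_1), method_2 = mapping_rate(method_2))
rates

# IDs that method_2 mapped but method_1 did not
only_method_2 <- setdiff(ids[!is.na(method_2)], ids[!is.na(method_1)])
only_method_2
```

Looking at the set difference, not just the rates, shows whether one method strictly dominates or whether the two are complementary.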

``` {r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

``` {r setup}
library(AnnotationGx)

data(CTRP_treatmentMetadata)
```

``` {r test_both}
# Take the first treatment id from CTRP_treatmentMetadata
treatment <- CTRP_treatmentMetadata[1, CTRP.treatmentid]
sprintf("CTRP treatment id : %s", treatment)

# Map the treatment to a CID using PubChem
mapCompound2CID(treatment)
```
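For context, PubChem's PUG REST service resolves a compound name to CIDs through a URL of the form `https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/<name>/cids/JSON`. The sketch below only builds such a URL in base R and makes no network request. The helper name `pubchem_name_url()` is hypothetical, and whether `mapCompound2CID()` issues exactly this request internally is an assumption. It illustrates why names containing spaces or symbols must be percent-encoded before lookup.

```r
# Hypothetical helper: build a PUG REST name-to-CID query URL.
# The name is percent-encoded so spaces and reserved symbols survive in the path.
pubchem_name_url <- function(name) {
  encoded <- utils::URLencode(name, reserved = TRUE)
  sprintf(
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/%s/cids/JSON",
    encoded
  )
}

pubchem_name_url("aspirin")
pubchem_name_url("acetylsalicylic acid")
```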

## Annotating using PubChem

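``` {r run_CTRP_Pubchem, eval = FALSE}
# Map the first 10 CTRP treatment ids to PubChem CIDs, keeping the first hit
(compounds_to_cids <-
  CTRP_treatmentMetadata[1:10,
    AnnotationGx::mapCompound2CID(
      names = CTRP.treatmentid,
      first = TRUE
    )
  ]
)

# Treatment ids that PubChem could not map are stored in the "failed" attribute
failed <-
  attributes(compounds_to_cids)$failed |>
  names()
```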
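The chunk above reads the unmapped treatment names from a `failed` attribute on the result. The sketch below mimics that pattern in base R so it runs without network access: a hypothetical `map_names()` applies a lookup function to each name, keeps all results in a data frame, and records each failure, keyed by the input name, in a `failed` attribute. `lookup_cid()` and `map_names()` are illustrative assumptions, not AnnotationGx internals; the two CIDs in the lookup table are PubChem's CIDs for aspirin and caffeine.

```r
# Hypothetical stand-in for a remote lookup; errors on unknown names
lookup_cid <- function(name) {
  known <- c(aspirin = "2244", caffeine = "2519")
  if (!name %in% names(known)) stop("no CID found for ", name)
  known[[name]]
}

# Hypothetical wrapper: look up every name, recording failures instead of stopping
map_names <- function(treatment_names, lookup) {
  failed <- list()
  cids <- vapply(treatment_names, function(n) {
    tryCatch(lookup(n), error = function(e) {
      failed[[n]] <<- conditionMessage(e)  # key the failure by input name
      NA_character_
    })
  }, character(1), USE.NAMES = FALSE)
  result <- data.frame(name = treatment_names, cids = cids)
  attr(result, "failed") <- failed
  result
}

res <- map_names(c("aspirin", "not_a_drug", "caffeine"), lookup_cid)
attributes(res)$failed |> names()
```

Collecting failures rather than stopping on the first error lets a long batch finish and makes the retry step operate on exactly the names that need it.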

``` {r pubchem_failed, eval = FALSE}
# Subset the metadata to the treatments that PubChem could not map
failed <- unique(CTRP_treatmentMetadata[CTRP.treatmentid %in% failed, ])

# Clean the treatment names and retry the mapping with the cleaned names
failed[, CTRP.treatmentid_CLEANED := cleanCharacterStrings(CTRP.treatmentid)]

(failed_to_cids <- failed[, AnnotationGx::mapCompound2CID(names = CTRP.treatmentid_CLEANED, first = TRUE)])
failed_again <- attributes(failed_to_cids)$failed |> names()
```
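The retry relies on `cleanCharacterStrings()` to reconcile naming differences. Its exact rules are not described here, so the base R function below is only a hypothetical illustration of the general idea: dropping parenthetical annotations, case, and punctuation so that variant spellings collapse to one lookup key. `normalise_name()` and the example strings are assumptions, not AnnotationGx code or CTRP names.

```r
# Hypothetical normaliser (not AnnotationGx::cleanCharacterStrings):
# remove parenthetical text, lowercase, then strip non-alphanumerics.
normalise_name <- function(x) {
  x <- gsub("\\s*\\([^)]*\\)", "", x)
  x <- tolower(x)
  gsub("[^a-z0-9]", "", x)
}

variants <- c("Drug-X (HCl)", "drug x", "DRUGX")
normalise_name(variants)
```

Aggressive normalisation can also merge names of genuinely different compounds, so matches found only after cleaning are worth spot-checking.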

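``` {r pubchem_failed_again, eval = FALSE}
# Join second-pass successes back to the metadata, then drop the cleaned-name key
failed_dt <- merge(failed_to_cids[!is.na(cids), ], failed,
                   by.x = "name", by.y = "CTRP.treatmentid_CLEANED", all.x = FALSE)
failed_dt$name <- NULL

# Join first-pass successes to the metadata
successful_dt <- merge(CTRP_treatmentMetadata, compounds_to_cids[!is.na(cids), ],
                       by.x = "CTRP.treatmentid", by.y = "name", all.x = FALSE)

# Combine both passes into one table of PubChem-mapped treatments
mapped_PubChem <- data.table::rbindlist(list(successful_dt, failed_dt), use.names = TRUE, fill = TRUE)
```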
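The merge-and-bind step keeps every treatment that was mapped in either pass. A base R sketch of the same idea follows, using toy data frames; the column names mirror the chunk above (`name`, `cids`, `CTRP.treatmentid`, `CTRP.treatmentid_CLEANED`), but all values are made up for illustration.

```r
# Toy metadata and first-pass results (NA = not mapped)
meta <- data.frame(
  CTRP.treatmentid = c("drug_a", "Drug B (salt)", "drug_c"),
  CTRP.treatmentid_CLEANED = c("druga", "drugb", "drugc")
)
first_pass <- data.frame(name = meta$CTRP.treatmentid, cids = c("1", NA, NA))

# Second pass on cleaned names for the first-pass failures
second_pass <- data.frame(name = c("drugb", "drugc"), cids = c("2", NA))

# Keep only successful rows from each pass and join them back to the metadata
ok_first <- merge(meta, first_pass[!is.na(first_pass$cids), ],
                  by.x = "CTRP.treatmentid", by.y = "name")
ok_second <- merge(meta, second_pass[!is.na(second_pass$cids), ],
                   by.x = "CTRP.treatmentid_CLEANED", by.y = "name")

# Align column order, stack both passes, and list what is still unmapped
mapped <- rbind(ok_first, ok_second[, names(ok_first)])
unmapped <- setdiff(meta$CTRP.treatmentid, mapped$CTRP.treatmentid)
```

`unmapped` plays the same role as `failed_again`: the treatments that neither pass could resolve.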


bhklab/AnnotationGx documentation built on April 3, 2025, 4:27 p.m.