title: "pogos -- PharmacOGenomics Ontology Support"
author: "Vincent J. Carey, stvjc at channing.harvard.edu"
date: "r format(Sys.time(), '%B %d, %Y')
"
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{pogos -- simple interface to bhklab PharmacoDB with emphasis on ontology}
%\VignetteEncoding{UTF-8}
output:
BiocStyle::html_document:
highlight: pygments
number_sections: yes
theme: united
toc_float: yes
bhklab's PharmacoDB unites multiple drug sensitivity databases. A REST API is available and interfaces to some basic endpoints are provided in this package. We also provide some basic support for working with the Cell Line Ontology and ChEBI.
A basic purpose of this package is to support exploration of the completeness and utility of available ontologies for cell lines and drugs. A relevant study of the possibility of "drug set enrichment analysis" has been published by Napolitano and colleagues.
We note that detailed RESTful interrogation of many ontologies
is supported by the r Biocpkg("rols")
package. In the long
run we would like to make use of the EBI Ontology Lookup System
as a central resource of this package, but to develop concepts
at these early stages, static images of ontologies are employed.
```{r setup,echo=FALSE} suppressMessages({ suppressPackageStartupMessages({ library(pogos) library(S4Vectors) library(DT) library(ggplot2) }) })
# Basic software design for pogos
This package attacks two problems: organization of data on
cell-line response to treatments by diverse compounds, and
communication with PharmacoDb.
## Data structures for pharmacogenomics
This package provides two key data structures for managing pharmacogenomic
experimental results
- `DRTraceSet`: for a given dosage schedule for a given compound, manage the
dose-response profiles for a number of cell lines.
An example is
```{r lkcc}
iric = iriCCLE()
iric
DRProfSet
: for a given cell line, provide the dose-response profiles for
a number of compounds. Example:
```{r lkdrp}
drp = DRProfSet()
drp
The `plot` method reveals a basic motivation for `DRTraceSet`.
```{r lkpl090, fig.height=2.5}
plot(iric)
The plot method for DRProfSet has a complementary purpose. ```{r lklll,fig.height=3.5} plot(drp)
## Interrogating PharmacoDb
The communications part of the package is somewhat
fragile, as changes to the upstream service at
Princess Margaret can lead to failures here.
The heart of the query facility is shown here,
unevaluated. The `datasetCodes` and `cellLineCodes`
are extracted from reference data that may itself
go out of date.
xx = GET(sprintf("https://pharmacodb.pmgenomics.ca/api/v1/intersections/2/%d/%d?indent=true",
cellLineCodes[cell_line], datasetCodes[dataset]))
ans = fromJSON(readBin(xx$content, what = "character"))
ans = lapply(ans, function(comb) {
new("DRProfile", cell_line = cell_line, dataset = dataset,
drug = comb$compound$name, drug_code = comb$compound$id,
cell_line_code = comb$cell_line$id, doses = vapply(comb$dose_responses,
function(x) x$dose, numeric(1)), responses = vapply(comb$dose_responses,
function(x) x$response, numeric(1)))
})
# Some stored reference data
We precomputed tables for cell lines, compounds, and tissues.
These are available as DataFrame instances as
`cell_lines_v1`, `compounds_v1`, `datasets_v1`, and `tissues_v1`, in
the data element of the package.
Numerical ids are defined by the PharmacoDB.
## Cell lines
```{r lkk, echo=FALSE}
data(cell_lines_v1)
datatable(as.data.frame(cell_lines_v1), options=list(lengthMenu=c(3,5,10,50,100)))
```{r lkccc, echo=FALSE} data(compounds_v1) datatable(as.data.frame(compounds_v1), options=list(lengthMenu=c(3,5,10,50,100)))
## Datasets
```{r lkccc2, echo=FALSE}
data(datasets_v1)
datatable(as.data.frame(datasets_v1))
As an illustration of long-term intention of this package, consider the following snapshot of a slice of ChEBI.
```{r doch,fig=TRUE} library(ontoProc) cc = getChebiOnto() library(ontologyPlot) onto_plot(cc, c("CHEBI:134109", "CHEBI:61115", "CHEBI:75603", "CHEBI:37699", "CHEBI:68481", "CHEBI:71178"), fontsize=24, shape="rect")
It is natural to map the compounds studied in pharmacogenomics to these
compound classes as a step towards understanding common and distinct
mechanisms of action. But in the following display, we see
that some common chemotherapeutic agents are characterized in very
generic terms:
```{r lkmo, fig=TRUE}
onto_plot(cc, c("CHEBI:134109", "CHEBI:61115", "CHEBI:75603", "CHEBI:37699",
"CHEBI:68481", "CHEBI:71178", "CHEBI:52172", "CHEBI:64310",
"CHEBI:63632", "CHEBI:45863"), fontsize=28, shape="rect")
Should paclitaxel be identified as an "organic molecular entity" with no additional detail? Is there additional information in ChEBI or related ontologies to help organize information on related compounds?
Cell line nomenclature is often unnatural. The Cell Line Ontology should be helpful in systematizing cell lines used in pharmacogenomics. Nevertheless there is some apparent lack of consistency.
```{r ana,fig=TRUE} clo = getCellLineOnto() smoc = c("CLO:0000517", "CLO:0000560", "CLO:0000563", "CLO:0001072", "CLO:0001088", "CLO:0001138", "CLO:0007606") onto_plot(clo, smoc)
The MCF7 line should probably be annotated as related to human
mammary tissue.
# Working with specific elements
We want to maximize utility of institutionally curated
terminology for components of pharmacodb.
## Cell line identifiers
### Bridging to Cell Line Ontology
The following identifiers are available in the PharmacoDB cell lines:
c("143B", "1321N1", "184B5")
What are they? What organs do they represent? One way
to proceed is through parent terms in Cell Line Ontology.
```{r lkoo}
library(ontoProc)
clo = getCellLineOnto()
minds = which(clo$name %in% c("143B cell", "1321N1 cell", "184B5 cell"))
tags = clo$id[minds]
clo$name[ unlist(clo$parent[tags]) ]
TODO: how can we connect the anatomic vocabulary in these terms to formal anatomical terms in UBERON or CARO?
The Experimental Factor Ontology unifies various ontologies and
includes terms related to anatomic systems. At least with the
infrastructure available in r CRANpkg("ontologyIndex")
and
r CRANpkg("ontologyPlot")
, it seems there are opportunities to
add meaningful connections.
```{r lkefo, fig=TRUE} ee = getEFOOnto() onto_plot(ee, c("UBERON:0000474", "UBERON:0000079", "UBERON:0000990", "UBERON:0000995", "UBERON:0003982", "UBERON:0000310", "UBERON:0002107", "UBERON:0001264", "UBERON:0002113", "UBERON:0001007", "UBERON:0002530", "UBERON:6007435", "UBERON:0004122", "UBERON:0001008"))
It is likely important to remember the distinction between an
ontology, a structured collection of concept terms, and a
curated vocabulary or ontology annotation, which covers
concept "instances".
## Compounds
### Bridging to ChEBI
This will involve more work. We have the compounds in PharmacoDB.
How many match the compound names in Chebi?
```{r lkcom}
chl = getChebiOnto()
allch = chl$name
mm = allch[match(tolower(compounds_v1[,2]), tolower(allch), nomatch=0)]
round(length(mm)/nrow(compounds_v1),2) # not high
datatable(data.frame(id=names(mm), comp=as.character(mm)))
Direct matches are relatively rare. Attempts to use approximate matching with agrep have so far been unproductive. As an example, which we do not run:
```{r srchsym, eval=FALSE} notin = setdiff(compounds_v1$name, mm) library(parallel) options(mc.cores=2) allch = tolower(allch) notin = tolower(notin) lk50 = mclapply(notin[1:50], function(x) { cat(x); order(adist(x, allch))[1:5]}) names(lk50) = notin[1:50] aa = do.call(cbind, lk50[1:50]) ttt = t(apply(aa,2,function(x) allch[x])[1:3,])
# Retrieving dose-response data for cell-line/compound intersections
Given a cell-line code and a database, we can retrieve all the experiments
associated with it in that database.
CCLE for MCF-7:
```{r lkcomb,cache=TRUE}
library(rjson)
library(httr)
xx = GET("https://pharmacodb.pmgenomics.ca/api/v1/intersections/2/895/1?indent=true")
ans = fromJSON(readBin(xx$content, what="character"))
doses1 = sapply(ans[[1]]$dose_responses, function(x) x[[1]])
resp1 = sapply(ans[[1]]$dose_responses, function(x) x[[2]])
The compoundsByCell
function employs this pattern to establish
lists of experiments per cell type.
We can recreate (approximately: the dose and response units need to be clarified) Figure 4b from Barretina et al. 2012 as follows (defaults to DRTraceSet are specifically chosen to compute this figure):
{r lkbar, fig=TRUE}
library(ggplot2)
plot(DRTraceSet(dataset="CCLE"))
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.