suppressPackageStartupMessages({
  library(oc2bioc)
  library(curatedTCGAData)
})
We propose a bidirectional interface between Bioconductor (@Huber2015) and OpenCRAVAT (@Pagel2020).
We aim for an interactively searchable table of the available OpenCRAVAT modules.

We use the open-cravat Python modules (installed with oc2bioc,
via the basilisk protocol) to obtain the listing of all modules.
This listing is managed in an S4 class instance:
library(oc2bioc)
ms = populate_module_set()
ms
An as.data.frame method is available.
dim(as.data.frame(ms))
Use DT::datatable with the data.frame instance to obtain the
searchable table described above.
The queryOC function authenticates to the OpenCRAVAT project's server
and uses the RESTful API to resolve queries. If username and
password are not supplied, environment variables are queried
to obtain these.
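The fallback to environment variables can be sketched in base R as follows. This is a minimal illustration, not the code used inside queryOC: the helper name get_oc_credentials and the variable names OCRAVAT_USERNAME and OCRAVAT_PASSWORD are hypothetical, since the names that queryOC actually consults are not stated here.

```r
# Hypothetical helper: use the supplied credentials, otherwise fall back to
# environment variables. The variable names are assumptions for illustration.
get_oc_credentials <- function(username = NULL, password = NULL,
                               user_var = "OCRAVAT_USERNAME",
                               pass_var = "OCRAVAT_PASSWORD") {
  if (is.null(username)) username <- Sys.getenv(user_var, unset = NA)
  if (is.null(password)) password <- Sys.getenv(pass_var, unset = NA)
  if (is.na(username) || is.na(password)) {
    stop("credentials not supplied and not found in ",
         user_var, " / ", pass_var)
  }
  list(username = username, password = password)
}
```

Keeping credentials in the environment (for example in a user-level .Renviron file) avoids writing them into scripts or vignettes.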
Here's an illustration of how this works. This chunk will not run unless an OpenCRAVAT username and password are supplied as the first two arguments.
var_in_tx = queryOC([uname], [passwd], chr="chr7", pos="140753336", ref="A", alt="T", annotator=c("chasmplus_BRCA", "pubmed", "segway_breast"))
The response to this request (made on 6 Sept 2020 and saved) is:
var_in_tx
names(vv <- httr::content(var_in_tx))
The ChasmPlus result looks like
vv$chasmplus_BRCA
Another illustration with a variant in a long non-coding RNA (@Suvanto2020) is:
> nonco_var = queryOC([username], [passwd], chr="chr15", pos="50394581", ref="G", alt="C",
+     annotator=c("chasmplus_BRCA", "pubmed", "segway_breast",
+     "dbsnp", "ncrna", "gtex", "phdsnpg"))
> nonco_var
Response [https://run.opencravat.org/submit/annotate?chrom=chr15&pos=50394581&ref_base=G&alt_base=C&annotators=chasmplus_BRCA,pubmed,segway_breast,dbsnp,ncrna,gtex,phdsnpg]
  Date: 2020-10-09 19:12
  Status: 200
  Content-Type: application/json; charset=utf-8
  Size: 416 B
> library(httr)
0/0 packages newly attached/loaded, see sessionInfo() for details.
> names(content(nonco_var))
[1] "ncrna"          "pubmed"         "gtex"           "segway_breast"
[5] "chasmplus_BRCA" "dbsnp"          "phdsnpg"        "crx"
The fields contain little novel content, except that the position is annotated as Quiescent in the
segway_breast resource and the dbSNP id (rs28489579) is returned.
When a local deployment is running, for example via oc gui,
queryOC can be used with appropriate settings of baseURL
to direct queries to locally installed annotators.
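To make the role of baseURL concrete, here is a small sketch that assembles a request URL of the shape seen in the Response printed earlier (path /submit/annotate with parameters chrom, pos, ref_base, alt_base and annotators). The helper annotate_url is hypothetical and is not part of oc2bioc; whether a local server exposes the same path is an assumption, and no URL encoding is attempted.

```r
# Hypothetical helper: build an annotation URL for a given base URL.
# The path and parameter names are copied from the Response shown earlier.
annotate_url <- function(baseURL, chr, pos, ref, alt, annotators) {
  query <- c(chrom = chr, pos = pos, ref_base = ref, alt_base = alt,
             annotators = paste(annotators, collapse = ","))
  paste0(sub("/+$", "", baseURL), "/submit/annotate?",
         paste(names(query), query, sep = "=", collapse = "&"))
}

# Public server versus a local deployment (local address assumed)
annotate_url("https://run.opencravat.org", "chr15", "50394581", "G", "C",
             c("dbsnp", "ncrna"))
annotate_url("http://0.0.0.0:8080", "chr15", "50394581", "G", "C",
             c("dbsnp", "ncrna"))
```

Only the scheme and host change between the two calls, which is why a single baseURL setting is enough to redirect queries.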
We'll use the adrenocortical carcinoma data from TCGA to
obtain a set of variants. These are available as a GRanges
gr38 in the oc2bioc package. The code used to generate it is
presented without evaluation here:
library(curatedTCGAData)
suppressMessages({
  acc = curatedTCGAData("ACC", "Mutation", dry.run=FALSE, verbose=FALSE)
})
mut = experiments(acc)[[1]]
mut
The "assays" available record much information about the specific variants; we only need REF/ALT codes.
assayNames(mut)[c(1,7:9)]
The addresses of the variants are available after coercing
the RaggedExperiment instance to a GRangesList; this is simplified
to a GRanges with unlist.
gr = unlist(as(mut, "GRangesList"))
dim(mcols(gr))
head(gr[,1:3],3)
Multibase variants are present. We'll confine attention to single-nucleotide variants (SNV).
table(width(gr))[1:6]
gr = gr[width(gr)==1]
We'll want the variants in GRCh38 coordinates.
```r
seqlevels(gr) = paste0("chr", seqlevels(gr))
gr38 = unlist(rtracklayer::liftOver(gr, oc2bioc::ch19to38))
genome(gr38) = "hg38" # UCSC terminology
head(gr38[,1:3],3)
length(gr38)
length(unique(gr38))
mtb = make_oc_POSTable(gr38) # already remapped
head(mtb)
```
Use code like
write.table(mtb, file="/tmp/abc.txt", sep="\t", col.names=FALSE, row.names=FALSE, quote=FALSE)
to create a file that can be ingested by OpenCRAVAT.
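The effect of these write.table settings can be checked on a toy table. The sketch below is illustrative only: the column layout (chromosome, position, strand, reference allele, alternate allele) is assumed and may differ from what make_oc_POSTable returns, and a temporary file stands in for /tmp/abc.txt.

```r
# Toy stand-in for a POSTable; the column layout is an assumption.
toy <- data.frame(chrom = c("chr7", "chr15"),
                  pos = c(140753336L, 50394581L),
                  strand = "+",
                  ref = c("A", "G"),
                  alt = c("T", "C"),
                  stringsAsFactors = FALSE)

tf <- tempfile(fileext = ".txt")
write.table(toy, file = tf, sep = "\t", col.names = FALSE,
            row.names = FALSE, quote = FALSE)

# One tab-separated line per variant, with no header and no quoting
readLines(tf)

# Read back with explicit column classes so that an allele of "T"
# is not converted to the logical value TRUE
back <- read.delim(tf, header = FALSE,
                   colClasses = c("character", "integer",
                                  rep("character", 3)))
```

Turning off the header, row names and quoting leaves only the bare fields in the file.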
Once the POSTable table has been written to a file, call run_oc_req
with a url appropriate to your system, as in the following:
myrun = run_oc_req(url="http://0.0.0.0:8080/submit/submit",
postfile="/tmp/abc.txt", annotators=c("clinvar", "segway_lung"),
reports=c("text", "vcf"), assembly="hg38", note="test run")
The object myrun will be an R representation of a Python dictionary
with element r. This can be interrogated as myrun$r$json() to
get information about the run.
> str(myrun$r$json())
List of 14
 $ orig_input_fname   : chr "abc.txt"
 $ assembly           : chr "hg38"
 $ note               : chr "test run"
 $ db_path            : chr ""
 $ viewable           : logi FALSE
 $ reports            : chr [1:2] "text" "vcf"
 $ annotators         : chr [1:2] "clinvar" "segway_lung"
 $ annotator_version  : chr ""
 $ open_cravat_version: chr ""
 $ num_input_var      : chr ""
 $ submission_time    : chr "2020-09-25T14:15:20.625643"
 $ id                 : chr "200925-141520"
 $ run_name           : chr "abc.txt"
 $ status             :List of 1
  ..$ status: chr "Submitted"
Of crucial significance here is the id component. This will be
used to find the annotations computed by OpenCRAVAT.
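Because the run summary is an ordinary nested list, the id and the status can be pulled out with $. The sketch below uses a hand-built stand-in for myrun$r$json(), with values copied from the output shown above, so that it runs without a server.

```r
# Stand-in for myrun$r$json(), trimmed to the fields used here;
# the values are copied from the run summary shown above.
run_info <- list(id = "200925-141520",
                 run_name = "abc.txt",
                 status = list(status = "Submitted"))

job_id <- run_info$id
job_status <- run_info$status$status
message("job ", job_id, " has status ", job_status)
```

Storing job_id right after submission is convenient, since it is the handle needed later to locate the results.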
An example annotation result is shipped with the package
as crx_demo.
data(crx_demo)
names(crx_demo)
nonsyn = which(crx_demo[,10]!="SYN")
head(crx_demo[nonsyn,c(2,3,8:10)])
Local deployments of OpenCRAVAT will have SQLite databases
corresponding to the installed annotators. The oc2bioc
package includes utilities for identifying and
querying these.
Here's an example. Segway's lung data is actually all based on fetal lung tissue.
> list_local_annotators()
[1] "clinvar" "genehancer" "segway_lung" "segway_muscle"
> lcon = connect_local_annotator("segway_lung")
> dbListTables(lcon)
[1] "chr1" "chr10"
[3] "chr11" "chr12"
[5] "chr12_GL877875v1_alt" "chr13"
...
> lcon %>% tbl("chr1") %>% glimpse()
Rows: ??
Columns: 5
Database: sqlite 3.30.1 [/home/stvjc/.local/lib/python3.8/site-packages/cravat/modules/annotators/segway_lung/data/segway_lung.sqlite]
$ bin <int> 585, 585, 585, 585, 585, 585, 585, 585, 585, 585, 585, 585, 58…
$ start <int> 11000, 13000, 13500, 14800, 15100, 54400, 54700, 55700, 79100,…
$ end <int> 13000, 13500, 14800, 15100, 16100, 54700, 55700, 56000, 79300,…
$ tissue <chr> "FETAL_LUNG", "FETAL_LUNG", "FETAL_LUNG", "FETAL_LUNG", "FETAL…
$ state <chr> "11_Quiescent", "9_Quiescent", "11_Quiescent", "9_Quiescent", …
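A position lookup against a table of this shape can be sketched with DBI. To keep the example self-contained, we build an in-memory SQLite table from the first four rows shown above rather than connecting to an installed annotator. Treating intervals as half-open (start <= pos < end) is an assumption about the coordinate convention, and the query position is arbitrary.

```r
library(DBI)

# In-memory stand-in for the chr1 table, using the first four rows above
con <- dbConnect(RSQLite::SQLite(), ":memory:")
toy <- data.frame(bin = 585L,
                  start = c(11000L, 13000L, 13500L, 14800L),
                  end = c(13000L, 13500L, 14800L, 15100L),
                  tissue = "FETAL_LUNG",
                  state = c("11_Quiescent", "9_Quiescent",
                            "11_Quiescent", "9_Quiescent"),
                  stringsAsFactors = FALSE)
dbWriteTable(con, "chr1", toy)

# "end" is an SQL keyword, so it is quoted in the statement
pos <- 13200L
hit <- dbGetQuery(con,
  'SELECT start, "end", state FROM chr1 WHERE start <= ? AND "end" > ?',
  params = list(pos, pos))
dbDisconnect(con)
hit
```

With a real deployment, the connection returned by connect_local_annotator("segway_lung") would take the place of con.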
Sequence Ontology (SO) is used to label and group variants by structural
or functional type. We noted abbreviations for SO terms in the crx_demo
file and did not see a conventional mapping for these. The mapping was
extracted from JSON found in cravat_util.py.
head(SO_map)
To help interpret the SO terms, we include a snapshot of the SO that retains the graphical structure of relationships among terms. A view of variant-related concepts can be plotted using tools in the ontologyIndex and ontoProc packages.
ontoProc::onto_plot2(SO_onto, na.omit(SO_map[,3])[39:49])
Not all the terms used in OpenCRAVAT are in the display above, and most of the terms in the display are not explicitly used in OpenCRAVAT. The display is here to orient readers to the SO concepts.