suppressPackageStartupMessages({ library(oc2bioc) library(curatedTCGAData) })
We propose a bidirectional interface between Bioconductor (@Huber2015) and OpenCRAVAT (@Pagel2020).
We aim for an interactively searchable table along the lines of
We use the open-cravat
python modules (installed with oc2bioc,
via the basilisk protocol) to obtain the listing of all modules.
This listing is managed in an S4 class instance:
library(oc2bioc) ms = populate_module_set() ms
An as.data.frame method is available.
dim(as.data.frame(ms))
Use DT::datatable
with the data.frame instance to obtain the
searchable table depicted above.
The queryOC
function authenticates to the OpenCRAVAT project's server
and uses the RESTful API to resolve queries. If username and
password are not supplied, environment variables are queried
to obtain these.
Here's an illustration of how this works. This chunk will not run unless an OpenCRAVAT username and password are supplied as the first two arguments.
var_in_tx = queryOC([uname], [passwd], chr="chr7", pos="140753336", ref="A", alt="T", annotator=c("chasmplus_BRCA", "pubmed", "segway_breast"))
The response to this request (made on 6 Sept 2020 and saved) is:
var_in_tx names(vv <- httr::content(var_in_tx))
The ChasmPlus result looks like
vv$chasmplus_BRCA
Another illustration with a variant in a long non-coding RNA (@Suvanto2020) is:
> nonco_var = queryOC(,[username], [passwd], chr="chr15", pos="50394581", ref="G", alt="C", + annotator=c("chasmplus_BRCA", "pubmed", "segway_breast", + "dbsnp", "ncrna", "gtex", "phdsnpg")) > nonco_var Response [https://run.opencravat.org/submit/annotate?chrom=chr15&pos=50394581&ref_base=G&alt_base=C&annotators=chasmplus_BRCA,pubmed,segway_breast,dbsnp,ncrna,gtex,phdsnpg] Date: 2020-10-09 19:12 Status: 200 Content-Type: application/json; charset=utf-8 Size: 416 B > library(httr) 0/0 packages newly attached/loaded, see sessionInfo() for details. > names(content(nonco_var)) [1] "ncrna" "pubmed" "gtex" "segway_breast" [5] "chasmplus_BRCA" "dbsnp" "phdsnpg" "crx"
There is no novel content in the fields, except the position is noted as Quiescent
in the
segway_breast
resources, and the dbSNP id (rs28489579) is returned.
When a local deployment is running, for example via oc gui
,
queryOC
can be used with appropriate settings of baseURL
to direct queries to locally installed annotators.
We'll use the adrenocortical carcinoma data from TCGA to
obtain a set of variants. These are available as a GRanges
gr38
in the oc2bioc package. The codes to generate this are
presented without evaluation here:
library(curatedTCGAData) suppressMessages({ acc = curatedTCGAData("ACC", "Mutation", dry.run=FALSE, verbose=FALSE) }) mut = experiments(acc)[[1]] mut
The "assays" available record much information about the specific variants; we only need REF/ALT codes.
assayNames(mut)[c(1,7:9)]
The addresses of the variants are available after coercing
the RaggedArray
instance to a GRangesList
; this is simplified
to a GRanges
with unlist
.
gr = unlist(as(mut, "GRangesList")) dim(mcols(gr)) head(gr[,1:3],3)
Multibase variants are present. We'll confine attention to single-nucleotide variants (SNV).
table(width(gr))[1:6] gr = gr[width(gr)==1]
We'll want the variants in GRCh38 coordinates. ```r seqlevels(gr) = paste0("chr", seqlevels(gr)) gr38 = unlist(rtracklayer::liftOver(gr, oc2bioc::ch19to38)) genome(gr38) = "hg38" # UCSC terminology head(gr38[,1:3],3) length(gr38) length(unique(gr38))
mtb = make_oc_POSTable(gr38) # already remapped head(mtb)
Use code like
write.table(mtb, file="/tmp/abc.txt", sep="\t", col.names=FALSE, row.names=FALSE, quote=FALSE)
to create a file that can be ingested by OpenCRAVAT.
Once the POSTable table has been written to a file, call run_oc_req
with a url
appropriate to your system, as in the following:
myrun = run_oc_req(url="http://0.0.0.0:8080/submit/submit", postfile="/tmp/abc.txt", annotators=c("clinvar", "segway_lung"), reports=c("text", "vcf"), assembly="hg38", note="test run")
The object myrun
will be an R representation of a python dictionary
with element r
. This can be interrogated as myrun$r$json()
to
get information about the run.
> str(myrun$r$json()) List of 14 $ orig_input_fname : chr "abc.txt" $ assembly : chr "hg38" $ note : chr "test run" $ db_path : chr "" $ viewable : logi FALSE $ reports : chr [1:2] "text" "vcf" $ annotators : chr [1:2] "clinvar" "segway_lung" $ annotator_version : chr "" $ open_cravat_version: chr "" $ num_input_var : chr "" $ submission_time : chr "2020-09-25T14:15:20.625643" $ id : chr "200925-141520" $ run_name : chr "abc.txt" $ status :List of 1 ..$ status: chr "Submitted"
Of crucial significance here is the id
component. This will be
used to find the annotations computed by OpenCRAVAT.
An example annotation result is shipped with the package
as crx_demo
.
data(crx_demo) names(crx_demo) nonsyn = which(crx_demo[,10]!="SYN") head(crx_demo[nonsyn,c(2,3,8:10)])
Local deployments of OpenCRAVAT will have SQLite databases
corresponding to the installed annotators. The oc2bioc
package includes utilities for identifying and
querying these.
Here's an example. Segway's lung data is actually all based on fetal lung tissue.
> list_local_annotators() [1] "clinvar" "genehancer" "segway_lung" "segway_muscle" > lcon = connect_local_annotator("segway_lung") > dbListTables(lcon) [1] "chr1" "chr10" [3] "chr11" "chr12" [5] "chr12_GL877875v1_alt" "chr13" ... > lcon %>% tbl("chr1") %>% glimpse() Rows: ?? Columns: 5 Database: sqlite 3.30.1 [/home/stvjc/.local/lib/python3.8/site-packages/cravat/modules/annotators/segway_lung/data/segway_lung.sqlite] $ bin <int> 585, 585, 585, 585, 585, 585, 585, 585, 585, 585, 585, 585, 58… $ start <int> 11000, 13000, 13500, 14800, 15100, 54400, 54700, 55700, 79100,… $ end <int> 13000, 13500, 14800, 15100, 16100, 54700, 55700, 56000, 79300,… $ tissue <chr> "FETAL_LUNG", "FETAL_LUNG", "FETAL_LUNG", "FETAL_LUNG", "FETAL… $ state <chr> "11_Quiescent", "9_Quiescent", "11_Quiescent", "9_Quiescent", …
Sequence Ontology (SO) is used to label and group variants by structural
or functional type. We noted abbreviations for SO terms in the crx_demo
file and did not see a conventional mapping for these. The mapping was
extracted from JSON found in cravat_util.py
.
head(SO_map)
To help interpret the SO terms, we include a snapshot of the SO that retains the graphical structure of relationships among terms. A view of variant-related concepts can be plotted using tools in the ontologyIndex and ontoProc packages.
ontoProc::onto_plot2(SO_onto, na.omit(SO_map[,3])[39:49])
Not all the terms used in OpenCRAVAT are in the display above, and most of the terms in the display are not explicitly used in OpenCRAVAT. The display is here to orient readers to the SO concepts.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.