library(BiocStyle) library(VariantAnnotation) library(jsonlite) library(httr)
Ensembl's Variant Effect Predictor is described in @McLaren2016.
Prior to Bioconductor 3.19, the ensemblVEP package provided access to Ensembl's predictions through an interface between Perl and MySQL.
In 3.19 VariantAnnotation supports the use of the VEP component of the REST API at https://rest.ensembl.org.
The function vep_by_region
will accept
a VCF object as defined in r Biocpkg("VariantAnnotation")
.
library(VariantAnnotation) fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation") r22 = readVcf(fl) r22
In this example we confine attention to single nucleotide variants.
There is a limit of 200 locations in a request, and 55000 requests per hour. We'll base our query on 100 positions in the chr22 VCF.
dr = which(width(rowRanges(r22))!=1) r22s = r22[-dr] res = vep_by_region(r22[1:100], snv_only=FALSE, chk_max=FALSE) jans = toJSON(content(res))
There are various ways to work with the result of this query to the API.
We'll use the r CRANpkg('rjsoncons')
JSON processing infrastructure
to dig in and understand aspects of the API behavior.
First, the top-level concepts produced for each variant can be retrieved using
library(rjsoncons) names(jsonlite::fromJSON(jmespath(jans, "[*]")))
Annotation of the most severe consequence known will typically be of interest:
table(jsonlite::fromJSON(jmespath(jans, "[*].most_severe_consequence")))
There is variability in the structure of data returned for each query.
head(fromJSON(jmespath(jans, "[*].regulatory_feature_consequences")))
Furthermore, the content of the motif feature consequences field seems very peculiar.
table(unlist(fromJSON(jmespath(jans, "[*].motif_feature_consequences"))))
We'll consider the following approach to converting the API response to a GenomicRanges GRanges instance. Eventually this may become part of the package.
library(GenomicRanges) .make_GRanges = function( vep_response ) { stopifnot(inherits(vep_response, "response")) # httr nested = fromJSON(toJSON(content(vep_response))) ini = GRanges(seqnames = unlist(nested$seq_region_name), IRanges(start=unlist(nested$start), end=unlist(nested$end))) dr = match(c("seq_region_name", "start", "end"), names(nested)) mcols(ini) = DataFrame(nested[,-dr]) ini } tstg = .make_GRanges( res ) tstg[,1] # full print is unwieldy names(mcols(tstg))
Now information about variants can be retrieved with range operations. Deep annotation requires nested structure of the metadata columns.
mcols(tstg)[1, "transcript_consequences"]
An important element of prior work in ensemblVEP supports feeding annotation back into the VCF used to generate the effect prediction query. This seems feasible but concrete use cases are of interest.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.