BiocStyle::markdown()
# Ensure that any errors cause the Vignette build to fail. library(knitr) opts_chunk$set(error=FALSE)
apiKey <- Sys.getenv("GOOGLE_API_KEY") if (nchar(apiKey) == 0) { warning(paste("To build this vignette, please setup the environment variable", "GOOGLE_API_KEY with the public API key from your Google", "Developer Console before loading the GoogleGenomics package,", "or run GoogleGenomics::authenticate.")) knitr::knit_exit() }
Google Genomics implements the GA4GH variants API and this R package can retrieve data from that implementation. For more detail, see https://cloud.google.com/genomics/v1beta2/reference/variants
library(GoogleGenomics) # This vignette is authenticated on package load from the env variable GOOGLE_API_KEY. # When running interactively, call the authenticate method. # ?authenticate
By default, this function retrieves variants for a small genomic region from the 1,000 Genomes phase 1 variants.
variants <- getVariants() length(variants)
We can see that r length(variants)
individual variants were returned and that the JSON response was deserialized into an R list object:
class(variants) mode(variants)
The top level field names are:
names(variants[[1]])
The variant contains nested calls:
length(variants[[1]]$calls)
With top level call field names:
names(variants[[1]]$calls[[1]])
And examining a call for a particular variant:
variants[[1]]$referenceName variants[[1]]$start variants[[1]]$alternateBases variants[[1]]$calls[[1]]
This is good, but this data becomes much more useful when it is converted to Bioconductor data types. For example, we can convert variants in this list representation to VRanges from r Biocpkg("VariantAnnotation")
:
variantsToVRanges(variants)
Next let's use package r Biocpkg("VariantAnnotation")
to annotate some specific 1,000 Genomes Phase 1 variants.
library(VariantAnnotation)
Note that the parameters start
and end
are expressed in 0-based coordinates per the GA4GH specification but the Bioconductor data type converters in r Biocpkg("GoogleGenomics")
, by default, transform the returned data to 1-based coordinates.
granges <- getVariants(variantSetId="10473108253681171589", chromosome="22", start=50300077, end=50303000, converter=variantsToGRanges)
Now locate the protein coding variants in each:
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene granges_locations <- locateVariants(granges, txdb, CodingVariants()) granges_locations
And predict the effect of the protein coding variants:
library(BSgenome.Hsapiens.UCSC.hg19)
granges_coding <- predictCoding(rep(granges, elementNROWS(granges$ALT)), txdb, seqSource=Hsapiens, varAllele=unlist(granges$ALT, use.names=FALSE)) granges_coding
Finally, add gene information:
library(org.Hs.eg.db)
annots <- select(org.Hs.eg.db, keys=granges_coding$GENEID, keytype="ENTREZID", columns=c("SYMBOL", "GENENAME","ENSEMBL")) cbind(elementMetadata(granges_coding), annots)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.