Plotting Alignments

BiocStyle::markdown()
# Ensure that any errors cause the Vignette build to fail.
library(knitr)
opts_chunk$set(error=FALSE)
apiKey <- Sys.getenv("GOOGLE_API_KEY")
if (nchar(apiKey) == 0) {
  warning(paste("To build this vignette, please setup the environment variable",
                "GOOGLE_API_KEY with the public API key from your Google",
                "Developer Console before loading the GoogleGenomics package,",
                "or run GoogleGenomics::authenticate."))
  knitr::knit_exit()
}

Working with Reads

Google Genomics implements the GA4GH reads API and this R package can retrieve data from that implementation. For more detail, see https://cloud.google.com/genomics/v1beta2/reference/reads

library(GoogleGenomics)
# This vignette is authenticated on package load from the env variable GOOGLE_API_KEY.
# When running interactively, call the authenticate method.
# ?authenticate

By default, this function retrieves reads for a small genomic region for one sample in 1,000 Genomes.

reads <- getReads()
length(reads)

We can see that r length(reads) individual reads were returned and that the JSON response was deserialized into an R list object:

class(reads)
mode(reads)

The top level field names are:

names(reads[[1]])

And examining the alignment we see:

reads[[1]]$alignedSequence
reads[[1]]$alignment$position$referenceName
reads[[1]]$alignment$position$position

This is good, but this data becomes much more useful when it is converted to Bioconductor data types. For example, we can convert reads in this list representation to r Biocpkg("GAlignments"):

readsToGAlignments(reads)

Plotting Alignments

Let's take a look at the reads that overlap rs9536314 for sample NA12893 within the Illumina Platinum Genomes dataset. This SNP resides on chromosome 13 at position 33628137 in 0-based coordinates.

# Change the values of 'chromosome', 'start', or 'end' here if you wish to plot 
# alignments from a different portion of the genome.
alignments <- getReads(readGroupSetId="CMvnhpKTFhD3he72j4KZuyc",
                       chromosome="chr13",
                       start=33628130,
                       end=33628145,
                       converter=readsToGAlignments)
alignments

Notice that we passed the converter to the getReads method so that we're immediately working with GAlignments which means that we can start taking advantage of other Bioconductor functionality. Also keep in mind that the parameters start and end are expressed in 0-based coordinates per the GA4GH specification but the Bioconductor data type converters in r Biocpkg("GoogleGenomics"), by default, transform the returned data to 1-based coordinates.

library(ggbio)

We can display the basic alignments and coverage data:

alignmentPlot <- autoplot(alignments, aes(color=strand, fill=strand))
coveragePlot <- ggplot(as(alignments, "GRanges")) +
                stat_coverage(color="gray40", fill="skyblue")
tracks(alignmentPlot, coveragePlot,
       xlab="Reads overlapping rs9536314 for NA12893")

And also display the ideogram for the corresponding location on the chromosome:

ideogramPlot <- plotIdeogram(genome="hg19", subchr="chr13")
ideogramPlot + xlim(as(alignments, "GRanges"))

Provenance

sessionInfo()


Try the GoogleGenomics package in your browser

Any scripts or data that you put into this service are public.

GoogleGenomics documentation built on May 2, 2019, 12:54 a.m.