knitr::opts_chunk$set(fig.width=7)
knitr::opts_chunk$set(fig.align = 'center')

Chromo plots

The chromo() plot was created in order to visualize VCF data as plotted along a single chromosome. Depending on the state of the genome you're working on, a 'chromosome' may actually be a 'scaffold', 'supercontig' or simply a 'contig.' For the purposes of this discussion a 'chromosome' is simply a single sequence used as a reference in order to call variants.

The chromoqc() plot was introduced in other documentation. This plot is actually a wrapper to the chromo() plot. Because it is a wrapper it is a simplification of the chromo() plot. Also because it is a wrapper, it lacks some of the flexibility of the chromo() plot. Here we will take a closer look at the 'chromoqc' plot and then extend this knowledge to create the more customized chromo() plot.

Chromoqc

In other documentation we have rendered a chromoqc() plot. Here we reproduce this plot as a starting point for our discussion.

library(vcfR)

vcf_file <- system.file("extdata", "pinf_sc50.vcf.gz", package = "pinfsc50")
dna_file <- system.file("extdata", "pinf_sc50.fasta", package = "pinfsc50")
gff_file <- system.file("extdata", "pinf_sc50.gff", package = "pinfsc50")

vcf <- read.vcfR(vcf_file, verbose = FALSE)
dna <- ape::read.dna(dna_file, format = "fasta")
gff <- read.table(gff_file, sep="\t", quote="")

chrom <- create.chromR(name="Supercontig", vcf=vcf, seq=dna, ann=gff, verbose=FALSE)
chrom <- masker(chrom, min_DP = 300, max_DP = 700)
chrom <- proc.chromR(chrom, verbose = FALSE)
chromoqc(chrom)

This plot can be seen as seven horizontal plots. Five of these plots include marginal box and whisker plots. Two of these plots (at the bottom) lack marginal box and whisker plots and are half the height of the other plots. The organization of this plot was created with the function graphics::layout() which provides for multi-panel plots where each panel is not of the same dimensions. The manual page for this function provides details.

Each of these panels was created with the function dr.plot(). In order to learn how to make a chromo() plot we need to learn how to make each of the panels of a chromo() plot. And in order to understand how each of these panels are created, we need to understand the dr.plot() function.

dr.plot

The function dr.plot() creates plots of dots and rectangles. The dots are used to create dot plots along a chromosome. The rectangles are used to create bar charts along a chromosome. We'll begin by working with dot plots.

Dot plots

The dr.plot() function can accomodate a flexible number of data. For dot plots this is facilitated by using a numeric matrix to specify the coordinates. The first column of this matrix should be the x-coordinate. This frequently comes from the POS column in VCF data. Because the coordinates for each chromosome typically begins at one, multiple chromosomes become awkward to arrange. (That is, their coordinate systems would need to be altered in some manner.) Every subsequent column can be a different data type. For our data we'll use information from the 'var.info' slot of our example data. We have some values that are rather large so we'll prune these data to only include values below 1,000.

dot1 <- as.matrix( chrom@var.info[,c(2,3,4)] )
dot1 <- dot1[ dot1[,3] < 1000, ]
head(dot1)

We see that the first column is the position (POS) and we have two subsequent columns of numeric data. We can plot the data by selecting the first two columns and using that as the dmat (dot matrix) argument for dr.plot.

dr.plot( dmat = dot1[,1:2], 
         chrom.e = 100001,
         dcol=c(rgb(34,139,34, maxColorValue = 255))
         )

We can plot two levels of data by using all three columns in our call.

dr.plot( dmat = dot1, 
         chrom.e = 100001,
         dcol=c(rgb(34,139,34, maxColorValue = 255), rgb(0,206,209, maxColorValue = 255)) 
         )

Because we've chosen to plot the data from different columns with different colors we need to pass a vector of colors as our argument to dcol. The concatenation function(c()) manages this for us.

Rectangle plots

Barcharts are frequently used to summarize information. In the context of the dr.plot barcharts are created using matrices of coordinates. These matrices consist of coordinates for the xleft, ybottom, xright and ytop of each rectangle. This was modeled after rect(). The win.info' slot of ourchromR` object has information on window coordinates as well as the count of variants per window (variant incidence). We'll use this information for this example.

mat1 <- cbind(chrom@win.info[,'start'],
              0, 
              chrom@win.info[,'end'], 
              chrom@win.info[,'variants']
              )
head(mat1)

We can now use this matrix in our call to dr.plot() to create our graphic.

dr.plot( rlst = mat1, chrom.e = 100001 )

In order to attain flexibility in the amount of data we can use for a rectangle plot a list of matrices can be passed to dr.plot() as the argument to rlst. We'll first need to create a couple more matrices to use this example. The 'win.info' slot of our chromR object contains information on nucleotide content as well as sliding window coordinates. We can use this information for our example.

mat2 <- cbind(chrom@win.info[,'start'],
              0, 
              chrom@win.info[,'end'],
              rowSums(chrom@win.info[,c('A', 'T')])
)

mat3 <- cbind(chrom@win.info[,'start'],
              rowSums(chrom@win.info[,c('A', 'T')]),
              chrom@win.info[,'end'],
              rowSums(chrom@win.info[,c('A', 'C', 'G', 'T')])
)
dr.plot( rlst = list(mat2, mat3), chrom.e = 100001 )

Here we've relied on the default color palette. We could also provide a vector of colors to the parameter rcol.

Dot and rectangle plots

Now that we know how to create dot plots and rectangle plots we can combine this knowledge to create plots with both elements.

dr.plot( dmat = dot1,
         rlst = list(mat2, mat3), 
         chrom.e = 100001,
         dcol=c(rgb(34,139,34, maxColorValue = 255), rgb(0,206,209, maxColorValue = 255)) 
         )

More information about the dr.plot can be found in it's manual page.

?dr.plot

Note that the ellipses are part of the funciton definition. This means that other parameters, including ones I have not considered, can be included. For example, using xlim() will allow you to focus the plot on a particular region.

dr.plot( dmat = dot1,
         rlst = list(mat2, mat3), 
         chrom.e = 100001,
         dcol=c(rgb(34,139,34, maxColorValue = 255), rgb(0,206,209, maxColorValue = 255)),
         xlim = c(3e4, 6e4)
         )

chromo plot

A basic chromo plot can be created with just an object of class chromR. Here we use our example data that includes a reference (sequence data), annotations and variants (from the VCF file).

chromo( chrom, boxp = TRUE )

This provides four tracks: annotations, nucleotides, nucleotide content and variant incidence (or count per window). We can compare this to the chromoqc plot presented above. This can be seen as a foundation upon which to include other forms of data as additional panes. Up to three additional panes can be included by using the arguments drlist1, drlist2 or drlist3. The argument used for drlist is a named list that can include the values title, dmat, rlist, dcol, rcol, rbcol and bwcol.

In order to create our drlist we'll start with a matrix of rectangle coordinates.

rlst1 <- cbind( gff[ seq(1,23, by=2) , 4 ],
                0, 
                gff[ seq(1,23, by=2) , 5 ],
                500
               )
head(rlst1)

Next we'll convert our matrix into a list and include it as part of our drlist.

rlist <- list(rlst1)
myList1 <- list(title = "Track1",
                dmat  = chrom@var.info[,2:4],
                rlst = rlst1,
                dcol = c("#8DD3C711", "#FB807211"),
                rcol=4,
                bwcol=1:2
                )

Now we can use this list as an argument in our call to chromo().

chromo( chrom, boxp = TRUE , chrom.e = chrom@len, drlist1 = myList1  )

Note that the dots are painted onto the graphic after the rectangles. Even though we've used transparency (an alpha chanel) to our colors, they obscure the rectangles. This highlights how easy it is to make busy plots with genomic data. We can create additional panes by adding data to drlist2. Here we'll just reuse the data from drlist1 as an example.

chromo( chrom, boxp = TRUE , chrom.e = chrom@len, drlist1 = myList1, drlist2 = myList1  )

In theory just about any sort of data can be used. For example, we can extract the variant depths (DP) from teh VCF data and plot one sample's depths along the chromosome.

dp <- extract.gt(chrom, element = 'DP', as.numeric = TRUE)
myList1 <- list(title = colnames(dp)[1],
                dmat  = cbind(chrom@var.info[,2], dp[,1]),
                dcol="#0000ff44"
                )
chromo( chrom, boxp = TRUE , chrom.e = chrom@len, drlist1 = myList1)

We can include multiple samples in a plot by adding multiple columns to our dot matrix.

myList1 <- list(title = paste(colnames(dp)[1:3], collapse=","),
                dmat  = cbind(chrom@var.info[,2], dp[,1:3]),
                dcol=c("#8DD3C744", "#FB807244", "#BEBADA44")
                )
chromo( chrom, boxp = TRUE , chrom.e = chrom@len, drlist1 = myList1)

Or we can plot them as separate panes.

myList1 <- list(title = paste(colnames(dp)[1], collapse=","),
                dmat  = cbind(chrom@var.info[,2], dp[,1]),
                dcol=c("#8DD3C744")
                )
myList2 <- list(title = paste(colnames(dp)[2], collapse=","),
                dmat  = cbind(chrom@var.info[,2], dp[,2]),
                dcol=c("#FB807211"),
                bwcol=c("#FB8072")
                )
myList3 <- list(title = paste(colnames(dp)[3], collapse=","),
                dmat  = cbind(chrom@var.info[,2], dp[,3]),
                dcol=c("#BEBADA44")
                )
chromo( chrom, boxp = TRUE , chrom.e = chrom@len, 
        drlist1 = myList1,
        drlist2 = myList2,
        drlist3 = myList3
        )

Management of rectangle and dot plots can involve a lot of manual steps. But this can provide a lot of opportunity for customization. Hopefully this document will inspire you to utilize your own creativity with this funciton.



knausb/vcfR_docs documentation built on May 20, 2019, 12:52 p.m.