knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

RecDplot - R package for visualizing recombination in viral sequences.

RecDplot package provides two approaches to explore the recombination in viral sequences - pairwise distance correspondence plots (PDC plots) and pairwise distance deviation matrices (PDD matrices).

The purpose of this vignette is to show the examples of its usage.

First, we’ll attach the packages we use and load the sequence alignment. Here we load the aligned concatenated ORFs of noroviruses.

library(recDplot)
library(colorRamps)
library(gplots)
library(plotly)
library(ape)

alignment = read.dna("norovirus_example.fasta", format="fasta", as.character=TRUE)
alignment[alignment=='-'] <- NA

Pairwise Distance Deviation (PDD) matrices

To assess the degree of correspondence between genetic distances in different genome regions, PDC plots can be further summarized as pairwise distance deviation matrices (PDD matrices). To build such matrix, the alignment of genomes is divided into windows, then for each window pairwise genetic distances are built. For each pair of regions we estimate the root mean square error of linear regression and visualize it as a heat map. The higher RMSE in matrix, the more pairwise genetic distances deviate from regression line, that might be the consequence of recombination.

Let's calculate PDD matrix for our alignment of norovirus sequences.

PDD_matrix = calc_PDDmatrix(dna_object=alignment, step=50, window=600, method="pdist", modification="pairwise") #calculate PDD matrix
dev.off()
heatmap.2(as.matrix(PDD_matrix), Rowv = FALSE, Colv = "Rowv", dendrogram = 'none', col=matlab.like, main="PDD matrix", tracecol=NA) # visualize PDD matrix as a heatmap

Pairwise Distance Corresponce (PDC) plots

To check the correspondence between genetic distances in specific genome regions we can build Pairwise Distance Correspondence plots.

list_PDC <-  plot_PDCP(alignment, 1, 1000, 5000, 6000)
list_PDC[[1]] # ggplot with PDC plot
head(list_PDC[[2]], n=5) # dataframe with distances between sequences in two regions
list_PDC_control <-  plot_PDCP_control(alignment, 1, 1000, 5000, 6000)
list_PDC_control[[1]] # ggplot with PDC plot
head(list_PDC_control[[2]], n=5) # dataframe with distances between sequences in two regions

To explore which sequence pairs distances deviate from regression line, we suggest to use 'plotly' package.

plot_ly(list_PDC[[2]],type="scatter", x = ~value.x,
        y = ~value.y, colors = "Set1", text = ~paste(row, '\n', col))%>%
  layout(xaxis = list(title = '1-1000'),
         yaxis = list(title = '5000-6000'), legend = list(title=list(text='Interactive PDC plot')))
list_PDCP_with_control <-  plot_PDCP_with_control(alignment, 1, 1000, 6000, 7000)
list_PDCP_with_control[[1]] # ggplot with PDC and control in one figure
head(list_PDCP_with_control[[2]])


v-julia/recDplot documentation built on Feb. 20, 2024, 1:59 p.m.