knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The bacsnp package is a simple tool for the analysis of single nucleotide positions in viruses from the family Baculoviridae (https://talk.ictvonline.org/ictv-reports/ictv_online_report/dsdna-viruses/w/baculoviridae). Baculoviruses have a circular-closed and double-stranded DNA genome, which are lacking the presence of introns and exons. Baculovirus species and isolates are often sequenced by whole genome sequencing techniques, e.g. Illumina, and assembled subsequently to consensus sequences, which itself is then used for the detection of single nucleotide polymorphisms (SNP). The package bacsnp is intended to help analyzing the detected varialbe positions by loading them into R and processing them for later downstream analysis and visualisation.
The idea behind bacsnp is the handling of multiple Illumina (short read) datasets that come from different sequenced baculovirus isolates or samples that have been mapped against a common reference sequence. By using a common reference, all SNPs found are directly related to each other and specificities can also be assigned to the individual positions. Detected SNP positions can be either isolate specific or specific for a group of isolates. Determine the specificity of SNP position is a feature of bacsnp.
The bacsnp package starts after calling variable positions using MPileup and sequencing data should be processed under the following requirements:
For exact worklow description and full list of parameters see Wennmann et al. (submitted).
For running the bacsnp code some R packages are recommended and loaded.
library(VariantAnnotation) library(IRanges) library(bacsnp)
The bacsnp package requires to import the VCF file (v4.2) by the VariantAnnotation packge using readVcf function.
bacdata <- readVcf("bac.vcf")
For a easier presentation and analysis of the data, the VCF file is transformed to a data frame format.
bac.df <- bacsnp.transformation(bacdata) head(bac.df)
The following information is stored in the individual columns:
After the transformation of the VCF data into a data frame, information about the experimental setup can be assessed by standard R commands.
unique(bac.df$ISO)
The names of the sequences isolates (Iso1 and Iso2) are preceded by the letter "i". This is important later so that the names of the isolatese are not confused with the specificities of the individual positions.
The number of unique SNP positions found ban also be determined.
length(unique(bac.df$POS))
The individual SNP positions are listed below.
unique(bac.df$POS)
The data can be filtered by three different criteria.
In this example the a minimal absolute read depth of 100 is required in order to consider a detected SNP position. In addition the minimal read depth of the alternative is required to be above 10 and the relative frequency of an alternative nucleotdie must be higher than 0.05 (5 percent.
bac.f <- bacsnp.filter(bac.df, min.abs.cov = 100, min.abs.alt = 10, min.rel.alt = 0.05)
One of the most improtant stepts is to determine the specificities that were already mentioned above. To do this, the function bacsnp.specificity is called. It requires the data set as input and a vector that contains all isolates that are to be used for the determination of the specificities. Not all isolates need to be specified here. A smaller number is also sufficient for which the SNP specificities are determined. The specificities are then transferred to the isolates that are not taken into account.
bac.spec <- bacsnp.specificity(bac.f, c("iIso1", "iIso2"), which.rel = "REL.ALT1")
The function snp_specificity provides an element of type list with two major information. First, all combinations with SNP specificity were identified.
bac.spec$combinations
In this example 251 positions were found to be specific for the second isolate. 44 positions were variable in both isolates and only one position is determined to be specific for isolat 1 only.
The specificities are added to the data frame (GROUP.ID, NT.SPEC and SPEC).
head(bac.spec$data[, c("POS", "REF", "ALT","REL.ALT1", "SPEC", "GROUP.ID", "NT.SPEC")])
The function bacsnp.pot is based on ggplot2 can be used for visualtization of SNP positions and their frequencies.
i1 <- bac.spec$data[bac.spec$data$ISO == "iIso1",] i2 <- bac.spec$data[bac.spec$data$ISO == "iIso2",] bacsnp.plot(i1, col = "SPEC", genome.length = 123193, mark.repeats = FALSE)#, mark.lowGQ = 120, mark.lowQUAL = 400) bacsnp.plot(i2, col = "SPEC", genome.length = 123193, mark.repeats = FALSE)#, mark.lowGQ = 120, mark.lowQUAL = 400)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.