Home

/

GitHub

/

In wennj/bacsnp: Analyzing single nucleotide polymorphisms from baculovirus sequencing data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The bacsnp package is a simple tool for the analysis of single nucleotide positions in viruses from the family Baculoviridae (https://talk.ictvonline.org/ictv-reports/ictv_online_report/dsdna-viruses/w/baculoviridae). Baculoviruses have a circular-closed and double-stranded DNA genome, which are lacking the presence of introns and exons. Baculovirus species and isolates are often sequenced by whole genome sequencing techniques, e.g. Illumina, and assembled subsequently to consensus sequences, which itself is then used for the detection of single nucleotide polymorphisms (SNP). The package bacsnp is intended to help analyzing the detected varialbe positions by loading them into R and processing them for later downstream analysis and visualisation.

Worklow for SNP calling

The idea behind bacsnp is the handling of multiple Illumina (short read) datasets that come from different sequenced baculovirus isolates or samples that have been mapped against a common reference sequence. By using a common reference, all SNPs found are directly related to each other and specificities can also be assigned to the individual positions. Detected SNP positions can be either isolate specific or specific for a group of isolates. Determine the specificity of SNP position is a feature of bacsnp.

The bacsnp package starts after calling variable positions using MPileup and sequencing data should be processed under the following requirements:

baculovirus samples were Illumina sequenced\
reads were mapped with BWA (including a group identifier for each data set) resulting in SAM outputs\
each sample was mapped against the identical reference\
SAM files were converted to BAM files\
Mpileup was called on all BAM files (exluding indels)\
BCF tool was used to call variant sites only\
BCF output file is a uncompressed VCF file

For exact worklow description and full list of parameters see Wennmann et al. (submitted).

For running the bacsnp code some R packages are recommended and loaded.

library(bacsnp)

Loading VCF file data

The bacsnp package requires importing VCF files using the vcfR package.

bacdata <- vcfR::read.vcfR(system.file("extdata", "bac.vcf", package = "bacsnp"))

Transforming VCF file data

For a easier presentation and analysis of the data, the VCF file is transformed to a data frame format.

bac.df <- bacsnp.transformation(bacdata)
head(bac.df)

The following information is stored in the individual columns:

POS = Position of the SNP\
ISO = isolates name\
REF = reference nucleotide\
ALT = alternative nucleotides found\
COV = absolute read ("coverage") depth in this position\
COV.REF = absolute number of reference nucleotides\
COV.ALT1/2/3 = absolute number of first/second/third nucleotide\
COV.REL1/2/3 = relative frequency of first/second/third nucleotide

Accessing basic SNP information

After the transformation of the VCF data into a data frame, information about the experimental setup can be assessed by standard R commands.

unique(bac.df$ISO)

The names of the sequences isolates (Iso1 and Iso2) are preceded by the letter "i". This is important later so that the names of the isolatese are not confused with the specificities of the individual positions.

The number of unique SNP positions found ban also be determined.

length(unique(bac.df$POS))

The individual SNP positions are listed below.

unique(bac.df$POS)

Filtering SNP data frame

The data can be filtered by three different criteria.

min.abs.cov = minimal absolute read depth
min.abs.alt = minimal absolute read depth of the alternative
min.rel.alt = minimal relative occurracne of alternative nucleotide

In this example the a minimal absolute read depth of 100 is required in order to consider a detected SNP position. In addition the minimal read depth of the alternative is required to be above 10 and the relative frequency of an alternative nucleotdie must be higher than 0.05 (5 percent.

bac.f <- bacsnp.filter(bac.df, min.abs.cov = 100, min.abs.alt = 10, min.rel.alt = 0.05)

Determination of SNP specificity

One of the most improtant stepts is to determine the specificities that were already mentioned above. To do this, the function bacsnp.specificity is called. It requires the data set as input and a vector that contains all isolates that are to be used for the determination of the specificities. Not all isolates need to be specified here. A smaller number is also sufficient for which the SNP specificities are determined. The specificities are then transferred to the isolates that are not taken into account.

bac.spec <- bacsnp.specificity(bac.f, c("iIso1", "iIso2"), which.rel = "REL.ALT1")

The function snp_specificity provides an element of type list with two major information. First, all combinations with SNP specificity were identified.

bac.spec$combinations

In this example 251 positions were found to be specific for the second isolate. 44 positions were variable in both isolates and only one position is determined to be specific for isolat 1 only.

The specificities are added to the data frame (GROUP.ID, NT.SPEC and SPEC).

head(bac.spec$data[, c("POS", "REF", "ALT","REL.ALT1", "SPEC", "GROUP.ID", "NT.SPEC")])

Visualization of SNP frequencies with bacsnp.plot

The function bacsnp.pot is based on ggplot2 can be used for visualtization of SNP positions and their frequencies.

i1 <- bac.spec$data[bac.spec$data$ISO == "iIso1",]
i2 <- bac.spec$data[bac.spec$data$ISO == "iIso2",]

bacsnp.plot(i1, col = "SPEC", genome.length = 123193, mark.repeats = FALSE)#, mark.lowGQ = 120, mark.lowQUAL = 400)
bacsnp.plot(i2, col = "SPEC", genome.length = 123193, mark.repeats = FALSE)#, mark.lowGQ = 120, mark.lowQUAL = 400)

wennj/bacsnp documentation built on March 6, 2025, 2:12 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

wennj/bacsnp
Analyzing single nucleotide polymorphisms from baculovirus sequencing data

In wennj/bacsnp: Analyzing single nucleotide polymorphisms from baculovirus sequencing data

Worklow for SNP calling

Loading VCF file data

Transforming VCF file data

Accessing basic SNP information

Filtering SNP data frame

Determination of SNP specificity

Visualization of SNP frequencies with bacsnp.plot

R Package Documentation

Browse R Packages

We want your feedback!

wennj/bacsnp Analyzing single nucleotide polymorphisms from baculovirus sequencing data

In wennj/bacsnp: Analyzing single nucleotide polymorphisms from baculovirus sequencing data

Worklow for SNP calling

Loading VCF file data

Transforming VCF file data

Accessing basic SNP information

Filtering SNP data frame

Determination of SNP specificity

Visualization of SNP frequencies with bacsnp.plot

R Package Documentation

Browse R Packages

We want your feedback!

wennj/bacsnp
Analyzing single nucleotide polymorphisms from baculovirus sequencing data