title: 'Generating fragment density plots in R/Bioconductor with VplotR' tags: - R - genomics - dataviz authors: - name: Jacques Serizay orcid: 0000-0002-4295-0624 affiliation: 1, 2 - name: Julie Ahringer affiliation: 1 affiliations: - name: The Gurdon Institute and Department of Genetics, University of Cambridge, Cambridge UK; index: 1 - name: Corresponding author (email) index: 2 date: 28 January 2021 bibliography: paper.bib
A multitude of proteins bind to DNA at regulatory regions to control the expression of neighbouring genes. Their binding can be revealed by chromatin accessibility assays such as DNase-seq, MNase-seq or ATAC-seq. Plotting the size of the sequencing fragments generated by these assays relative to the center of regulatory regions produce two-dimensional fragment density plots called V-plots. Such plots can reveal nucleosome positioning or transcription factor binding sites at regulatory regions [@Henikoff2011Nov].
Here, we present VplotR, an R package to easily generate V-plots and one-dimensional footprint profiles over single or aggregated genomic loci of interest. The use of VplotR will improve our understanding of molecular organization at regulatory regions.
VplotR is an R package facilitating the generation of V-plots, i.e. two-dimensional paired-end sequencing fragment density plots [@Henikoff2011Nov]. V-plots have been used in the past to elucidate nucleosome positioning and/or transcription factor binding at regulatory elements [@Henikoff2011Nov; @Schep2015Aug] (e.g. \autoref{fig:example}). There are few available tools to easily generate V-plots [@Schep2015Aug; @Beati2020], and they are provided as scripts to be used with the command-line interface. We developed an R/Bioconductor package, VplotR, to enable more researchers to use this method and to provide the ability to easily customize and interact with other datasets.
VplotR provides functions to import paired-end sequencing bam files from any type of DNA accessibility experiments (e.g. ATAC-seq, DNA-seq, MNase-seq) and can produce V-plots and one-dimensional footprint profiles over single or aggregated genomic loci of interest. The R package is well integrated within the Bioconductor environment and easily fits in standard genomic analysis workflows. Integrating V-plots into existing analytical frameworks has already brought new insights in chromatin organization [@Serizay2020Oct].
VplotR is available as an R package and can be readily installed from Bioconductor. The development version can be found on GitHub. Package dependencies and system requirements are documented at https://js2264.github.io/VplotR/. VplotR has been tested using R (version 3.5 and later) on macOS (versions 10.11 and later), Ubuntu 18.04.2 and Windows machines.
To ensure stability, VplotR includes unit tests for most functions and supports continuous integration using the Travis CI platform. Code contributions, bug reports, fixes and feature requests are most welcome by opening issues and pull requests at https://js2264.github.io/VplotR/.
The main user-level functions of VplotR are plotVmat()
, plotProfile()
and plotFootprint()
. plotVmat()
is used to generate V-plots (i.e. paired-end fragment density plots aggregated over a set of loci of interest) while plotProfile()
is used to generate paired-end fragment plots over a single genomic locus. plotFootprint()
can be used to profile the DNA accessibility footprint measured by a genomic assay over a motif of interest. Additional functions such as importPEBamFiles()
and getFragmentsDistribution()
are useful to import and investigate sets of paired-end sequencing fragments. Full examples of how to use the main package functions are described in the package vignette available at https://js2264.github.io/VplotR/articles/VplotR.html.
VplotR and all its dependencies can be installed from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("VplotR")
library("VplotR")
Importing paired-end sequencing fragments (usually from MNase-seq, DNase-seq or ATAC-seq experiments) can be done with VplotR using the importPEBamFiles()
function. If imported fragments were obtained by ATAC-seq, the argument shift_ATAC_fragments
should be set to TRUE
to shift the ATAC-seq sequencing fragments by +4/-5 bp [@Buenrostro2013Oct; @Adey2010Dec].
### To import paired-end fragments (e.g. ATAC-seq fragments),
### use `importPEBamFiles()`:
fragments <- importPEBamFiles(
'path/to/fragments/ATACseq-fragments.bam',
shift_ATAC_fragments = TRUE
)
### Toy datasets are provided as examples for this study:
data(ABF1_sacCer3)
data(MNase_sacCer3_Henikoff2011)
The distribution of fragment sizes can be obtained with getFragmentsDistribution()
and plotted using ggplot2
(\autoref{fig:fragsize}):
### The one-dimensional distribution of fragment
### sizes is obtained using the
### getFragmentsDistribution() function:
dist <- getFragmentsDistribution(
MNase_sacCer3_Henikoff2011,
ABF1_sacCer3
)
### The resulting data frame can be plotted
### with ggplot2 (see Figure 2)
ggplot(dist, aes(x = x, y = y)) +
geom_line() +
theme_ggplot2() +
labs(
title = 'Distribution of fragment sizes',
x = 'Fragment size', y = '# of fragments'
)
)
plotVmat()
is used to produce a V-plot (\autoref{fig:vplot}):
plotVmat(
MNase_sacCer3_Henikoff2011,
ABF1_sacCer3
)
Several Vplots can also be produced at once using a list of parameters as input (see https://js2264.github.io/VplotR/articles/VplotR.html#multiple-vplots for more information).
Observations from V-plots can be further investigated by plotting DNA accessibility footprints over these loci. The plotFootprint()
function can be leveraged to plot these footprint profiles (\autoref{fig:footprint}).
plotFootprint(
MNase_sacCer3_Henikoff2011,
ABF1_sacCer3
)
Finally, VplotR provides a function to plot the distribution of paired-end fragments over an individual genomic window (\autoref{fig:locus}).
genes_sacCer3 <- GenomicFeatures::genes(
TxDb.Scerevisiae.UCSC.sacCer3.sgdGene::
TxDb.Scerevisiae.UCSC.sacCer3.sgdGene
)
plotProfile(
fragments = MNase_sacCer3_Henikoff2011,
window = "chrXV:186,400-187,400",
loci = ABF1_sacCer3,
annots = genes_sacCer3,
min = 20, max = 200, alpha = 0.1, size = 1.5
)
The paired-end fragments overlapping a locus of interest (e.g., binding sites, provided in the loci
argument) are shown in red while the remaining fragments mapping to the genomic window are displayed in black. Marginal curves are also plotted on the side of the distribution plot; they highlight the smoothed distribution of the position of paired-end fragment midpoints (top) or of the paired-end fragment length (right). Furthermore, genomic features provided in the annots
arguments are displayed as horizontal bars on top of the plots. Here, the distribution of yeast MNase paired-end fragments over the bi-directional promoter regulating two divergent genes clearly reveals the ABF1 binding site (in red) located in the nucleosome-depleted region (NDR).
VplotR was recently leveraged to provide accurate insights into differential organization of nucleosomes flanking ubiquitous or tissue-specific promoters in C. elegans [@Serizay2020Oct].
Yeast MNase-seq data has been obtained from @Henikoff2011Nov (SRR3193263). ABF1 and REB1 binding motifs in yeast have been annotated in sacCer3 genome using TFBStools
[@Tan2016May] and JASPAR2018
[@Khan2018Jan]. The motif occurrences with a relScore >= 0.90 were considered to be real binding sites.
J.S.: Conceptualization, Methodology, Software, Formal analysis, Investigation, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization. J.A.: Conceptualization, Supervision, Funding acquisition, Project administration.
We would like to thank editors from the Journal of Open Source Software, and notably C. Soneson, for their efforts to ensure a helpful and transparent reviewing procedure. We would also like to thank F. Geier and H. Kironde for their contribution as reviewers. This work was supported by a Wellcome Trust Senior Research Fellowship to J.A. (101863) and a Medical Research Council DTP studentship to J.S..
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.