library(Segvis) # devtools::load_all() library(BiocStyle) library(ggplot2) library(Rsamtools) library(data.table) library(RColorBrewer) library(BiocParallel) Segvis = "Segvis" theme_set(theme_minimal()) knitr::opts_chunk$set(comment=NA, fig.width=6, fig.height=3, include = TRUE,echo = FALSE,eval = TRUE)
This vignette provides an introduction to the visualization of sequencing data by using the r Rpackage(Segvis)
package. The minimum input to the package includes:
r Rpackage(Segvis)
provides different tools to summarize and visualize
these data, including but not limited to the following tasks:
r Rpackage(Segvis)
:The package can be loaded with the command:
library(Segvis)
Different visualization of the data is done by the use of the SegvisData class. This class extends the r Biocpkg("GenomicRanges")
class to additionally contain RleLists with the experiment's coverage.
The minimum input for the package includes:
The coordinates may be obtained by several means: Visual exploration in the genome browser, calling peaks from a ChIP-seq experiment, etc. To use r Rpackage(Segvis)
it is necessary to load the regions of interest and the aligned reads (of one or several experiments) into a r Rpackage("SegvisData")
object.
For example, if the peaks are saved in a narrowPeak file format (A description of several common file formats is given in https://genome.ucsc.edu/FAQ/FAQformat.html, then we can create the r Rpackage("SegvisData")
object as follows:
dr = system.file("extdata","example",package = "Segvis") files = list.files(dr,pattern = "bam",full.names =TRUE) files = files[grep("bai",files,invert = TRUE)] ## ignore bam-index files peaks_file = file.path(dr,"encode_K562_Ctcf_peaks_first3chr.narrowPeak") ctcf = readBedFile(peaks_file) ctcf = reduce(ctcf) ## some peaks are listed more than one time frag_len = 200 segvis = SegvisData(regions = ctcf,files,frag_len = frag_len) segvis
A complete description of the meaning of each column in the peaks file is given in https://genome.ucsc.edu/FAQ/FAQformat.html#format12.
By using the r Rpackage("SegvisData")
method, coverages are generated for each file of aligned reads, these profiles are accesible by using the r Rpackage("covers")
method:
length(covers(segvis)) covers(segvis)[[1]]
Additionally for the case of DNA, coverage are generated by using only the reads that were aligned to certain strand. We are using fwd_covers
and bwd_covers
for the forward and reverse strand profiles, respectively.
fwd_covers(segvis)[[1]]
Finally the number of aligned reads in each bam file is pre-processed and contained in the r Rpackage("SegvisData")
object, it is accesible by:
nreads(segvis)
This method shows that as expected the number of reads differ per experiment.
r Rpackage(Segvis)
:After creating a SegvisData object, one of the most simple capabilities to explore the data is to plot specific regions. For example, to plot a specific peak in the object is done by accesing its position in the GRanges object or just using a whole GRanges object:
plot_region(segvis,1,nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac", "H3k4me1","H3k4me3")) plot_region(segvis,resize(segvis[1],width = 2e3,fix = "center"), nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac","H3k4me1","H3k4me3"))
Similarly to give more control to the user when plotting regions, we included the DT_region method which returns a r "data.table"
with the information used in the plot:
DT_region(segvis,1,nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac", "H3k4me1","H3k4me3"))
To plot the average (o any other summary statistics) across a set of regions, the region need to have the same width. Therefore, a common method to normalize the width of the regions is to center them across the peaks summit:
segvis$summit = find_summits(segvis,1) start(segvis) = segvis$summit - 200 end(segvis) = segvis$summit + 200 segvis
By construction the width is the same for all the intervals. Therefore to plot the average (median) profile is only necessary to do as follows:
plot_profile(segvis,nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac", "H3k4me1","H3k4me3")) plot_profile(segvis,FUN = median, nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac", "H3k4me1","H3k4me3"))
Similarly, as with the r Rpackage("plot_region")
and r Rpackage("DT_region")
functions. It is also possible to obtain only the r Rpackage("data.table")
without making the whole plot:
DT_profile(segvis,nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac", "H3k4me1","H3k4me3"))
r Rpackage(Segvis)
can plot heatmaps with the normalized signal profiles by using the plot_heatmap method:
plot_heatmap(segvis,nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac", "H3k4me1","H3k4me3"))
By default, the r Rpackage("plot_heatmap")
method clusters the normalized signal across the windows by using the profiles from the first files added to the SegvisData, an eucliean distance and the complete agglomeration technique. However, by the use of the r "which.cluster"
, r "dist_method"
and r "clust_method"
parameters is possible to explore different clustering strategies when analizing the data.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.