library(Segvis)
#  devtools::load_all()
  library(BiocStyle)
  library(ggplot2)
  library(Rsamtools)
  library(data.table)
  library(RColorBrewer)
  library(BiocParallel)
  Segvis = "Segvis"

  theme_set(theme_minimal())

  knitr::opts_chunk$set(comment=NA, fig.width=6, fig.height=3,
                 include = TRUE,echo = FALSE,eval = TRUE)

Overview

This vignette provides an introduction to the visualization of sequencing data by using the r Rpackage(Segvis) package. The minimum input to the package includes:

  1. Coordinates for regions of interest.
  2. One or more bam files of aligned read data (e.g. from ChIP-seq experiments).

r Rpackage(Segvis) provides different tools to summarize and visualize these data, including but not limited to the following tasks:

How to use r Rpackage(Segvis):

The package can be loaded with the command:

  library(Segvis)

Different visualization of the data is done by the use of the SegvisData class. This class extends the r Biocpkg("GenomicRanges") class to additionally contain RleLists with the experiment's coverage.

Building a SegvisData object

The minimum input for the package includes:

  1. Coordinates for regions of interest.
  2. One or more bam files of aligned read data (e.g. from ChIP-seq experiments).

The coordinates may be obtained by several means: Visual exploration in the genome browser, calling peaks from a ChIP-seq experiment, etc. To use r Rpackage(Segvis) it is necessary to load the regions of interest and the aligned reads (of one or several experiments) into a r Rpackage("SegvisData") object.

For example, if the peaks are saved in a narrowPeak file format (A description of several common file formats is given in https://genome.ucsc.edu/FAQ/FAQformat.html, then we can create the r Rpackage("SegvisData") object as follows:

  dr = system.file("extdata","example",package = "Segvis")
  files = list.files(dr,pattern = "bam",full.names =TRUE)
  files = files[grep("bai",files,invert = TRUE)] ## ignore bam-index files

  peaks_file = file.path(dr,"encode_K562_Ctcf_peaks_first3chr.narrowPeak")
  ctcf = readBedFile(peaks_file)
  ctcf = reduce(ctcf) ## some peaks are listed more than one time
  frag_len = 200

  segvis = SegvisData(regions = ctcf,files,frag_len = frag_len)
  segvis

A complete description of the meaning of each column in the peaks file is given in https://genome.ucsc.edu/FAQ/FAQformat.html#format12.

Elements in a SegvisData object:

By using the r Rpackage("SegvisData") method, coverages are generated for each file of aligned reads, these profiles are accesible by using the r Rpackage("covers") method:

  length(covers(segvis))
  covers(segvis)[[1]]

Additionally for the case of DNA, coverage are generated by using only the reads that were aligned to certain strand. We are using fwd_covers and bwd_covers for the forward and reverse strand profiles, respectively.

  fwd_covers(segvis)[[1]]

Finally the number of aligned reads in each bam file is pre-processed and contained in the r Rpackage("SegvisData") object, it is accesible by:

  nreads(segvis)

This method shows that as expected the number of reads differ per experiment.

Visualization methods with r Rpackage(Segvis):

How to plot one a region from the SegvisData object:

After creating a SegvisData object, one of the most simple capabilities to explore the data is to plot specific regions. For example, to plot a specific peak in the object is done by accesing its position in the GRanges object or just using a whole GRanges object:

  plot_region(segvis,1,nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac",
                                     "H3k4me1","H3k4me3"))
  plot_region(segvis,resize(segvis[1],width = 2e3,fix = "center"),
              nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac","H3k4me1","H3k4me3"))

Similarly to give more control to the user when plotting regions, we included the DT_region method which returns a r "data.table" with the information used in the plot:

    DT_region(segvis,1,nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac",
                                     "H3k4me1","H3k4me3"))

How to plot the average profiles across all region in the SegvisData object:

To plot the average (o any other summary statistics) across a set of regions, the region need to have the same width. Therefore, a common method to normalize the width of the regions is to center them across the peaks summit:

  segvis$summit = find_summits(segvis,1)
  start(segvis) = segvis$summit - 200
  end(segvis) = segvis$summit + 200
  segvis

By construction the width is the same for all the intervals. Therefore to plot the average (median) profile is only necessary to do as follows:

  plot_profile(segvis,nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac",
                                     "H3k4me1","H3k4me3"))
  plot_profile(segvis,FUN = median,
               nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac",
                                     "H3k4me1","H3k4me3"))

Similarly, as with the r Rpackage("plot_region") and r Rpackage("DT_region") functions. It is also possible to obtain only the r Rpackage("data.table") without making the whole plot:

  DT_profile(segvis,nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac",
                                     "H3k4me1","H3k4me3"))

How to plot a heatmap with the profiles across all region in the SegvisData object:

r Rpackage(Segvis) can plot heatmaps with the normalized signal profiles by using the plot_heatmap method:

  plot_heatmap(segvis,nameFiles = c("Ctcf_rep1","Ctcf_rep2","H3k27ac",
                                     "H3k4me1","H3k4me3"))

By default, the r Rpackage("plot_heatmap") method clusters the normalized signal across the windows by using the profiles from the first files added to the SegvisData, an eucliean distance and the complete agglomeration technique. However, by the use of the r "which.cluster", r "dist_method" and r "clust_method" parameters is possible to explore different clustering strategies when analizing the data.

SessionInfo

  sessionInfo()


welch16/Segvis documentation built on May 4, 2019, 4:18 a.m.