knitr::opts_chunk$set(echo = TRUE, eval = TRUE, message = FALSE, warning = FALSE) library(tidyverse) library(ChIPUtils) theme_set(theme_bw())
ChIPUtils is a package for very general ChIP-seq analysis. The basic idea behind this package is to gather several of the common procedures we use when starting to analize a new ChIP dataset. To load the package is necessary to use:
library(ChIPUtils)
For the vignette we use the following samples which are based on ENCODE datasets:
bamfiles = list.files(system.file("extdata",package = "ChIPUtils"), full.names = TRUE,recursive = TRUE, pattern = "bam") bamfiles = bamfiles %>% {.[-grep("bai",.)]} bamfiles %>% basename()
To use the different methods that the package contains, it is neccesary to create a ChIPdata object, for which is necessary to have a file in bam format with aligned reads and to indicate if the reads are single or paired ended:
example = ChIPdata(bamfiles[1],isPE = FALSE) example %>% slotNames()
and the number of aligned reads in the experiment can be retrieved by:
nreads(example)
Using ChIPUtils we can calculate the PBC and SCC, which are the two most commonly used QC metrics for ChIP-seq:
PBC(example)
scc = SCC(example,shift = seq(1,300,by = 10),verbose = FALSE) fl = scc[which.max(scc$corr),]$shift rl = reads(example)[1] %>% width() scc %>% ggplot(aes(shift,corr))+geom_line(colour = "red")+ geom_vline(xintercept = fl,linetype = 2)
Using the SCC curve above, we can calculate the Normalized Strand Cross-Correlation by:
NSC(scc)
and the Relative Strand Cross-Correlation by:
RSC(scc,rl)
Quick note: To calculate the SCC method is parallel it is only necessary to specify the processing cores to use something like the following chunk:
options(mc.cores = 2)
If this option
is not defined, the SCC method will be run in only one core by default.
ChIPUtils contains a collection of genome sizes, which can be seen by using:
possibleChrSizes()
For example after picking the r "hg19"
genome, we can partition it into fixed length bins by using:
data("chromSizes",package = "ChIPUtils") createBins(1e4,chrom = chromSizes[["hg19"]])
Furthermore, we can use the same command to count the number of extended reads that overlap each bin, by adding a ChIPdata object and specifying the unobserved fragment length r fl
:
## we are using only the first three chromosomes bins = createBins(1e3,chipdata = example,chrom = chromSizes[["hg19"]][1:3],fragLen = fl) bins
tibble(tagCounts = bins$tagCounts) %>% filter(tagCounts > 0) %>% ggplot(aes(tagCounts))+geom_histogram(fill = "white",colour = "black")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.