Add histograms (binned read enrichment profiles) to an existing DBA object

Description

This function is a wrapper that collects all 5' starting positions of mapped short reads in the bam.files that match to one of nPeaks regions defined by Peaks. It corrects for strand shifts and builds for each Peak a (nSamples x nBins) matrix containing histograms for each sample. Additionally computes total counts per peak and sample.

Usage

1
2
 getPeakProfiles(DBA, Peaks, bin.length = 20, keep.extra = FALSE,  
 draw.on = TRUE, save.files = FALSE, use.old = TRUE, out.dir = ".",run.parallel=TRUE)  

Arguments

DBA

DBA object, which can be generated with the dba function of the DiffBind package. A standard DBA object is an S3 object (class "DBA"), which is introduced with the DiffBind package and more information on this class can be found in the Diffbind vignette. This function uses the following elements of the DBA object: chrmap, samples$bamReads, samples$bamControl

Peaks

GRanges Object containing genomic corrdinates of the regions of interessts.

bin.length

number of base pairs that are summarised per bin.

keep.extra

if TRUE extra information is kept, e.g. RawHists, which Counts.p and Counts.n corresponding to histograms on forward and reverse strand before strand shift is applied.

draw.on

if TRUE, strand shift plots are created for each data sample. This can be used for quality control (see details below)

save.files

if TRUE, DBA objects are saved for each data sample

use.old

if TRUE, available files in out.dir are loaded

out.dir

directory for saving output files

run.parallel

distribute over available CPUs

Details

This function uses as a starting point a DBA object (see the DiffBind package for more details). The path to the bam files is stored in the samples$bamReads element of the DBA object. In addition, in samples$bamControl, bam files for the control samples should be specified. These files are accessed using Rsamtools to collect all short reads that match to regions defined by Peaks. 5' Positions of reads are returned for reads mapping to positive and negative strand respectively. To correct for the strand shift, only peaks are selected whose total number of reads mapping to each strand is in the 9th decile. If draw.on=TRUE, a plot is generated for each bam file (all bamRead and bamControl files), showing smoothscatter plots of total number of reads mapping to the peaks on forward vs reverse strand. This can be used as a quality control (Points should lie on the diagonal). The peaks used to determine the strand shift are shown in red. For each of the selected peaks the shift between forward and reverse strand is determined using the cross-correlation function ccf. If draw.on=TRUE, histograms are plotted for each bam file, showing the distribution of shifts. The median of shifts is used to correct all reads mapping to any peak in the respective bam file. (Note, the shifts can vary between samples i.e. different bam files.)

Value

DBA object, with additional component MD, which is a list containing

PeakRawHists

list of length nPeaks, containing for each Peak a Matrix of histograms (nSamples x nbins). Note, as Peaks can vary in length, nbins may be different for each Peak

RawTotalCounts

matrix of total counts per peak and sample (nPeaks x nSamples)

RawHists

Only provided if keep.extra=TRUE. Contains a list of length nSamples, containing for each sample: Counts: list of nPeaks histograms, i.e. for each peak contains vector of integers (read counts); Mids: list of length nPeaks, i.e. for each peak contains histogram mid points (chromosome coordinates); Counts.p: As Counts, but with read counts mapping to the forward strand only; Counts.n: As Counts, but with read counts mapping to reverse strand only; Meta: contains preprocessing meta information. (strand shift, bin length)

Author(s)

Gabriele Schweikert

See Also

dba, summary.DBA, GRanges, scanBam, plotPeak, findOutliers, getNormFactors, compHistDists, detPeakPvals, plotHistDists

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
## Not run: 
# To build peak profiles you need to store information on our experiment
# in a sample sheet, for example, "Cfp1.csv". You also need bam files,
# which contain the mapped reads for your experiment. The path to the
# bam files should be specified in the csv sample sheet. Lastly you need
# to specify the regions of interest that you want to examine, i.e. the
# peaks.


# Step 1: For this example we provide bam files, peak files and a sample
# sheet in the data package MMDiffBamSubset. We create a new DBA file
# according to the sample sheet "Cfp1.csv":

library('MMDiffBamSubset')
oldwd <- setwd(system.file("extdata", package="MMDiffBamSubset"))
Cfp1 <- dba(sampleSheet="Cfp1.csv",
           minOverlap = 3)


# Step 2: Specify the regions of interest, i.e. the peaks. For this
# example we have run the peak finder Macs [Zhang et al. Genome Biol
# (2008)] on each of the bam file. The xls files containing a subset of
# the peaks are provided and the path to these files is specified in the
# "Cfp1.csv" sample sheet. By calling the function dba, we have also
# generated a list of consensus peaks, which occur in at least
# minOverlap = 3 samples.  Let's take the first 1000 consensus peaks:

Peaks <- dba.peakset(Cfp1, bRetrieve=TRUE)
Peaks <- Peaks[1:1000]


# Step 3: Now, we create peak profiles by reading in the reads that map to the
# regions specified in Peaks
Cfp1Profiles <- getPeakProfiles(Cfp1, Peaks, save.files=FALSE)


# To view individual peak profiles use the function plotPeak, e.g.:
plotPeak(Cfp1Profiles, Peak.id=20, NormMethod=NULL)


# Alternative to step 2 you could also get read profiles for any other
# set of regions, for example for these 200 consecutive 100bp regions on
# chromosome 1:

 Peaks2 <- GRanges(seqnames = Rle('chr1'),
           ranges = IRanges(start=seq(3200000, 3219900, 100), width=100))

 Cfp1Profiles2 <- getPeakProfiles(Cfp1, Peaks2, save.files=FALSE,
                                 draw.on=FALSE)

 setwd(oldwd)
 
## End(Not run)