calculateIDR: Calculate irreproducible discovery rate


View source: R/idr.R

Description

Assess the consistency between ChIP-Seq replicates according to Qunhua Li et al.

Usage

calculateIDR(chipSamples, inputSamples, org, assembly, version,
  readLength = NULL, shiftRange = c(-500, 1500), binSize = 5,
  crossCorrelationPeakShift = NULL, invalidCrossCorrelationPeaks = c(10,
  readLength + 10), halfWidth = NULL, overlapRatio = 0, isBroadPeak = F,
  cluster = NULL)

Arguments

chipSamples

vector of file names of immuno-precipitated samples in .bam format or list of GAlignments objects

inputSamples

vector of file names of control samples in .bam format or list of GAlignments objects

org

organism as described in the BSGenome package: Hsapiens, Mmusculus, ...

assembly

assembly name: UCSC, NCBI, TAIR, ...

version

assembly version: hg19, mm9, ...

readLength

length of reads (helps identify the phantom peak); if NULL, it is determined automatically from the first 10000 reads in the BAM file

shiftRange

strand shifts at which cross-correlation is evaluated

binSize

step size for shiftRange

crossCorrelationPeakShift

user-defined cross-correlation peak shift (when given, SPP does not try to detect the peak shift automatically)

invalidCrossCorrelationPeaks

strand shifts to exclude (to avoid phantom peaks)

halfWidth

a numerical value to truncate the peaks to; NULL means use peak width reported by SPP

overlapRatio

a value between 0 and 1 that controls how much two peaks must overlap to be considered as calling the same region. It is the ratio of the overlap length to the length of the shorter of the two peaks. When set to 0, two peaks are deemed to call the same region if they overlap by at least 1 bp

isBroadPeak

if broadpeak is used, set to T; if narrowpeak is used, set to F

cluster

a snow cluster: cluster <- snow::makeCluster(4)
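The shiftRange, binSize and overlapRatio arguments can be illustrated with a short base R sketch. The helper names below (candidateShifts, overlapRatioOf, sameRegion) are hypothetical and not part of the package. The sketch assumes that shifts form a regular grid over shiftRange in steps of binSize, that peaks are given as 1-based inclusive c(start, end) pairs, and that a non-zero overlapRatio is applied as a "greater than or equal" threshold. The package's internal implementation may differ.

```r
# Hypothetical helpers for illustration only; not part of encodeChIPqc.

# Candidate strand shifts, assuming a regular grid from shiftRange[1] to
# shiftRange[2] in steps of binSize.
candidateShifts <- function(shiftRange = c(-500, 1500), binSize = 5) {
  seq(from = shiftRange[1], to = shiftRange[2], by = binSize)
}

# Overlap ratio of two peaks given as c(start, end), 1-based and inclusive:
# overlap length divided by the length of the shorter peak.
overlapRatioOf <- function(peakA, peakB) {
  overlap <- min(peakA[2], peakB[2]) - max(peakA[1], peakB[1]) + 1
  overlap <- max(overlap, 0)  # disjoint peaks have zero overlap
  shorter <- min(peakA[2] - peakA[1] + 1, peakB[2] - peakB[1] + 1)
  overlap / shorter
}

# Decide whether two peaks call the same region. With overlapRatio = 0,
# any overlap of at least 1 bp counts. The ">=" comparison for non-zero
# thresholds is an assumption.
sameRegion <- function(peakA, peakB, overlapRatio = 0) {
  ratio <- overlapRatioOf(peakA, peakB)
  if (overlapRatio == 0) ratio > 0 else ratio >= overlapRatio
}

length(candidateShifts())                  # 401 shifts for the defaults
overlapRatioOf(c(100, 199), c(150, 349))   # 50 bp overlap / 100 bp = 0.5
sameRegion(c(100, 199), c(200, 300))       # FALSE: the peaks do not overlap
```

A stricter overlapRatio treats fewer peak pairs as the same region, so fewer peaks are matched between replicates.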

Details

Assess the consistency between ChIP-Seq replicates according to Qunhua Li et al.:

Ref: Qunhua Li, James B. Brown, Haiyan Huang, and Peter J. Bickel: Measuring reproducibility of high-throughput experiments. Ann Appl Stat. 2011 October 13

The irreproducible discovery rate (IDR) measures how inconsistently a peak is called across ChIP-Seq replicates: it estimates the probability that a peak called in one sample is not reproduced in another. Peaks that result from biological activity should be called consistently between replicates, so they are assigned a low IDR. In contrast, peaks that are noise are typically not called in all replicates and are assigned a high IDR. The output of this function is meant to be used as input to the function plotIDR, which generates plots for assessing the IDR of ChIP-Seq replicates. Refer to Qunhua Li et al. (2011) for an explanation of how to interpret the plots.

Value

object that is suitable as input to plotIDR

Examples

chipTuples <- calculateIDR(c("IP1.bam", "IP2.bam"), c("input1.bam", "input2.bam"),
                            "Hsapiens", "UCSC", "hg19")
# Write one landscape A4 PDF of IDR plots for each pair of replicates
for (chipTuple in chipTuples) {
    pdfFile <- paste0(basename(chipTuple$rep1), "_VS_", basename(chipTuple$rep2), ".pdf")
    pdf(pdfFile, paper = "a4r", width = 11, height = 8.5)
    plotIDR(chipTuple)
    dev.off()
}
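The cluster argument can be supplied in the same way. The following is a minimal sketch, assuming the snow and encodeChIPqc packages are installed; the wrapper name runIDRInParallel and its nWorkers parameter are hypothetical and only serve to show the cluster being created and released around the call.

```r
# Hypothetical wrapper for illustration only; not part of encodeChIPqc.
# Assumes the snow and encodeChIPqc packages are installed.
runIDRInParallel <- function(chipSamples, inputSamples, org, assembly, version,
                             nWorkers = 4) {
  cl <- snow::makeCluster(nWorkers)
  # Release the workers even if calculateIDR stops with an error
  on.exit(snow::stopCluster(cl), add = TRUE)
  encodeChIPqc::calculateIDR(chipSamples, inputSamples, org, assembly, version,
                             cluster = cl)
}

# Example call (requires the BAM files to exist):
# chipTuples <- runIDRInParallel(c("IP1.bam", "IP2.bam"),
#                                c("input1.bam", "input2.bam"),
#                                "Hsapiens", "UCSC", "hg19")
```

Stopping the cluster in on.exit keeps worker processes from lingering when the analysis fails partway through.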

imbforge/encodeChIPqc documentation built on May 18, 2019, 4:45 a.m.