Statistical Testing for ChIP-Seq data sets

Description

This package detects statistically significant differences between read enrichment profiles in different samples. To take advantage of shape differences, it uses kernel methods (Maximum Mean Discrepancy, MMD [1]).

Details

The starting point for this package is a DBA object created with the package DiffBind [2]. Sample-specific peak profiles (histograms) can then be generated for a specified set of peaks: Rsamtools is used to load reads from BAM files, strand shifts are corrected, and a histogram is computed for each peak and sample. Differences between samples at each peak are assessed by computing distances between the corresponding histograms in terms of Maximum Mean Discrepancy (MMD) [1] or Generalized Minimum Distance (GMD) [3], both of which take structural (shape) information into account. Empirical p-values can be determined for a comparison of two sets of samples (e.g. control samples vs. treatment samples). Examples are provided using partial data from [4].
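
To make the idea of a shape-sensitive distance concrete, the following is a minimal sketch of a biased estimate of the squared MMD between two sets of read positions, using a Gaussian kernel. It is an illustration in Python, not the package's implementation: the function names, the kernel choice, the bandwidth `sigma` and the read positions are all assumptions made for this example.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two scalar read positions."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(xs, ys, sigma=50.0):
    """Biased estimate of the squared MMD between two samples.

    sigma is an assumed kernel bandwidth in base pairs.
    """
    return (mean_kernel(xs, xs, sigma)
            + mean_kernel(ys, ys, sigma)
            - 2.0 * mean_kernel(xs, ys, sigma))

# Hypothetical read positions (bp offsets within one peak) for two samples.
# Both samples have the same number of reads but different profiles.
sample_a = [100, 110, 120, 130, 140]
sample_b = [300, 310, 320, 330, 340]

same = mmd_squared(sample_a, sample_a)        # identical profiles
different = mmd_squared(sample_a, sample_b)   # shifted profile
```

Because both hypothetical samples contain the same number of reads, a count-based comparison would see no difference between them, whereas the MMD estimate is zero only when the two profiles coincide.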

Package: MMDiff
Type: Package
Version: 0.99.6
Date: 2012-03-22
License: Artistic-2.0

Function list:

getPeakProfiles: Add histograms (binned read enrichment profiles) to an existing DBA object
findOutliers: Find peaks with extreme count values
getNormFactors: Determine normalisation factors between samples
compHistDists: For each peak, compute distances of histograms between pairs of ChIP-Seq data sets (using Maximum Mean Discrepancy)
detPeakPvals: Determine p-values for each peak comparing two groups of data sets
plotHistDists: For each peak, plot the computed distances as a function of total counts and highlight peaks which are significantly different between two groups
plotPeak: Plot read enrichment profiles for a set of samples at a given peak
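
As a rough illustration of what an empirical p-value for a single peak means, the sketch below compares one between-group distance against a set of within-group distances. This is a generic convention written in Python, not necessarily the procedure used by detPeakPvals; the function name, the +1 correction and all distance values are assumptions made for this example.

```python
def empirical_p_value(observed, null_distances):
    """Fraction of null distances at least as large as the observed one.

    The +1 in numerator and denominator is a common correction that keeps
    the p-value from being exactly zero with a finite null sample.
    """
    exceed = sum(1 for d in null_distances if d >= observed)
    return (exceed + 1) / (len(null_distances) + 1)

# Hypothetical within-group (replicate vs. replicate) distances for one peak.
within_group = [0.02, 0.05, 0.03, 0.04, 0.06, 0.01, 0.05, 0.03, 0.02]

# A between-group distance far above the within-group range ...
p_large = empirical_p_value(0.40, within_group)
# ... and one that sits well inside it.
p_small = empirical_p_value(0.03, within_group)
```

With only nine null distances the smallest attainable p-value is 0.1, which shows why the resolution of an empirical p-value depends on how many within-group comparisons are available.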

Author(s)

Gabriele Schweikert

Maintainer: Gabriele Schweikert <G.Schweikert@ed.ac.uk>

References

[1] Gretton A et al (2006). A kernel method for the two-sample-problem. In NIPS, pages 513–520, MIT Press.

[2] Stark R and Brown G (2011). DiffBind: differential binding analysis of ChIP-Seq peak data. Bioconductor http://bioconductor.org/packages/release/bioc/html/DiffBind.html

[3] Zhao et al (2012). GMD: measuring the distance between histograms with applications on high-throughput sequencing reads. Bioinformatics 28(8): 1164–1165.

[4] Clouaire T et al (2012). Cfp1 integrates both CpG content and gene activity for accurate H3K4me3 deposition in embryonic stem cells. Genes Dev 26: 1714–1728.