compHistDists: Compute distances between pairs of histograms
In MMDiff: Statistical Testing for ChIP-Seq data sets


For each peak, this function computes pairwise distances between histograms according to the specified method. Currently, Maximum Mean Discrepancy (MMD), Generalized Minimum Distance (GMD) and simple Pearson correlation (Pearson) are implemented.
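
To make the idea of a kernel-based distance and the role of the `sigma` argument more concrete, here is a minimal base-R sketch of a biased squared-MMD estimate with a Gaussian kernel [1]. It is not MMDiff's implementation: the function names `rbf_kernel` and `mmd2_biased`, the simulated read positions and the value `sigma = 20` are all assumptions made for illustration.

```r
# Gaussian (RBF) kernel matrix between two numeric vectors.
# sigma is the kernel width; larger values smooth over larger shifts.
rbf_kernel <- function(a, b, sigma) {
  d2 <- outer(a, b, function(u, v) (u - v)^2)
  exp(-d2 / (2 * sigma^2))
}

# Biased estimate of the squared MMD between samples x and y:
# mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
mmd2_biased <- function(x, y, sigma) {
  mean(rbf_kernel(x, x, sigma)) +
    mean(rbf_kernel(y, y, sigma)) -
    2 * mean(rbf_kernel(x, y, sigma))
}

# Simulated read positions within one peak (hypothetical data).
set.seed(1)
x <- rnorm(200, mean = 100, sd = 20)  # sample A
y <- rnorm(200, mean = 100, sd = 20)  # sample B, same profile as A
z <- rnorm(200, mean = 140, sd = 20)  # sample C, shifted profile

mmd2_biased(x, y, sigma = 20)  # similar profiles: small value
mmd2_biased(x, z, sigma = 20)  # shifted profile: larger value
```

In this toy setting the distance between the two samples drawn from the same distribution is close to zero, while the shifted sample yields a clearly larger value.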

compHistDists(DBA, method = 'MMD', CompIDs = NULL, Usefiltered = TRUE,
              PeakIDs = NULL, NormMethod = 'DESeq',
              overWrite = FALSE, HistField = 'PeakRawHists',
              run.parallel = TRUE, verbose = 2,
              save.file = TRUE, out.dir = '.', sigma = NULL)

`DBA`	DBA object, after running getPeakProfiles. Specifically, the function uses the element MD, which contains a list of histogram matrices. (See the getPeakProfiles documentation for more information about this data type.)
`method`	specifies which method is used to determine distances between histograms: 'MMD' [1], 'GMD' [2] or simple 'Pearson' correlation
`CompIDs`	2 x nComps matrix, specifying the sample IDs of the pairwise comparisons
`Usefiltered`	if TRUE, only peaks that have passed the outlier filter are considered. findOutliers() must be run first; otherwise all peaks are used
`PeakIDs`	specifies a subset of peaks for which distances should be computed
`NormMethod`	specifies which normalization method is used; currently only the 'DESeq' method [3] is implemented. Note that unless NormMethod=NULL, getNormFactors has to be called first.
`overWrite`	if TRUE, overwrites previously computed distances.
`HistField`	name of the element in MD that is used to determine distances. This element should again be a list of nPeaks peaks, each containing a matrix of histograms (nSamples x nbins). It can be generated by running getPeakProfiles. Note that nbins may vary between peaks if they have different lengths.
`run.parallel`	distribute over available CPUs
`verbose`	for debugging, set to 3 for some extra output
`save.file`	if TRUE, DBA objects are saved
`out.dir`	directory for saving output files
`sigma`	parameter controlling the Kernel size
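
As an illustration of the 2 x nComps layout expected for `CompIDs`, the following base-R sketch builds all pairwise comparisons with `combn`. The sample IDs are the ones used in the Examples of this page; whether this is the most convenient way to build the matrix for a given data set is an assumption.

```r
# Sample IDs as used in the Examples of this page.
samples <- c("WT.AB2", "Null.AB2", "Resc.AB2")

# combn() returns one combination per column, i.e. a 2 x nComps matrix.
CompIDs <- combn(samples, 2)

dim(CompIDs)  # number of rows (2) and number of comparisons
CompIDs       # each column holds the two sample IDs of one comparison
```

Each column names one pair of samples, which matches the matrix constructed by hand with `cbind` in the Examples.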

DBA object with an additional list element DISTS added to MD. DISTS in turn contains a list element named after the method applied (e.g. MMD). This element is a matrix (nPeaks x nComps) containing all pairwise distances.
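
The nesting described above can be pictured with a toy list. This is only a sketch of the documented layout, not output produced by MMDiff: the object `toyDBA`, the peak names, the column labels and the random values are hypothetical, and it assumes the elements can be reached with `$`.

```r
# Toy stand-in mirroring the documented layout MD -> DISTS -> <method>.
set.seed(42)
nPeaks <- 4
comps <- c("WT.AB2 vs Null.AB2", "WT.AB2 vs Resc.AB2", "Null.AB2 vs Resc.AB2")

toyDBA <- list(
  MD = list(
    DISTS = list(
      # nPeaks x nComps matrix of (here random) pairwise distances
      GMD = matrix(runif(nPeaks * length(comps)),
                   nrow = nPeaks,
                   dimnames = list(paste0("peak", seq_len(nPeaks)), comps))
    )
  )
)

# Extract the distance matrix for the method of interest.
D <- toyDBA$MD$DISTS$GMD
dim(D)

# Rank peaks by their distance in the first comparison, largest first.
rownames(D)[order(D[, 1], decreasing = TRUE)]
```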

Gabriele Schweikert

[1] Gretton A. et al. (2006). A kernel method for the two-sample-problem. In NIPS, pages 513–520, MIT Press.

[2] Zhao et al. (2012). GMD: Measuring the distance between histograms with applications on high-throughput sequencing reads. Bioinformatics, 28 (8): 1164-1165.

[3] Anders S. and Huber W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11 (10): R106.

getPeakProfiles, findOutliers, getNormFactors, detPeakPvals, plotHistDists, plotPeak

## Not run: 
# load DBA objects with peak profiles 
data(Cfp1Profiles)

# get normalization factors
Cfp1Norm <- getNormFactors(Cfp1Profiles)

# get all pairwise distances for the samples WT, Null and Resc, i.e. WT
# vs Null, WT vs Resc and Null vs Resc. The recommended method is 'MMD'
# [1]; however, this may take a little while. Here, we compute the GMD
# distance instead [2].

Cfp1Dists <- compHistDists(Cfp1Norm, method = 'GMD', 
           NormMethod = 'DESeq') 




# You can also specify which pairwise distances you are interested in,
# e.g.:

CompIDs <- cbind(c("WT.AB2", "Null.AB2"),
                 c("WT.AB2", "Resc.AB2"),
                 c("Null.AB2", "Resc.AB2"))

Cfp1Dists2 <- compHistDists(Cfp1Norm, method='GMD', CompIDs=CompIDs,
            NormMethod='DESeq')




# To view pairwise distances you can use the function plotHistDists. For
# example, treating WT and Resc as control replicates and Null as a
# treatment group, you can contrast the 'within-group' distances with 
# 'between-group' distances:

group1 <- c("WT.AB2", "Resc.AB2")
group2 <- c("Null.AB2")
plotHistDists(Cfp1Dists, group1=group1, group2=group2, method='GMD')

# See detPeakPvals to determine which peaks are significantly different
# between the two groups.

## End(Not run)