waveseq: Run WaveSeq Analysis Pipeline
In WaveSeqR: A Wavelet-based Method for ChIP-Seq peak calling

Description Usage Arguments Details Value Author(s) See Also Examples

This is the main function of the WaveSeq algorithm modules implemented in R. There are six processing steps:
(1) Pre-process the input BED file.
(2) Perform Monte Carlo sampling to estimate wavelet coefficient thresholds using the continuous wavelet transform (CWT).
(3) Apply thresholds to CWT of input data to obtain putative peaks.
(4) Concatenate peaks within a user-specified distance (optional)
(5) If control data is specified, compare reads within peaks to control with an exact binomial test.
(5) If control data is absent, use randomized algorithm to estimate background distribution and assign p-values to peaks.
(6) Correct p-values for multiple-testing.

waveseq(chip, control=NA, outdir, exptname, preprocess = TRUE, redundancy = TRUE, thresdist = TRUE, 
	peak.calling = TRUE, mother = "morlet", winsize = 200, fragsize = 150, gap = 0, minreads = 8,
	samplesize = 1e6, adj.method = "fdr", p.thres = 0.2, maxscale = 12, N = 5000, minsig = 2, 
	minsigscale = 3, maxsigscale = -1, p.min = 0.001, p.max = 0.3, binom.sided = "two.sided")

`chip`	Input file in BED format. If `preprocess = FALSE`, `chip` is assumed to be in bedGraph format (required).
`control`	Padded bedGraph file for the control data being analyzed (default = NA).
`outdir`	Output directory (required).
`exptname`	Unique string to identify name of experiment (required).
`preprocess`	Logical value to indicate whether input file should be preprocessed or not. If missing or `FALSE`, input file is assumed to be in bedGraph format (default=TRUE).
`redundancy`	Logical value to indicate whether redundant reads should be removed before calculating summary counts (default=TRUE).
`thresdist`	Logical value to indicate whether threshold estimation step is to be performed or not. If missing or `FALSE`, thresholds will be assumed to exist in `thresdist` folder of `outdir` (default=TRUE).
`peak.calling`	Logical value to indicate whether peak-calling step is to be performed or not (default=TRUE).
`mother`	Wavelet mother function used for the CWT. Tested choices are "morlet","haar","gaussian1", "gaussian2" (default="morlet").
`winsize`	Window size for padded graph files (default=200).
`fragsize`	Library size used for sequencing (default=150).
`gap`	Maximum number of non-significant windows separating peaks that are to be concatenated (default = 0).
`minreads`	Minimum reads within a peak (default = 8). This filtering is necessary for a more accurate estimation of FDR.
`samplesize`	Number of peaks to sample for building ecdf with the randomized algorithm (default = 1e6).
`adj.method`	Method for multiple testing adjustment. Can use any methods accepted by `p.adjust` (default = "fdr").
`p.thres`	Threshold p-value for calling significant windows (default=0.2)
`maxscale`	Maximum scale of wavelet decomposition. Also defines size of sample (2^maxscale) (default=12).
`N`	Number of samples (default=5000)
`minsig`	Minimum number of significant scales for a window to be considered significant (default=2).
`minsigscale`	Minimum significant scale considered for peak detection (default=3).
`maxsigscale`	Minimum significant scale considered for peak detection (default=-1).
`p.min`	Minimum quantile (1-p.min) to be output for wavelet coefficient distribution (default=0.3).
`p.max`	Maximum quantile (1-p.max) as above (default=0.001).
`binom.sided`	One-sided or two-sided binomial test, if control is not NA (default = "two.sided")

This is the main function to apply the WaveSeq algorithm to Next-generation sequencing data e.g. ChIP-Seq.

This function concatenates peaks within a specified distance, estimates FDR in the presence or absence of control data and writes peaks to file. This file has the following fields:

`chromosome`	Chromosome name
`start`	Start position of peak (1-based indexing)
`end`	End position of peak
`ChIP.reads`	Number of ChIP reads in the peak (normalized to per million mapped reads if `control` != NA)
`Control.reads`	Number of control reads in the peak (normalized to per million mapped reads); Output only if `control` != NA
`Fold.Change`	Fold-change calculated as `ChIP.reads`/`Control.reads`; Output only if `control` != NA
`p.value`	P-value for the peak
`adj.p.value`	P-values adjusted for multiple comparisons using `adj.method`

Apratim Mitra

sample, ecdf, p.adjust, binom.test

## Not run: 
# load package
library(WaveSeqR)

# get paths to data files
WS.path <- system.file(package="WaveSeqR")
chip.file <- file.path(WS.path,"inst","extdata","gabp_valouev2008_chr22-pad.graph")
control.file <- file.path(WS.path,"inst","extdata","rxnoip_valouev2008_chr22-pad.graph")
test.dir <- file.path(WS.path,"extdata")

# ChIP-Seq analysis with control
waveseq(chip=chip.file,
	outdir=test.dir,
	control=control.file,
	exptname="GABP_valouev_chr22_control",
	preprocess=FALSE,
	thresdist=FALSE,
	gap=0,
	winsize=200,
	minreads=8,			
	adj.method="fdr")
	
# ChIP-Seq analysis without control
waveseq(chip=chip.file,
	outdir=test.dir,
	exptname="GABP_valouev_chr22_nocontrol",
	preprocess=FALSE,
	thresdist=FALSE,
	gap=0,
	winsize=200,
	minreads=8,
	samplesize=1e6,
	adj.method="fdr")

## End(Not run)