waveseq: Run WaveSeq Analysis Pipeline

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/waveseq.R

Description

This is the main function of the WaveSeq algorithm modules implemented in R. There are six processing steps:
(1) Pre-process the input BED file.
(2) Perform Monte Carlo sampling to estimate wavelet coefficient thresholds using the continuous wavelet transform (CWT).
(3) Apply thresholds to CWT of input data to obtain putative peaks.
(4) Concatenate peaks within a user-specified distance (optional)
(5) If control data is specified, compare reads within peaks to control with an exact binomial test.
(5) If control data is absent, use randomized algorithm to estimate background distribution and assign p-values to peaks.
(6) Correct p-values for multiple-testing.

Usage

1
2
3
4
waveseq(chip, control=NA, outdir, exptname, preprocess = TRUE, redundancy = TRUE, thresdist = TRUE, 
	peak.calling = TRUE, mother = "morlet", winsize = 200, fragsize = 150, gap = 0, minreads = 8,
	samplesize = 1e6, adj.method = "fdr", p.thres = 0.2, maxscale = 12, N = 5000, minsig = 2, 
	minsigscale = 3, maxsigscale = -1, p.min = 0.001, p.max = 0.3, binom.sided = "two.sided")

Arguments

chip

Input file in BED format. If preprocess = FALSE, chip is assumed to be in bedGraph format (required).

control

Padded bedGraph file for the control data being analyzed (default = NA).

outdir

Output directory (required).

exptname

Unique string to identify name of experiment (required).

preprocess

Logical value to indicate whether input file should be preprocessed or not. If missing or FALSE, input file is assumed to be in bedGraph format (default=TRUE).

redundancy

Logical value to indicate whether redundant reads should be removed before calculating summary counts (default=TRUE).

thresdist

Logical value to indicate whether threshold estimation step is to be performed or not. If missing or FALSE, thresholds will be assumed to exist in thresdist folder of outdir (default=TRUE).

peak.calling

Logical value to indicate whether peak-calling step is to be performed or not (default=TRUE).

mother

Wavelet mother function used for the CWT. Tested choices are "morlet","haar","gaussian1", "gaussian2" (default="morlet").

winsize

Window size for padded graph files (default=200).

fragsize

Library size used for sequencing (default=150).

gap

Maximum number of non-significant windows separating peaks that are to be concatenated (default = 0).

minreads

Minimum reads within a peak (default = 8). This filtering is necessary for a more accurate estimation of FDR.

samplesize

Number of peaks to sample for building ecdf with the randomized algorithm (default = 1e6).

adj.method

Method for multiple testing adjustment. Can use any methods accepted by p.adjust (default = "fdr").

p.thres

Threshold p-value for calling significant windows (default=0.2)

maxscale

Maximum scale of wavelet decomposition. Also defines size of sample (2^maxscale) (default=12).

N

Number of samples (default=5000)

minsig

Minimum number of significant scales for a window to be considered significant (default=2).

minsigscale

Minimum significant scale considered for peak detection (default=3).

maxsigscale

Minimum significant scale considered for peak detection (default=-1).

p.min

Minimum quantile (1-p.min) to be output for wavelet coefficient distribution (default=0.3).

p.max

Maximum quantile (1-p.max) as above (default=0.001).

binom.sided

One-sided or two-sided binomial test, if control is not NA (default = "two.sided")

Details

This is the main function to apply the WaveSeq algorithm to Next-generation sequencing data e.g. ChIP-Seq.

Value

This function concatenates peaks within a specified distance, estimates FDR in the presence or absence of control data and writes peaks to file. This file has the following fields:

chromosome

Chromosome name

start

Start position of peak (1-based indexing)

end

End position of peak

ChIP.reads

Number of ChIP reads in the peak (normalized to per million mapped reads if control != NA)

Control.reads

Number of control reads in the peak (normalized to per million mapped reads); Output only if control != NA

Fold.Change

Fold-change calculated as ChIP.reads/Control.reads; Output only if control != NA

p.value

P-value for the peak

adj.p.value

P-values adjusted for multiple comparisons using adj.method

Author(s)

Apratim Mitra

See Also

sample, ecdf, p.adjust, binom.test

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
## Not run: 
# load package
library(WaveSeqR)

# get paths to data files
WS.path <- system.file(package="WaveSeqR")
chip.file <- file.path(WS.path,"inst","extdata","gabp_valouev2008_chr22-pad.graph")
control.file <- file.path(WS.path,"inst","extdata","rxnoip_valouev2008_chr22-pad.graph")
test.dir <- file.path(WS.path,"extdata")

# ChIP-Seq analysis with control
waveseq(chip=chip.file,
	outdir=test.dir,
	control=control.file,
	exptname="GABP_valouev_chr22_control",
	preprocess=FALSE,
	thresdist=FALSE,
	gap=0,
	winsize=200,
	minreads=8,			
	adj.method="fdr")
	
# ChIP-Seq analysis without control
waveseq(chip=chip.file,
	outdir=test.dir,
	exptname="GABP_valouev_chr22_nocontrol",
	preprocess=FALSE,
	thresdist=FALSE,
	gap=0,
	winsize=200,
	minreads=8,
	samplesize=1e6,
	adj.method="fdr")

## End(Not run)

WaveSeqR documentation built on May 2, 2019, 5:19 p.m.