zinba: ZINBA convenience function
In sivarajankumar/zinba: ZINBA: Zero-Inflation Negative Binomial Algorithm


This is the main function in ZINBA, consisting of three sequential steps: (1) processing mapped sample reads and calculating the corresponding covariate information, (2) determining regions significantly enriched for reads given a set of covariates, and (3) merging adjacent and overlapping windows and running a peak boundary refinement to get the exact boundaries of peak regions within these merged regions.

The first step, the buildwindow step, is optional: one can start the analysis from an existing set of processed data by referencing its corresponding filelist file, defined below. The peak refinement step is also optional; one can elect to use the merged significant window coordinates instead of the refined boundaries (especially in the case of broader signal).

If specified, the first step takes the raw mapped sample reads, the raw mapped input reads (if available), the alignability directory, and the current build of the genome, and builds the datasets needed to run the analysis to detect enriched regions. The corresponding individual function is buildwindowdata(). After the data is built, the locations of the built files (one for each chromosome and offset) are placed in a filelist file.

The next step uses the locations in the filelist to import the built data and find windows that are likely to be enriched for counts given a set of covariates. The corresponding individual function is getsigwindows(). The posterior probabilities for each window are saved, and the locations of these files are placed in the outfile.winlist file, where outfile is the prefix chosen for the output files of the current run. The winlist is then passed to getrefinedpeaks() if peak refinement is desired. Overlapping significant windows above the peakconfidence threshold are merged, the SBPC (basecount) values for these merged windows are imported, and exact peak boundaries are determined and written to outfile.peaks.
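The output naming described above can be sketched in a few lines of R. The prefix `results/sample1` is a hypothetical placeholder, and the assumption that the winlist is a plain-text file with one path per line is not confirmed by this page; treat the snippet as an illustration of the naming scheme rather than of ZINBA internals.

```r
# Hypothetical prefix; in a real run this is the value passed as `outfile`.
outfile <- "results/sample1"

# File names as described above: <outfile>.winlist and <outfile>.peaks.
winlist_path <- paste0(outfile, ".winlist")
peaks_path   <- paste0(outfile, ".peaks")

# Report which of the two outputs exist so far.
outputs <- c(winlist = winlist_path, peaks = peaks_path)
status <- file.exists(outputs)
names(status) <- names(outputs)
print(status)

# Assumption: the winlist holds one window-file path per line.
if (status[["winlist"]]) {
  window_files <- readLines(winlist_path)
  message(length(window_files), " window file(s) listed in ", winlist_path)
}
```

Checking for the winlist before the peaks file mirrors the order of the steps: the winlist is written by the significance step, and the peaks file only appears if refinement was requested.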

run.zinba(refinepeaks=1,outfile=NULL,twoBit=NULL,numProc=1,seq=NULL,
    input="none",filetype="bowtie",align=NULL,extension=NULL,basecountfile=NULL,
    printFullOut=0,threshold=.05, mode="peaks", interaction=TRUE, broad=F)
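As a rough illustration of the signature above, the following sketch assembles a full set of arguments and calls run.zinba() only when the package and the input files are actually present. Every path (including the `hg19.2bit` genome build) is a hypothetical placeholder, and the guard around the call is a convenience of this example, not part of ZINBA.

```r
# All paths below are hypothetical placeholders; substitute your own files.
args <- list(
  refinepeaks   = 1,
  outfile       = "results/sample1",
  twoBit        = "genome/hg19.2bit",
  numProc       = 4,
  seq           = "reads/sample1.bowtie",
  input         = "reads/input.bowtie",
  filetype      = "bowtie",
  align         = "alignability/",
  extension     = 200,
  basecountfile = "results/sample1.basecount",
  threshold     = 0.05,
  mode          = "peaks",
  interaction   = TRUE,
  broad         = FALSE
)

# Only attempt the run if zinba is installed and the inputs exist.
inputs <- unlist(args[c("twoBit", "seq", "input", "basecountfile")])
ready <- requireNamespace("zinba", quietly = TRUE) && all(file.exists(inputs))

if (ready) {
  do.call(zinba::run.zinba, args)
} else {
  message("Skipping run.zinba(): zinba or the placeholder inputs are not available")
}
```

Keeping the arguments in a named list makes it easy to log or reuse the exact settings of a run.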

`refinepeaks`	Set to 1 (default) to request exact boundaries of peaks within merged significantly enriched regions, otherwise set to 0. Helps to isolate punctate, sharp peaks. If set to 1, the basecountfile parameter must be specified
`outfile`	Prefix used to denote the .wins, .winlist and .peaks files that are outputted by zinba
`twoBit`	Path to build of the genome your reads were mapped to, in .2bit format
`numProc`	Number of jobs to run in parallel, default is 1. Using more is recommended if more computing cores are available
`seq`	Path to mapped sample reads if buildwin=1, formatted as either 'bed', 'tagAlign', or 'bowtie'
`input`	Path to mapped input reads if buildwin=1, formatted as either 'bed', 'tagAlign', or 'bowtie' (same format as seq). If left blank, then defaults to 'none'. Input control is not necessary to run ZINBA
`filetype`	Format of mapped sample and input reads. 'bed' denotes files in the standard .bed format, 'tagAlign' those in .taf format, and 'bowtie' mapped reads output directly by bowtie. Default is 'bowtie'
`align`	Path to directory containing alignability files for each chromosome, obtained from alignAdjust (check how these files are generated) or downloaded from our repository. Alignability information is specific to the uniqueness threshold used to initially filter the mapped reads and to the length of the sequence tags used
`extension`	Average length of fragments in fragment library used, typically around 200
`basecountfile`	Must be specified if refinepeaks is 1. Path to basecount track containing SBPC information for the entire genome, generated by basealigncount
`printFullOut`	If set to 1, prints out the original dataset along with the posterior probabilities. Otherwise, prints out only window coordinates and significance score (default)
`threshold`	FDR threshold to use, default is 0.05
`mode`	Either "peaks" (default) for peak calling or "CNV" for a quick estimation of amplified regions in "seq". For the "CNV" option it is suggested that "seq" be your input control for best results
`interaction`	TRUE or FALSE, whether interaction terms between covariates are considered during model selection. Setting this to FALSE speeds up model selection
`broad`	TRUE or FALSE, whether to merge enriched windows within 5kb of each other
`FDR`	TRUE (default) or FALSE, whether to threshold on FDR or on posterior probability (more conservative). When FDR is FALSE, 1-threshold is used as the posterior probability threshold
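Several of the arguments above constrain one another, so a small pre-flight check can be useful before starting a long run. The helpers `check_zinba_args()` and `posterior_cutoff()` below are hypothetical and not part of ZINBA; they only encode the rules stated in the argument descriptions, plus the assumption that `threshold` lies strictly between 0 and 1.

```r
# Hypothetical helper: collects violations of the documented argument rules.
check_zinba_args <- function(refinepeaks = 1, basecountfile = NULL,
                             filetype = "bowtie", mode = "peaks",
                             threshold = 0.05) {
  problems <- character(0)
  if (refinepeaks == 1 && is.null(basecountfile)) {
    problems <- c(problems, "basecountfile must be given when refinepeaks is 1")
  }
  if (!filetype %in% c("bed", "tagAlign", "bowtie")) {
    problems <- c(problems, "filetype must be 'bed', 'tagAlign', or 'bowtie'")
  }
  if (!mode %in% c("peaks", "CNV")) {
    problems <- c(problems, "mode must be 'peaks' or 'CNV'")
  }
  # Assumption: a usable threshold lies strictly between 0 and 1.
  if (!is.numeric(threshold) || threshold <= 0 || threshold >= 1) {
    problems <- c(problems, "threshold should lie strictly between 0 and 1")
  }
  problems
}

# Posterior probability cutoff implied by `threshold` when FDR is FALSE.
posterior_cutoff <- function(threshold = 0.05) 1 - threshold

# Refinement requested without a basecount track: one problem is reported.
print(check_zinba_args(refinepeaks = 1, basecountfile = NULL))

# With the default threshold of 0.05 the posterior cutoff is 0.95.
print(posterior_cutoff(0.05))
```

Returning a character vector of problems, rather than stopping at the first one, lets all issues be reported in a single pass.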