zinba: ZINBA convenience function


Description

This is the main function in ZINBA. It consists of three sequential steps: (1) processing the mapped sample reads and calculating the corresponding covariate information, (2) determining which regions are significantly enriched for reads given a set of covariates, and (3) merging adjacent and overlapping windows and running a peak boundary refinement to obtain the exact boundaries of peak regions within these merged regions.

The first step, the buildwindow step, is optional: you can instead start the analysis from an existing set of processed data by referencing its corresponding filelist file, described below. The peak refinement step is likewise optional; you can use the merged significant window coordinates instead of the refined boundaries, which is especially appropriate for broader signal.

If requested, the first step takes the raw mapped sample reads, the raw mapped input reads (if available), the alignability directory, and the current build of the genome, and builds the datasets needed to detect enriched regions. The corresponding standalone function is buildwindowdata(). Once the data are built, the locations of the built files (one for each chromosome and offset) are written to a filelist file.
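To make the idea of "one file per chromosome and offset" concrete, the base-R sketch below counts reads into fixed-width windows whose grid is shifted by each offset. It is only an illustration under stated assumptions: the function name `count_windows`, the 500 bp window size, and the offsets are hypothetical, reads are reduced to their start positions (no fragment extension), and the real buildwindowdata() step also computes covariates such as input counts and alignability.

```r
# Hypothetical sketch: count reads into fixed-width windows on one chromosome.
# count_windows, win_size and offset are illustrative, not buildwindowdata()'s API.
count_windows <- function(read_starts, chrom_length, win_size = 500, offset = 0) {
  # Window grid begins at offset + 1 and steps by win_size
  starts <- seq(from = offset + 1, to = chrom_length, by = win_size)
  ends <- pmin(starts + win_size - 1, chrom_length)
  # Index of the window containing each read start; reads that fall before
  # the first window (possible when offset > 0) get index 0 and are dropped
  idx <- findInterval(read_starts, starts)
  idx <- idx[idx > 0]
  counts <- tabulate(idx, nbins = length(starts))
  data.frame(start = starts, end = ends, count = counts)
}

# Toy read start positions on a 2 kb chromosome
reads <- c(10, 480, 520, 990, 1200, 1210, 1999)
w0   <- count_windows(reads, chrom_length = 2000, win_size = 500, offset = 0)
w250 <- count_windows(reads, chrom_length = 2000, win_size = 500, offset = 250)
```

Each offset yields its own window table, which is why the built data are naturally stored as one file per chromosome and offset.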

The next step uses the locations in the filelist to import the built data and find windows that are likely to be enriched for counts given a set of covariates. The corresponding standalone function is getsigwindows(). The posterior probabilities for each window are saved, and the locations of these files are written to the outfile.winlist file, where outfile is the name chosen for the output files of the current run.

If peak refinement is requested, the winlist is then passed to getrefinedpeaks(). Overlapping significant windows above the peakconfidence threshold are merged, the SBPC (basecount) data for these merged windows are imported, and the exact peak boundaries are determined and written to outfile.peaks.
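The window-merging part of this step can be pictured with the base-R sketch below. It assumes a window table with hypothetical columns `chr`, `start`, `end`, and `pp` (posterior probability of enrichment), uses `cutoff` as a stand-in for the peakconfidence threshold, and adds a `max_gap` argument to mimic how `broad=TRUE` merges enriched windows within 5 kb of each other. It illustrates interval merging only; it is not the code getrefinedpeaks() runs and does not perform boundary refinement.

```r
# Hypothetical sketch: keep windows with posterior probability above a cutoff,
# then merge windows that overlap, are adjacent, or lie within max_gap bp.
merge_sig_windows <- function(wins, cutoff = 0.95, max_gap = 0) {
  sig <- wins[wins$pp > cutoff, , drop = FALSE]
  if (nrow(sig) == 0) {
    return(data.frame(chr = character(0), start = numeric(0), end = numeric(0)))
  }
  sig <- sig[order(sig$chr, sig$start), , drop = FALSE]
  merged <- list()
  cur <- sig[1, c("chr", "start", "end")]
  for (i in seq_len(nrow(sig))[-1]) {
    w <- sig[i, ]
    if (w$chr == cur$chr && w$start <= cur$end + 1 + max_gap) {
      # Same chromosome and overlapping/adjacent/close: extend current region
      cur$end <- max(cur$end, w$end)
    } else {
      # Gap too large or new chromosome: close current region, start another
      merged[[length(merged) + 1]] <- cur
      cur <- w[, c("chr", "start", "end")]
    }
  }
  merged[[length(merged) + 1]] <- cur
  res <- do.call(rbind, merged)
  rownames(res) <- NULL
  res
}

# Toy windows (500 bp, offsets of 250 bp); the chr2 window is not significant
wins <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  start = c(1, 251, 501, 5501, 1),
  end   = c(500, 750, 1000, 6000, 500),
  pp    = c(0.99, 0.97, 0.96, 0.99, 0.50),
  stringsAsFactors = FALSE
)
narrow <- merge_sig_windows(wins, cutoff = 0.95, max_gap = 0)     # like broad=FALSE
broad  <- merge_sig_windows(wins, cutoff = 0.95, max_gap = 5000)  # like broad=TRUE
```

Here `narrow` keeps two regions (1 to 1000 and 5501 to 6000), while the 4.5 kb gap between them falls under `max_gap` in `broad`, so they collapse into a single region.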

Usage

run.zinba(refinepeaks=1, outfile=NULL, twoBit=NULL, numProc=1, seq=NULL,
    input="none", filetype="bowtie", align=NULL, extension=NULL, basecountfile=NULL,
    printFullOut=0, threshold=.05, mode="peaks", interaction=TRUE, broad=FALSE)
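A typical call might be assembled as follows. All file paths, the output prefix, and the alignability directory name are hypothetical placeholders; the argument values simply mirror the defaults and recommendations described under Arguments. The snippet assumes the installed package is named zinba and exports run.zinba, and it is guarded so nothing runs unless the package and the read file are present.

```r
# Hypothetical paths: substitute your own reads, genome build, and tracks.
zinba_args <- list(
  refinepeaks   = 1,
  outfile       = "sample_run",              # prefix for the output files
  twoBit        = "hg19.2bit",               # hypothetical genome build file
  numProc       = 4,
  seq           = "sample_reads.bowtie",
  input         = "none",                    # no input control
  filetype      = "bowtie",
  align         = "align_athr1_k36/",        # hypothetical alignability directory
  extension     = 200,
  basecountfile = "sample_reads.basecount",  # required because refinepeaks = 1
  threshold     = 0.05,
  mode          = "peaks",
  interaction   = TRUE,
  broad         = FALSE
)

# refinepeaks = 1 requires a basecount track
stopifnot(zinba_args$refinepeaks != 1 || !is.null(zinba_args$basecountfile))

# Per the description, window locations go to <outfile>.winlist and peaks to <outfile>.peaks
result_files <- paste0(zinba_args$outfile, c(".winlist", ".peaks"))

# Only attempt the run if the package is installed and the reads exist
if (requireNamespace("zinba", quietly = TRUE) && file.exists(zinba_args$seq)) {
  do.call(zinba::run.zinba, zinba_args)
}
```

Building the arguments as a list makes it easy to check required combinations (such as refinepeaks with basecountfile) before launching a long run.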
	

Arguments

refinepeaks

Set to 1 (default) to determine the exact boundaries of peaks within merged significantly enriched regions; otherwise set to 0. Refinement helps to isolate punctate, sharp peaks. If set to 1, basecountfile must also be specified

outfile

Prefix used to name the .wins, .winlist, and .peaks files output by ZINBA

twoBit

Path to the genome build your reads were mapped to, in .2bit format

numProc

Number of concurrent jobs to run in parallel. Default is 1; using more is recommended if more computing cores are available

seq

Path to mapped sample reads if buildwin=1, formatted as either 'bed', 'tagAlign', or 'bowtie'

input

Path to mapped input reads if buildwin=1, formatted as either 'bed', 'tagAlign', or 'bowtie' (same format as seq). Defaults to 'none' if left blank; an input control is not required to run ZINBA

filetype

Format of the mapped sample and input reads: 'bed' for files in the standard .bed format, 'tagAlign' for files in .taf format, and 'bowtie' for mapped reads output directly by bowtie. Default is 'bowtie'

align

Path to the directory containing alignability files for each chromosome, either generated with alignAdjust (see its documentation for how these files are produced) or downloaded from our repository. Alignability information is specific to the uniqueness threshold used to filter the mapped reads and to the length of the sequence tags

extension

Average length of fragments in the fragment library, typically around 200 bp

basecountfile

Must be specified if refinepeaks is 1. Path to basecount track containing SBPC information for the entire genome, generated by basealigncount

printFullOut

If set to 1, writes the original dataset along with the posterior probabilities. Otherwise (default), writes only the window coordinates and significance scores

threshold

FDR threshold to use; default is 0.05

mode

Either "peaks" (default) for peak calling or "CNV" for a quick estimation of amplified regions in "seq". For best results with this option, it is suggested that "seq" be your input control

interaction

TRUE or FALSE, whether interaction terms between covariates are considered during model selection. FALSE results in greater speedup during model selection

broad

TRUE or FALSE, whether to merge enriched windows within 5kb of each other

FDR

TRUE (default) or FALSE: whether to threshold windows by FDR or by posterior probability (the more conservative option). When FDR is FALSE, 1-threshold is used as the posterior probability threshold
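The two thresholding modes selected by FDR can be contrasted with a short sketch. With FDR=FALSE, windows are kept when their posterior probability exceeds 1-threshold, as stated above. For FDR=TRUE the sketch assumes a common Bayesian (direct posterior probability) FDR rule: rank windows by posterior probability and keep the largest top set whose average (1 - posterior) is at most threshold. ZINBA's exact FDR computation is not described on this page, so treat this as an approximation of the idea; the function name `select_windows` is hypothetical.

```r
# Hypothetical sketch of the two thresholding modes controlled by FDR.
# pp: posterior probabilities of enrichment, one per window.
select_windows <- function(pp, threshold = 0.05, FDR = TRUE) {
  if (!FDR) {
    # Posterior-probability mode: keep windows with pp > 1 - threshold
    return(pp > 1 - threshold)
  }
  # Assumed Bayesian FDR rule: estimated FDR of the top-k windows is the mean
  # of (1 - pp) over those k; keep the largest k whose estimate <= threshold
  ord <- order(pp, decreasing = TRUE)
  est_fdr <- cumsum(1 - pp[ord]) / seq_along(ord)
  k <- max(c(0, which(est_fdr <= threshold)))
  keep <- logical(length(pp))
  keep[ord[seq_len(k)]] <- TRUE
  keep
}

pp <- c(0.999, 0.99, 0.97, 0.90, 0.60, 0.20)
by_fdr <- select_windows(pp, threshold = 0.05, FDR = TRUE)
by_pp  <- select_windows(pp, threshold = 0.05, FDR = FALSE)
```

In this toy example the FDR rule keeps four windows while the posterior-probability rule keeps three, consistent with describing the latter as the more conservative option.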

See Also

save.


sivarajankumar/zinba documentation built on May 29, 2019, 10:11 p.m.