count.table: Create a matrix of ChIP-seq count data
In BinQuasi: Analyzing Replicated ChIP Sequencing Data Using Quasi-Likelihood

Create a matrix of ChIP-seq count data from sorted bam files using a non-overlapping genomic partition. Used within the main peak calling function, BQ.

count.table(dir, ChIP.files, control.files, bin.size = NULL, frag.length = NULL, minimum.count = 20)

`dir`	Directory where the sorted bam files (and their corresponding bam indices) are saved.
`ChIP.files`	File names (with file extensions) of the ChIP sample files in sorted bam format.
`control.files`	File names (with file extensions) of the input/control sample files in sorted bam format.
`bin.size`	Window size, constant across all samples, used to generate a non-overlapping partition for counts. If `NULL`, an estimate will be used (see details).
`frag.length`	Average length of the ChIP fragments in each sample provided. Reads are extended to this length in the 5'-to-3' direction. If `NULL`, cross-correlation will be used to estimate the fragment length of each sample (see details).
`minimum.count`	The count threshold used for filtering out windows with sparse counts. Any genomic window whose counts, summed across all samples, fall below this value will be removed.

This function creates a count table of ChIP sequencing data (supplied as sorted bam files) using a non-overlapping partition across the genome.

The fragment length (if not provided) is estimated using the cross-correlation method of Ramachandran et al. (2013). After duplicate reads are removed, a fragment length is estimated for each sample by averaging the estimates over all chromosomes in that sample. Estimation is performed at 5 bp resolution and restricted to a minimum fragment length of 50 bp and a maximum of 600 bp.
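As a rough illustration of the cross-correlation idea, the sketch below scans candidate shifts at 5 bp resolution between 50 and 600 bp and picks the one that best aligns forward-strand and reverse-strand read starts on a single chromosome. It uses plain strand cross-correlation in base R, without the mappability correction of MaSC, and is not BinQuasi's implementation; the helper `estimate_frag_length` and the toy data are hypothetical. Per-chromosome estimates obtained this way would then be averaged to give one value per sample.

```r
# Hypothetical helper (not part of BinQuasi): estimate the fragment length on
# one chromosome as the shift that maximizes the correlation between
# forward-strand and reverse-strand 5' end positions.
estimate_frag_length <- function(fwd.pos, rev.pos, chrom.length,
                                 shifts = seq(50, 600, by = 5)) {
  # Drop duplicate reads, then mark each position that carries a 5' end.
  fwd.cov <- tabulate(unique(fwd.pos), nbins = chrom.length)
  rev.cov <- tabulate(unique(rev.pos), nbins = chrom.length)
  cc <- vapply(shifts, function(d) {
    # Pair forward position i with reverse position i + d.
    cor(fwd.cov[1:(chrom.length - d)], rev.cov[(1 + d):chrom.length])
  }, numeric(1))
  shifts[which.max(cc)]
}

# Toy data: reverse-strand 5' ends sit 150 bp downstream of forward-strand ones.
set.seed(1)
sites <- sample(1:4000, 200)
est <- estimate_frag_length(fwd.pos = sites, rev.pos = sites + 150,
                            chrom.length = 5000)
print(est)
```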

The bin size (if not provided) is selected using the procedure of Shimazaki and Shinomoto (2007), which minimizes the mean integrated squared error for a time-dependent Poisson point process. This procedure is applied to each ChIP sample (at 5 bp resolution, restricted to a minimum of 50 bp and a maximum of 1000 bp), and the minimum across all ChIP samples is returned as the bin size.
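The selection rule can be sketched in base R as follows. For a candidate width, the Shimazaki-Shinomoto cost is (2 * mean - variance) / width^2, computed from the per-bin counts with the biased (divide-by-n) variance, and the width with the smallest cost is chosen. This is a minimal sketch rather than BinQuasi's code: the helpers `ss_cost` and `select_bin_size`, the use of a single chromosome, and the simulated read positions are all assumptions made for illustration.

```r
# Hypothetical helper: Shimazaki-Shinomoto cost for one candidate bin width.
ss_cost <- function(pos, chrom.length, width) {
  n.bins <- ceiling(chrom.length / width)
  k <- tabulate(ceiling(pos / width), nbins = n.bins)  # reads per bin
  k.bar <- mean(k)
  v <- mean((k - k.bar)^2)  # biased variance, as the method requires
  (2 * k.bar - v) / width^2
}

# Hypothetical helper: scan candidate widths and return the cost minimizer.
select_bin_size <- function(pos, chrom.length,
                            widths = seq(50, 1000, by = 5)) {
  costs <- vapply(widths, function(w) ss_cost(pos, chrom.length, w),
                  numeric(1))
  widths[which.min(costs)]
}

# Simulated read positions for two ChIP samples: one peak plus background.
set.seed(1)
chrom.length <- 50000
chip <- list(
  C1 = round(c(rnorm(500, 20000, 300), runif(500, 1, chrom.length))),
  C2 = round(c(rnorm(500, 30000, 600), runif(500, 1, chrom.length)))
)

# One estimate per ChIP sample; the smallest is used as the common bin size.
per.sample <- vapply(chip, select_bin_size, numeric(1),
                     chrom.length = chrom.length)
bin.size <- min(per.sample)
print(per.sample)
print(bin.size)
```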

For a given sample and window, the count is determined as the number of fragments overlapping the window.
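The counting rule can be illustrated with a small base R sketch: each read is first extended to the fragment length in its 5'-to-3' direction, and a window's count is the number of extended fragments that overlap it. The helpers `extend_reads` and `count_overlaps`, the 1-based inclusive coordinates, and the toy reads are assumptions for illustration only and do not reflect how BinQuasi reads bam files internally.

```r
# Hypothetical helper: extend each read to `frag.length` in its 5'-to-3'
# direction. Coordinates are 1-based and inclusive.
extend_reads <- function(start, end, strand, frag.length) {
  plus <- strand == "+"
  data.frame(
    start = ifelse(plus, start, pmax(end - frag.length + 1, 1)),
    end   = ifelse(plus, start + frag.length - 1, end)
  )
}

# Hypothetical helper: count fragments overlapping each fixed-width window
# of a non-overlapping partition of one chromosome.
count_overlaps <- function(frag.start, frag.end, chrom.length, bin.size) {
  win.start <- seq(1, chrom.length, by = bin.size)
  win.end <- pmin(win.start + bin.size - 1, chrom.length)
  counts <- vapply(seq_along(win.start), function(i) {
    # Overlap: fragment starts at or before the window end and
    # ends at or after the window start.
    sum(frag.start <= win.end[i] & frag.end >= win.start[i])
  }, integer(1))
  data.frame(start = win.start, end = win.end, count = counts)
}

# Three 36 bp toy reads on a 300 bp chromosome, extended to 100 bp.
frags <- extend_reads(start  = c(10, 165, 250),
                      end    = c(45, 200, 285),
                      strand = c("+", "-", "+"),
                      frag.length = 100)
tab <- count_overlaps(frags$start, frags$end,
                      chrom.length = 300, bin.size = 100)
print(tab)
```

Note that a fragment spanning a window boundary contributes to every window it touches, so the column of counts can sum to more than the number of reads.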

A list containing:

`counts`	Data frame with one row per genomic window, containing columns for the chromosome, the start and end locations, and the count for each sample.
`bin.size`	The bin size used to create the genomic partition.
`fragment.length`	Vector of the fragment lengths used to extend the reads in each sample.
`filter`	Count threshold used to create the counts data frame. Windows with counts summed across all samples that fall below this value were removed.
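The filtering rule can be illustrated on a toy count table. The column names (`chr`, `start`, `end`, and one column per sample) are assumed for the sake of the example and may not match the names used in the returned `counts` data frame.

```r
# Toy count table: three 60 bp windows, two ChIP and two control samples.
# Column names here are illustrative assumptions.
counts <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(1, 61, 121),
  end   = c(60, 120, 180),
  C1 = c(12, 3, 0), C2 = c(9, 4, 1),
  I1 = c(2, 5, 0),  I2 = c(1, 6, 2)
)

minimum.count <- 20
sample.cols <- c("C1", "C2", "I1", "I2")

# Keep windows whose counts, summed across all samples, reach the threshold.
total <- rowSums(counts[, sample.cols])
filtered <- counts[total >= minimum.count, ]
print(filtered)
```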

Emily Goren (emily.goren@gmail.com).

Shimazaki and Shinomoto (2007) "A method for selecting the bin size of a time histogram" Neural Computation, 19(6), 1503-27.

Ramachandran, Palidwor, Porter, and Perkins (2013) "MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data" Bioinformatics, 29(4), 444-50.

## Not run: 
fpath <- paste0(system.file(package = 'BinQuasi'), '/extdata/')
d <- count.table(dir = fpath,
                 ChIP.files = c('C1.bam', 'C2.bam'),
                 control.files = c('I1.bam', 'I2.bam'),
                 bin.size = 60, frag.length = c(101, 300, 150, 10),
                 minimum.count = 20)
head(d$counts)

## End(Not run)