Create a matrix of ChIP-seq count data from sorted bam files
using a non-overlapping genomic partition. Used within the main peak calling
function, BQ.
count.table(dir, ChIP.files, control.files, bin.size = NULL,
            frag.length = NULL, minimum.count = 20)
dir |
Directory where the sorted bam files (and their corresponding bam indices) are saved. |
ChIP.files |
File names (with file extensions) of the ChIP sample files in sorted bam format. |
control.files |
File names (with file extensions) of the input/control sample files in sorted bam format. |
bin.size |
Window size, constant across all samples, used to generate a non-overlapping partition for counts. If not provided (the default, NULL), it is selected using the procedure described in the details. |
frag.length |
Average length of the ChIP fragments in each sample provided. Reads are extended to this length from their 3' ends. If not provided (the default, NULL), it is estimated for each sample as described in the details. |
minimum.count |
The count threshold used for filtering out windows with sparse counts. Any genomic window whose counts, summed across all samples, fall below this value is removed. |
This function creates a count table of ChIP sequencing data (supplied as sorted bam files) using a non-overlapping partition across the genome.
The fragment length (if not provided) is estimated using the cross-correlation method of Ramachandran et al. (2013). A fragment length is estimated for each sample, after removing duplicate reads, by taking the average over all chromosomes in the sample. Estimation is performed at 5 bp resolution and restricted to a minimum fragment length of 50 bp and a maximum of 600 bp.
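To give a feel for the cross-correlation idea, here is a minimal sketch on simulated data. It uses plain strand cross-correlation and omits the mappability correction that MaSC adds, so it is not the method of Ramachandran et al. nor the package's internal code. The helper name `estimate_frag_length` and its arguments are hypothetical. Per-chromosome estimates obtained this way would then be averaged, as described above.

```r
# Toy sketch of plain strand cross-correlation (no mappability correction).
# fwd.pos, rev.pos: hypothetical read positions on the forward/reverse strand.
estimate_frag_length <- function(fwd.pos, rev.pos, chrom.length,
                                 shifts = seq(50, 600, by = 5)) {
  # Per-base indicators after removing duplicate positions.
  f <- tabulate(unique(fwd.pos), nbins = chrom.length)
  r <- tabulate(unique(rev.pos), nbins = chrom.length)
  # Correlate the forward signal with the reverse signal shifted back by d.
  cc <- vapply(shifts, function(d) {
    cor(f[1:(chrom.length - d)], r[(1 + d):chrom.length])
  }, numeric(1))
  # The shift with the highest correlation is the estimate.
  shifts[which.max(cc)]
}

# Simulated chromosome: each fragment contributes one forward read at its
# start and one reverse read placed 200 bp downstream (assumed layout).
set.seed(1)
starts <- sample(100:9000, 500)
est <- estimate_frag_length(fwd.pos = starts, rev.pos = starts + 200,
                            chrom.length = 10000)
```

The shift grid `seq(50, 600, by = 5)` mirrors the 5 bp resolution and the 50 to 600 bp bounds stated in the text.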
The bin size (if not provided) is selected using a procedure by Shimazaki and Shinomoto (2007) based on minimizing the mean-integrated squared error for a time-dependent Poisson point process. This procedure is applied to each ChIP sample (at 5 bp resolution, restricted to a minimum of 50 bp and maximum of 1000 bp), and the minimum across all ChIP samples is returned as the bin size.
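The selection rule can be sketched as follows. For a candidate width, the Shimazaki and Shinomoto cost for a single sequence of events is (2 * mean - variance) / width^2, where the mean and the (biased) variance are taken over the bin counts; the width with the lowest cost is chosen. The helper names `ss_cost` and `select_bin_size` and the simulated positions are assumptions made for illustration, and how the package prepares its input may differ.

```r
# Hypothetical helper: Shimazaki-Shinomoto cost for one candidate bin width.
ss_cost <- function(pos, width) {
  breaks <- seq(min(pos), max(pos) + width, by = width)
  k <- tabulate(findInterval(pos, breaks), nbins = length(breaks) - 1)
  k.bar <- mean(k)
  v <- mean((k - k.bar)^2)  # biased variance of the bin counts
  (2 * k.bar - v) / width^2
}

# Hypothetical helper: search candidate widths at 5 bp resolution
# between 50 bp and 1000 bp and return the one with the lowest cost.
select_bin_size <- function(pos, widths = seq(50, 1000, by = 5)) {
  costs <- vapply(widths, function(w) ss_cost(pos, w), numeric(1))
  widths[which.min(costs)]
}

# Two simulated ChIP samples: uniform background plus one enriched region.
set.seed(1)
samples <- list(
  C1 = c(runif(500, 1, 1e5), rnorm(500, 5e4, 300)),
  C2 = c(runif(500, 1, 1e5), rnorm(500, 2e4, 500))
)
per.sample <- vapply(samples, select_bin_size, numeric(1))

# As in the text, the minimum across ChIP samples is used.
bin.size <- min(per.sample)
```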
For a given sample and window, the count is determined as the number of fragments overlapping the window.
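The counting rule can be illustrated on a toy chromosome. This is a sketch only: the helper `count_overlaps` and the fragment coordinates are hypothetical, and the package's own implementation may differ. Note that a fragment spanning a window boundary is counted in both windows.

```r
# Hypothetical helper: count fragments overlapping each fixed-width window.
count_overlaps <- function(frag.start, frag.end, chrom.length, bin.size) {
  bin.start <- seq(1, chrom.length, by = bin.size)
  bin.end <- pmin(bin.start + bin.size - 1, chrom.length)
  # A fragment [s, e] overlaps a window [a, b] when s <= b and e >= a.
  counts <- vapply(seq_along(bin.start), function(i) {
    sum(frag.start <= bin.end[i] & frag.end >= bin.start[i])
  }, integer(1))
  data.frame(start = bin.start, end = bin.end, count = counts)
}

# Three made-up fragments on a 300 bp chromosome with 100 bp windows.
frags <- data.frame(start = c(10, 90, 250), end = c(60, 140, 300))
toy <- count_overlaps(frags$start, frags$end,
                      chrom.length = 300, bin.size = 100)
toy$count  # 2 1 1: the second fragment spans windows 1 and 2
```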
A list containing:
counts |
Data frame with rows corresponding to genomic windows and columns for the chromosomes, start and end locations, as well as a column for the counts of each sample. |
bin.size |
The bin size used to create the genomic partition. |
fragment.length |
Vector of the fragment lengths used to extend the reads in each sample. |
filter |
Count threshold used to create the counts data frame. Windows with counts summed across all samples that fall below this value were removed. |
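As a small illustration of the filter, the sketch below applies the same rule to a made-up counts data frame. The column names and values are assumptions chosen for the example and are not taken from the package's output.

```r
# Made-up count table: three windows, two ChIP and two control samples.
counts <- data.frame(
  chr = "chr1",
  start = c(1, 61, 121),
  end = c(60, 120, 180),
  C1 = c(12, 3, 0),
  C2 = c(9, 2, 1),
  I1 = c(4, 1, 0),
  I2 = c(5, 0, 2)
)
minimum.count <- 20

# Sum counts across all samples and drop windows below the threshold.
sample.cols <- c("C1", "C2", "I1", "I2")
total <- rowSums(counts[, sample.cols])
filtered <- counts[total >= minimum.count, ]
```

Here the window totals are 30, 6 and 3, so only the first window is kept.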
Emily Goren (emily.goren@gmail.com).
Shimazaki and Shinomoto (2007) "A method for selecting the bin size of a time histogram" Neural Computation 19(6), 1503-27.
Ramachandran, Palidwor, Porter, and Perkins (2013) "MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data" Bioinformatics 29(4), 444-50.
## Not run:
fpath <- paste0(system.file(package = 'BinQuasi'), '/extdata/')
d <- count.table(dir = fpath,
                 ChIP.files = c('C1.bam', 'C2.bam'),
                 control.files = c('I1.bam', 'I2.bam'),
                 bin.size = 60, frag.length = c(101, 300, 150, 10),
                 minimum.count = 20)
head(d$counts)
## End(Not run)