Description Usage Arguments Value Defining the sliding windows Read extraction and counting Elaborating on directional extension Comments on ext for paired-end data Author(s) See Also Examples
Count the number of extended reads overlapping a sliding window at spaced positions across the genome.
1 2 | windowCounts(bam.files, spacing=50, width=spacing, ext=100, shift=0,
filter=10, bin=FALSE, param=readParam(), BPPARAM=SerialParam())
|
bam.files |
A character vector containing paths to sorted and indexed BAM files. Alternatively, a list of BamFile objects. |
spacing |
An integer scalar specifying the distance between consecutive windows. |
width |
An integer scalar specifying the width of the window. |
ext |
An integer scalar or a list of two integer scalars/vectors, containing the average length(s) of the sequenced fragments in each library. |
shift |
An integer scalar specifying how much the start of each window should be shifted to the left. |
filter |
An integer scalar for the minimum count sum across libraries for each window. |
bin |
A logical scalar indicating whether binning should be performed. |
param |
A readParam object containing read extraction parameters. |
BPPARAM |
A BiocParallelParam specifying how parallelization is to be performed across files. |
A RangedSummarizedExperiment object is returned containing one integer matrix.
Each entry of the matrix contains the count for each library (column) at each window (row).
The coordinates of each window are stored as the rowRanges
.
The total number of reads in each library are stored as totals
in the colData
, along with the read (rlen
) and extension lengths (ext
) for each library.
Other window counting parameters (e.g., spacing
, width
, param
) are stored in the metadata
.
A window is defined as a genomic interval of size equal to width
.
The value of width
can be interpreted as the width of the contact area between the DNA and protein.
In practical terms, it determines the spatial resolution of the analysis. Larger windows count reads over a larger region which results in larger counts.
This results in greater detection power at the cost of resolution.
By default, the first window on a chromosome starts at base position 1.
This can be shifted to the left by specifying an appropriate value for shift
.
New windows are defined by sliding the current window to the right by the specified spacing
.
Increasing spacing
will reduce the frequency at which counts are extracted from the genome.
This results in some loss of resolution but it may be necessary when machine memory is limited.
If bin
is set, settings are internally adjusted so that all reads are counted into non-overlapping adjacent bins of size width
.
Specifically, spacing
is set to width
and filter
is capped at a maximum value of 1 (empty bins can be retained with filter=0
).
Only the 5' end of each read or the midpoint of each fragment (for paired-end data) is used in counting.
Read extraction from the BAM files is governed by the param
argument.
This specifies whether reads are to be read in single- or paired-end mode, whether to apply a threshold to the mapping quality, and so on – see ?readParam
for details.
The strandedness of the output rowRanges
is set based on the strand(s) from which the reads are extracted and counted.
This is determined by the value of the forward
slot in the input param
object.
Fragments are inferred from reads by directional extension in single-end data (see below)
or by identifying proper pairs in paired-end data (see readParam
and getPESizes
for more details).
The number of fragments overlapping the window for each library is then counted for each window position.
Windows will be removed if the count sum across all libraries is below filter
.
This reduces the memory footprint of the output by not returning empty or near-empty windows, which are usually uninteresting anyway.
For single-end reads, directional extension is performed whereby each read is extended from its 3' end to the average fragment length, i.e., ext
.
This obtains a rough estimate of the interval of the fragment from which the read was derived.
It is particularly useful for TF data, where extension specifically increases the coverage of peaks that exhibit strand bimodality.
No extension is performed if ext
is set to NA
, such that the read length is used as the fragment length in that library.
If libraries have different fragment lengths, this can be accommodated by supplying a list of 2 elements to ext
.
The first element (named init.ext
here, for convenience) should be an integer vector specifying the extension length for each library.
The second element (final.ext
) should be an integer scalar specifying the final fragment length.
All reads are directionally extended by init.ext
, and the resulting fragment is resized to final.ext
by shrinking or expanding from the fragment midpoint.
For a bimodal peak, scaling effectively aligns the subpeaks on a given strand across all libraries to a common location.
This removes the most obvious differences in widths.
If any element of init.ext
is NA
, no extension is performed for the corresponding library.
If final.ext
is set to NA
, no rescaling is performed from the library-specific fragment lengths.
Values of init.ext
are stored as the ext
field in the colData
of the output object, while final.ext
is stored in the metadata
.
Directional extension is not performed for paired-end data, so the values in ext
are not used directly.
Hwoever, rescaling can still be performed to standardize fragment lengths across libraries by resizing each fragment from its midpoint.
This will use the second element of ext
as final.ext
, if ext
is specified as a list of length 2.
On a similar note, some downstream functions will use the extension length in the output colData
as the average fragment length.
Thus, to maintain compatibility, the ext
field in colData
is set to the average of the inferred fragment lengths for valid read pairs.
These values will not be used in windowCounts
, but instead, in functions like getWidths
.
Aaron Lun
correlateReads
,
readParam
,
getPESizes
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | # A low filter is only used here as the examples have very few reads.
bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
windowCounts(bamFiles, filter=1)
windowCounts(bamFiles, width=100, filter=1)
# Multiple extension lengths.
windowCounts(bamFiles, ext=list(c(50, 100), NA), filter=1)
windowCounts(bamFiles, ext=list(c(50, 100), 80), filter=1)
# Loading PE data.
bamFile <- system.file("exdata", "pet.bam", package="csaw")
windowCounts(bamFile, param=readParam(pe="both"), filter=1)
windowCounts(bamFile, param=readParam(pe="first"), filter=1)
windowCounts(bamFile, param=readParam(max.frag=100, pe="both"), filter=1)
windowCounts(bamFile, param=readParam(max.frag=100, pe="both", restrict="chrA"), filter=1)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.