Description Usage Arguments Details Value Comments on strand specificity Author(s) See Also Examples
Get the coverage profile around potential binding sites.
1 2 3 | profileSites(bam.files, regions, param=readParam(), range=5000, ext=100,
average=TRUE, normalize="none", strand=c("ignore", "use", "match"),
BPPARAM=SerialParam())
|
bam.files |
A character vector containing paths to one or more BAM files. Alternatively, a list of BamFile objects. |
regions |
A GenomicRanges object over which profiles are to be aggregated. |
param |
A readParam object containing read extraction parameters. |
range |
An integer scalar specifying the range over which the profile will be collected. |
ext |
An integer scalar or list specifying the average fragment length for single-end data. |
average |
A logical scalar specifying whether the profiles should be averaged across regions. |
normalize |
A string specifying how normalization of each region's profile should be performed prior to averaging. |
strand |
A string indicating how stranded |
BPPARAM |
A BiocParallelParam specifying how parallelization is to be performed across files. |
This function computes the average coverage profile around the specified regions.
Specifically, the profile is constructed by counting the number of fragments overlapping each base in the interval flanking each entry of regions
.
The interval for each entry is centred at its start location (base zero) and spans the flanking range
on either side.
Single-end reads are directionally extended to ext
to impute the fragment (see windowCounts
for more details).
For paired-end reads, the interval between each pair is used as the fragment.
If multiple bam.files
are specified, reads are pooled across files for counting into each profile.
By default, an average of the coverage profiles across all regions
is returned.
Normalization of each region's profile is performed on by setting normalize
to:
none
:No normalization is performed, i.e., counts per base are directly averaged across all regions. Thus, the shape of the average profile is largely determined by high-abundance regions.
total
:The profile for each region is divided by the sum of coverages across all bases in the interval. This effectively normalizes for the total number of reads in each region.
max
:The profile for each region is divided by its maximum value. This ensures that the maximum height of each region is the same.
If average=FALSE
, a separate profile will be returned for each region instead.
This may be useful, e.g., for constructing heatmaps of enrichment across many regions.
The profile can be used to examine average coverage around known features of interest, like genes or transcription start sites.
Its shape can guide the choice of the window size in windowCounts
, or to determine if larger regions should be used in regionCounts
.
For the former, restricting the regions
to locally maximal windows with findMaxima
is recommended to capture the profile of binding events.
If average=TRUE
, a numeric vector of average coverages for each base position within range
is returned, where the average is taken over all regions
.
The vector is named according to the relative position of each base to the start of the region.
The interpretation of the coverages will depend on the value of normalize
.
If average=FALSE
, an integer matrix of coverage values is returned.
Each row of the matrix corresponds to an entry in regions
, while each column corresponds to a base position with range
.
Column names are set to the relative position of each base to the start of each region.
By default, the strandedness of the regions are ignored with strand="ignore"
.
If strand="use"
, the behaviour of this function will differ between forward- and reverse-stranded entries in regions
.
Forward-stranded or unstranded regions are processed as previously described above. Base zero corresponds to the start of the region, negative distances correspond to the 5' flanking region, and positive distances correspond to the 3' flanking region.
Reverse-stranded regions are flipped, i.e., base zero corresponds to the end of the region. Negative distances correspond to the 5' flanking region on the reverse strand, while positive distances correspond to the 3' flanking region on this strand.
This ensures that the center of the profile always corresponds to the 5' end of the region, with upstream regions on the left and downstream regions on the right. This may be useful for features where strandedness is important, e.g., TSS's.
By default, the strandedness of the region has no effect on the choice of reads that are used.
If strand="match"
, the profile for reverse-strand regions is constructed from reverse-strand reads only.
Similarly, only forward-strand reads are used for forward- or unstranded regions.
Note that param$forward
must be set to NULL
for this to work.
Flipping of reverse-stranded profiles is also performed in this setting, as described for strand="use"
.
Aaron Lun
findMaxima
,
windowCounts
,
wwhm
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | bamFile <- system.file("exdata", "rep1.bam", package="csaw")
data <- windowCounts(bamFile, filter=1)
rwsms <- rowSums(assay(data))
maxed <- findMaxima(rowRanges(data), range=100, metric=rwsms)
# Running profileSites .
x <- profileSites(bamFile, rowRanges(data)[maxed], range=200)
plot(as.integer(names(x)), x)
x <- profileSites(bamFile, rowRanges(data)[maxed], range=500)
plot(as.integer(names(x)), x)
# Introducing some strandedness.
regs <- rowRanges(data)[maxed]
strand(regs) <- sample(c("-", "+", "*"), sum(maxed), replace=TRUE)
x <- profileSites(bamFile, regs, range=500)
plot(as.integer(names(x)), x)
x2 <- profileSites(bamFile, regs, range=500, strand="use")
points(as.integer(names(x2)), x2, col="red")
x3 <- profileSites(bamFile, regs, range=500, strand="match",
param=readParam(forward=NULL))
points(as.integer(names(x3)), x3, col="blue")
# Returning separate profiles.
y <- profileSites(bamFile, rowRanges(data)[maxed], range=500, average=FALSE)
dim(y)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.