View source: R/scaledAverage.R
scaledAverage | R Documentation |
Compute the scaled average abundance for each feature.
scaledAverage(y, scale=1, prior.count=NULL, dispersion=NULL, assay.id="counts")
y |
A SummarizedExperiment object containing a count matrix. |
scale |
a numeric scalar vector indicating the magnitude with which each abundance is to be downscaled |
prior.count |
a numeric scalar specifying the prior count to add |
dispersion |
a numeric scalar or vector specifying the dispersion for GLM fitting. |
assay.id |
A string or integer scalar indicating which assay of |
This function computes the average abundance of each feature in y
, and downscales it according to scale
.
For example, if scale=2
, the average count is halved, i.e., the returned abundances are decreased by 1 (as they are log2-transformed values).
The aim is to set scale
based on the relative width of regions, to allow abundances to be compared between regions of different size.
Widths can be obtained using the getWidths
function.
This function mimics the behaviour of aveLogCPM
but handles the prior.count
with some subtlety.
Specifically, it scales up the prior count by scale
before adding it to the counts.
This ensures that the “effective” prior is the same after the abundance is scaled down.
Otherwise, the use of the same prior would incorrectly result in a smaller abundance for larger regions, regardless of the read density.
An additional difference from aveLogCPM
is that the prior count is not scaled up before being added to the library sizes/offsets.
(See addPriorCount
for more details.)
This ensures that the modified offsets do not depend on scale
, which allows abundances to be compared between regions of differing size.
Otherwise, larger regions with greater scale
would always have (slightly) larger modified offsets and lower abundances than appropriate.
Note that the adjustment for width assumes that reads are uniformly distributed throughout each region. This is reasonable for most background regions, but may not be for enriched regions. When the distribution is highly heterogeneous, the downscaled abundance of a large region will not be an accurate representation of the abundance of the smaller regions nested within.
For consistency, the prior.count
is set to the default value of aveLogCPM.DGEList
, if it is not otherwise specified.
If a non-default value is used, make sure that it is the same for all calls to scaledAverage
.
This ensures that comparisons between the returned values are valid.
A numeric vector of scaled abundances, with one entry for each row of y
.
Aaron Lun
getWidths
,
aveLogCPM
,
addPriorCount
bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
size1 <- 50
data1 <- windowCounts(bamFiles, width=size1, filter=1)
size2 <- 500
data2 <- windowCounts(bamFiles, width=size2, filter=1)
# Adjusting by `scale`, based on median sizes.
head(scaledAverage(data1))
relative <- median(getWidths(data2))/median(getWidths(data1))
head(scaledAverage(data2, scale=relative))
# Need to make sure the same prior is used, if non-default.
pc <- 5
head(scaledAverage(data1, prior.count=pc))
head(scaledAverage(data2, scale=relative, prior.count=pc))
# Different way to compute sizes, for 1-to-1 relations.
data3 <- regionCounts(bamFiles, regions=resize(rowRanges(data1),
fix="center", width=size2))
head(scaledAverage(data1))
relative.2 <- getWidths(data1)/getWidths(data2)
head(scaledAverage(data3), scale=relative.2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.