scaledAverage: Scaled average abundance
In csaw: ChIP-Seq Analysis with Windows

Description Usage Arguments Details Value Author(s) See Also Examples

Compute the scaled average abundance for each feature.

1	scaledAverage(y, scale=1, prior.count=NULL, dispersion=NULL, assay.id="counts")

`y`	A SummarizedExperiment object containing a count matrix.
`scale`	a numeric scalar vector indicating the magnitude with which each abundance is to be downscaled
`prior.count`	a numeric scalar specifying the prior count to add
`dispersion`	a numeric scalar or vector specifying the dispersion for GLM fitting.
`assay.id`	A string or integer scalar indicating which assay of `y` contains the counts.

This function computes the average abundance of each feature in y, and downscales it according to scale. For example, if scale=2, the average count is halved, i.e., the returned abundances are decreased by 1 (as they are log2-transformed values). The aim is to set scale based on the relative width of regions, to allow abundances to be compared between regions of different size. Widths can be obtained using the getWidths function.

This function mimics the behaviour of aveLogCPM but handles the prior.count with some subtlety. Specifically, it scales up the prior count by scale before adding it to the counts. This ensures that the “effective” prior is the same after the abundance is scaled down. Otherwise, the use of the same prior would incorrectly result in a smaller abundance for larger regions, regardless of the read density.

An additional difference from aveLogCPM is that the prior count is not scaled up before being added to the library sizes/offsets. (See addPriorCount for more details.) This ensures that the modified offsets do not depend on scale, which allows abundances to be compared between regions of differing size. Otherwise, larger regions with greater scale would always have (slightly) larger modified offsets and lower abundances than appropriate.

Note that the adjustment for width assumes that reads are uniformly distributed throughout each region. This is reasonable for most background regions, but may not be for enriched regions. When the distribution is highly heterogeneous, the downscaled abundance of a large region will not be an accurate representation of the abundance of the smaller regions nested within.

For consistency, the prior.count is set to the default value of aveLogCPM.DGEList, if it is not otherwise specified. If a non-default value is used, make sure that it is the same for all calls to scaledAverage. This ensures that comparisons between the returned values are valid.

A numeric vector of scaled abundances, with one entry for each row of y.

Aaron Lun

getWidths, aveLogCPM, addPriorCount

bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
size1 <- 50
data1 <- windowCounts(bamFiles, width=size1, filter=1)
size2 <- 500
data2 <- windowCounts(bamFiles, width=size2, filter=1)

# Adjusting by `scale`, based on median sizes.
head(scaledAverage(data1))
relative <- median(getWidths(data2))/median(getWidths(data1))
head(scaledAverage(data2, scale=relative))

# Need to make sure the same prior is used, if non-default.
pc <- 5
head(scaledAverage(data1, prior.count=pc))
head(scaledAverage(data2, scale=relative, prior.count=pc))

# Different way to compute sizes, for 1-to-1 relations.
data3 <- regionCounts(bamFiles, regions=resize(rowRanges(data1),   
    fix="center", width=size2))
head(scaledAverage(data1))
relative.2 <- getWidths(data1)/getWidths(data2)
head(scaledAverage(data3), scale=relative.2)