addPriorCount: Add a prior count



Description

Add a library size-adjusted prior count to each observation.

Usage

addPriorCount(y, lib.size=NULL, offset=NULL, prior.count=1)

Arguments

y

a numeric count matrix, with rows corresponding to genes and columns to libraries.

lib.size

a numeric vector of library sizes.

offset

a numeric vector or matrix of offsets.

prior.count

a numeric scalar or vector of prior counts to be added to each gene.

Details

This function adds a positive prior count to each observation, which is often useful for avoiding zeroes when calculating log-values. For example, predFC calls this function to calculate shrunken log-fold changes. aveLogCPM and cpm use the same underlying code to calculate (average) log-counts per million.

The actual value added to the counts for each library is scaled according to the library size. This ensures that the relative contribution of the prior is the same for each library. Otherwise, a fixed prior would have little effect on a large library, but a big effect for a small library.
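As a rough illustration of this scaling, the base R sketch below computes a per-library prior. It assumes that the prior for each library is prior.count multiplied by the ratio of that library's size to the mean library size. The library sizes are made-up values, and this is not the package's internal code.

```r
# Hypothetical library sizes for three libraries (illustrative values only).
lib.size <- c(1e6, 2e6, 3e6)
prior.count <- 1

# Assumed scaling: prior proportional to library size, relative to the mean.
scaled.prior <- prior.count * lib.size / mean(lib.size)
scaled.prior

# The scaled prior is the same fraction of every library.
prior.fraction <- scaled.prior / lib.size
prior.fraction
```

Under this assumption, a library of average size receives exactly prior.count, while larger and smaller libraries receive proportionally more or less.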

The library sizes are also modified, with twice the scaled prior being added to the library size for each library. To understand the motivation for this, consider that each observation is, effectively, a proportion of the total count in the library. The addition scheme implemented here represents an empirical logistic transform and ensures that the proportion can never be zero or one.
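The effect on the proportions can be checked with a small base R sketch. The library size and scaled prior below are hypothetical values chosen for illustration. The sketch only follows the description above: the prior is added to the count and twice the prior is added to the library size.

```r
# Hypothetical single library with a total count of 1000 and a scaled prior of 1.
lib.size <- 1000
scaled.prior <- 1

# Two extreme observations: a gene with no counts, and a gene with every count.
y <- c(0, lib.size)

# Adjusted proportion: prior added to the count, twice the prior to the total.
adj.prop <- (y + scaled.prior) / (lib.size + 2 * scaled.prior)
adj.prop

# The log-odds stay finite, whereas y / lib.size would give -Inf and Inf.
log.odds <- log(adj.prop / (1 - adj.prop))
log.odds
```

Both adjusted proportions lie strictly between zero and one, so their logarithms and log-odds are finite.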

If offset is supplied, it is used in place of lib.size, with exp(offset) taken as the vector/matrix of library sizes. If an offset matrix is supplied, the scaling of the prior described above becomes gene-specific.
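A minimal base R sketch of the gene-specific case follows. It assumes that each row of exp(offset) is treated as that gene's own set of library sizes, and that the prior is scaled relative to the mean of that row. The offset values are hypothetical, and the row-wise mean is an assumption about the scaling rather than a documented detail.

```r
# Hypothetical offset matrix for 2 genes (rows) and 3 libraries (columns).
offset <- log(rbind(c(1e6, 2e6, 3e6),
                    c(2e6, 2e6, 2e6)))
prior.count <- 1

# exp(offset) is interpreted as a matrix of gene-specific library sizes.
eff.lib <- exp(offset)

# Assumed scaling: each row is scaled relative to its own mean library size.
scaled.prior <- prior.count * eff.lib / rowMeans(eff.lib)
scaled.prior
```

In this sketch the first gene receives a different prior in each library, while the second gene, whose offsets are identical across libraries, receives the unscaled prior.count throughout.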

Most use cases of this function will involve supplying a constant value to prior.count for all genes. However, it is also possible to use gene-specific values by supplying a vector of length equal to the number of rows in y.
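The gene-specific case can be sketched in base R as follows. The counts, library sizes and prior values are hypothetical, and the scaling by relative library size is an assumption consistent with the description above, not the package's internal code.

```r
# Hypothetical counts for 3 genes (rows) in 2 libraries (columns).
y <- rbind(c(0, 5), c(10, 20), c(3, 0))
lib.size <- c(1000, 3000)

# One prior count per gene, in the same order as the rows of y.
prior.count <- c(0.5, 1, 2)
stopifnot(length(prior.count) == nrow(y))

# Assumed scaling: gene-wise prior multiplied by the relative library size.
scaled.prior <- outer(prior.count, lib.size / mean(lib.size))

# Adjusted counts with a different prior for each gene.
adj.y <- y + scaled.prior
adj.y
```

With edgeR loaded, the corresponding call would pass the same vector directly, as in addPriorCount(y, prior.count=prior.count).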

Value

A list is returned containing y, a matrix of counts with the added priors, and offset, a CompressedMatrix containing the log-transformed modified library sizes.

Author(s)

Aaron Lun

See Also

aveLogCPM, cpm, predFC

Examples

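library(edgeR)

# Simulate a 200-gene by 5-library matrix of negative binomial counts.
original <- matrix(rnbinom(1000, mu=20, size=10), nrow=200)
head(original)

# Add the default prior count of 1, scaled by library size.
out <- addPriorCount(original)
head(out$y)       # counts with the scaled prior added
head(out$offset)  # log-transformed modified library sizes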

Example output

Loading required package: limma
     [,1] [,2] [,3] [,4] [,5]
[1,]   26   24   17   30   21
[2,]   25   17   21   21   20
[3,]   14   23   13   27   25
[4,]   25    8   14    8   24
[5,]   24   24   35   31   13
[6,]   23   10   15   15   25
         [,1]      [,2]     [,3]      [,4]     [,5]
[1,] 27.02265 24.989835 17.98095 30.988108 22.01845
[2,] 26.02265 17.989835 21.98095 21.988108 21.01845
[3,] 15.02265 23.989835 13.98095 27.988108 26.01845
[4,] 26.02265  8.989835 14.98095  8.988108 25.01845
[5,] 25.02265 24.989835 35.98095 31.988108 14.01845
[6,] 24.02265 10.989835 15.98095 15.988108 26.01845
      [,1]     [,2]     [,3]     [,4]     [,5]
x 8.330151 8.297538 8.288525 8.295792 8.326042
attr(,"repeat.row")
[1] TRUE
attr(,"repeat.col")
[1] FALSE
attr(,"class")
[1] "compressedMatrix"

edgeR documentation built on Jan. 16, 2021, 2:03 a.m.