lcpm: Transform Counts

Description Usage Arguments Details Value References See Also Examples

Description

This convenient wrapper function converts raw counts to the log2-counts per million scale, following normalization for library size and a minimal count shift.

Usage

1
lcpm(mat, filter = NULL, method = "TMM")

Arguments

mat

Probe by sample matrix of raw counts.

filter

Optional vector of length 2 specifying the filter criterion. Each probe must have at least filter[1] log2-counts per million in at least filter[2] libraries to pass the expression threshold.

method

Normalization method to be used. See Details.

Details

lcpm applies the voom transformation to sequencing data, converting counts to approximately normal distributions on the log2-CPM scale. Data can now be modelled using traditional linear techniques (at least once spot weights have been applied; see Law, et al. (2014)), or used for unsupervised clustering analysis via PCA, MDS, or other methods.

It is recommended that low count genes be filtered out prior to transformation. There is no general algorithm for determining the most appropriate expression filter for a given data set. As a rule of thumb, the limma authors advise setting filter[1] to either 1, or 10 / (L / 1,000,000), where L = the minimum library size for a given count matrix. The former corresponds to a log2-CPM of 0, while the latter may be preferable in cases where read depth is especially shallow. For filter[2], the authors recommend using the number of replicates in the largest group, to guarantee that a gene is expresed in at least one sample for any groupwise comparison. These are broad guidelines, however, not strict rules.

method = "TMM" is the weighted trimmed mean of M-values (to the reference) proposed by Robinson & Oshlack (2010), where the weights are from the delta method on binomial data.

method = "RLE" is the scaling factor method proposed by Anders & Huber (2010). We call it "relative log expression", as median library is calculated from the geometric mean of all columns and the median ratio of each sample to the median library is taken as the scale factor.

method = "upperquartile" is the upper-quartile normalization method of Bullard, et al. (2010), in which the scale factors are calculated from the 75 quantile of the counts for each library, after removing genes which are zero in all libraries.

If method = "none", then the normalization factors are set to 1.

For symmetry, normalization factors are adjusted to multiply to 1. The effective library size is then the original library size multiplied by the scaling factor.

Note that rows that have zero counts for all columns are trimmed before normalization factors are computed. Therefore rows with all zero counts do not affect the estimated factors.

Value

A numeric matrix of normalized counts on the log2-CPM scale.

References

Law, C.W., Chen, Y., Shi, W., & Smyth, G.K. (2014). "voom: precision weights unlock linear model analysis tools for RNA-seq read counts." Genome Biology, 15:R29. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-2-r29

Anders, S. & Huber, W. (2010). "Differential expression analysis for sequence count data." Genome Biology, 11:R106. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-10-r106

Bullard, J.H., Purdom, E., Hansen, K.D. & Dudoit, S. (2010). "Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments." BMC Bioinformatics, 11:94. http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-94

Robinson, M.D. & Oshlack, A. (2010). "A scaling normalization method for differential expression analysis of RNA-seq data." Genome Biology, 11:R25. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25

See Also

voom, cpm

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# Simulate count data
mat <- matrix(rnbinom(5000, mu = 4, size = 1), nrow = 1000, ncol = 5)

# Plot raw counts
library(bioplotr)
plot_density(mat)

# Plot transformed counts
trans_mat <- lcpm(mat)
plot_density(trans_mat)

dswatson/biomisc documentation built on May 15, 2019, 4:52 p.m.