cpm: Counts per Million or Reads per Kilobase per Million
In hiraksarkar/edgeR_fork: Empirical Analysis of Digital Gene Expression Data in R


Compute counts per million (CPM) or reads per kilobase per million (RPKM).

## S3 method for class 'DGEList'
cpm(y, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'SummarizedExperiment'
cpm(y, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEGLM'
cpm(y, log = FALSE, shrunk = TRUE, ...)
## Default S3 method:
cpm(y, lib.size = NULL, offset=NULL,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
rpkm(y, gene.length = NULL, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'SummarizedExperiment'
rpkm(y, gene.length = NULL, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEGLM'
rpkm(y, gene.length, log = FALSE, shrunk = TRUE, ...)
## Default S3 method:
rpkm(y, gene.length, lib.size = NULL, offset=NULL,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
cpmByGroup(y, group = NULL, dispersion = NULL, ...)
## S3 method for class 'SummarizedExperiment'
cpmByGroup(y, group = NULL, dispersion = NULL, ...)
## Default S3 method:
cpmByGroup(y, group = NULL, dispersion = 0.05,
       offset = NULL, weights = NULL, log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
rpkmByGroup(y, group = NULL, gene.length = NULL, dispersion = NULL, ...)
## S3 method for class 'SummarizedExperiment'
rpkmByGroup(y, group = NULL, gene.length = NULL, dispersion = NULL, ...)
## Default S3 method:
rpkmByGroup(y, group = NULL, gene.length, dispersion = 0.05,
       offset = NULL, weights = NULL, log = FALSE, prior.count = 2, ...)

`y`	a matrix-like object containing counts. Can be a numeric matrix, a `DGEList` object, a `SummarizedExperiment` object with a `"counts"` assay, or any object that can be coerced to a matrix by `as.matrix`. For `cpm` and `rpkm`, it can also be a `DGEGLM` or `DGELRT` object.
`normalized.lib.sizes`	logical, use normalized library sizes?
`lib.size`	library size, defaults to `colSums(y)`. Ignored if `offset` is specified.
`offset`	numeric matrix of same size as `y`, or a vector of length `ncol(y)`, representing library sizes on the log scale. Can also be a scalar for `cpmByGroup.default` and `rpkmByGroup.default`. If specified, then takes precedence over `lib.size`.
`log`	logical, if `TRUE` then `log2` values are returned.
`prior.count`	average count to be added to each observation to avoid taking log of zero. Used only if `log=TRUE`.
`shrunk`	logical, if `TRUE` then the usual coefficients from the fitted object will be used, if `FALSE` then the unshrunk coefficients will be used.
`gene.length`	vector of length `nrow(y)` giving gene length in bases, or the name of the column of `y$genes` containing the gene lengths.
`group`	factor giving group membership for columns of `y`. Defaults to `y$samples$group` for the `DGEList` method and to a single-level factor for the default method.
`dispersion`	numeric vector of negative binomial dispersions.
`weights`	numeric vector or matrix of non-negative quantitative weights. Can be a vector of length equal to the number of libraries, or a matrix of the same size as `y`.
`...`	other arguments are not used.

CPM and RPKM values are useful descriptive measures of the expression level of a gene. By default, normalized library sizes are used in the computation for DGEList objects, whereas simple column sums are used for matrices.
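To make the definitions concrete, here is a minimal base R sketch of the matrix case, where library sizes are plain column sums. It does not call edgeR; the names `counts`, `lib_size`, `cpm_manual`, `rpkm_manual` and the gene lengths are made up for illustration.

```r
# Toy count matrix: 3 genes (rows) x 2 libraries (columns), filled column-wise.
counts <- matrix(c(10, 0, 90,
                   30, 20, 150),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("libA", "libB")))

# Unnormalized library sizes are the column sums (100 and 200 here).
lib_size <- colSums(counts)

# CPM: divide each column by its library size, then scale to one million.
cpm_manual <- t(t(counts) / lib_size) * 1e6

# RPKM: divide each gene's CPM by its length in kilobases.
# The lengths below are hypothetical.
gene_length <- c(1000, 2000, 500)
rpkm_manual <- cpm_manual / (gene_length / 1000)

cpm_manual
rpkm_manual
```

Each column of `cpm_manual` sums to one million by construction. For a DGEList, the same arithmetic would presumably use the normalized library sizes in place of `colSums(counts)`.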

If log-values are computed, then a small count, given by prior.count but scaled to be proportional to the library size, is added to y to avoid taking the log of zero.
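The paragraph above states only that the prior count is scaled in proportion to the library size. The base R sketch below assumes one specific form of that scaling: the prior is multiplied by `lib_size / mean(lib_size)`, and the library size is increased by twice the scaled prior. Both choices are assumptions made for illustration and should not be read as a description of edgeR's internals.

```r
# Toy counts: 3 genes x 2 libraries; gene 1 has a zero count in both libraries.
counts <- matrix(c(0, 5, 95,
                   0, 50, 150),
                 nrow = 3)
lib_size <- colSums(counts)   # 100 and 200
prior_count <- 2

# Assumption: scale the prior so larger libraries get a proportionally larger offset.
prior_scaled <- prior_count * lib_size / mean(lib_size)

# Assumption: augment the library size by twice the scaled prior.
adj_lib_size <- lib_size + 2 * prior_scaled

# log2-CPM with the scaled prior added to every count.
# t(counts) has one row per library, so the length-2 vectors line up by library.
log_cpm <- log2(t((t(counts) + prior_scaled) / adj_lib_size) * 1e6)

log_cpm
```

Under this assumed form, a zero count maps to the same finite log-CPM in every library regardless of sequencing depth, which is the practical reason for scaling the prior rather than adding a fixed constant.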

The rpkm methods for DGEList, DGEGLM or DGELRT objects will try to find the gene lengths in a column of y$genes called Length or length. Failing that, they will look for any column whose name contains "length" in any capitalization.
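The lookup order described above can be sketched in base R as follows. `find_length_column` is a hypothetical helper written for this illustration, not an edgeR function, and returning the first match when several columns qualify is an assumption.

```r
# Hypothetical helper (not part of edgeR): pick the gene-length column by name.
find_length_column <- function(genes) {
  nm <- names(genes)
  # First preference: a column called exactly "Length" or "length".
  exact <- match(c("Length", "length"), nm)
  exact <- exact[!is.na(exact)]
  if (length(exact) > 0) return(nm[exact[1]])
  # Fallback: any column whose name contains "length", ignoring case.
  fuzzy <- grep("length", nm, ignore.case = TRUE)
  if (length(fuzzy) > 0) return(nm[fuzzy[1]])
  # Nothing suitable found.
  NULL
}

# Made-up annotation table with no exact "Length" column.
genes <- data.frame(Symbol = c("A", "B"), GeneLength = c(1200, 800))
find_length_column(genes)
```

When the annotation has no recognizable column, passing `gene.length` explicitly avoids relying on name matching at all.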

The cpm and rpkm methods for DGEGLM and DGELRT fitted model objects return fitted CPM or RPKM values. If shrunk=TRUE, then the CPM or RPKM values will reflect the prior.count input to the original linear model fit. If shrunk=FALSE, then the CPM or RPKM values will be computed with prior.count=0. Note that the latter could result in taking the log of near-zero values if log=TRUE.

cpmByGroup and rpkmByGroup compute group average values on the unlogged scale.
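As a rough illustration of a group average on the unlogged scale, the base R sketch below pools counts and library sizes within each group. This is a deliberate simplification: it ignores the `dispersion` and `weights` arguments, so it should be read as the zero-dispersion special case rather than as what `cpmByGroup` computes in general. All object names are made up.

```r
# Toy counts: 3 genes x 4 libraries, two libraries per group.
counts <- matrix(c(10, 0, 90,
                   30, 20, 150,
                   5, 5, 40,
                   20, 10, 70),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("lib", 1:4)))
group <- factor(c("ctrl", "ctrl", "trt", "trt"))
lib_size <- colSums(counts)   # 100, 200, 50, 100

# Pool counts within each group; the result is genes x groups.
pooled_counts <- t(rowsum(t(counts), group))

# Pool library sizes within each group, in the same group order.
pooled_lib <- as.vector(tapply(lib_size, group, sum)[colnames(pooled_counts)])

# Group-level CPM on the unlogged scale (simplified: dispersion ignored).
cpm_by_group <- t(t(pooled_counts) / pooled_lib) * 1e6

cpm_by_group
```

The result has one column per level of `group`, matching the shape described for `cpmByGroup`.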

A numeric matrix of CPM or RPKM values, on the log2 scale if log=TRUE. cpm and rpkm produce matrices of the same size as y. If y was a data object, then observed values are returned. If y was a fitted model object, then fitted values are returned.

cpmByGroup and rpkmByGroup produce matrices with a column for each level of group.

aveLogCPM(y), rowMeans(cpm(y,log=TRUE)) and log2(rowMeans(cpm(y))) all give slightly different results.
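Part of this difference is simply that the mean of logs is not the log of the mean. The base R sketch below shows that effect on a toy matrix with strictly positive counts and no prior count. It does not call edgeR and does not attempt to reproduce `aveLogCPM`, which is a separate calculation again.

```r
# Toy counts: 3 genes x 2 libraries, all strictly positive to avoid log2(0).
counts <- matrix(c(10, 50, 40,
                   80, 60, 60),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("libA", "libB")))

# Plain CPM from column sums (100 and 200).
cpm_manual <- t(t(counts) / colSums(counts)) * 1e6

# Average of the log values versus log of the average value.
mean_of_logs <- rowMeans(log2(cpm_manual))
log_of_means <- log2(rowMeans(cpm_manual))

cbind(mean_of_logs, log_of_means)
```

Because the logarithm is concave, `mean_of_logs` is never larger than `log_of_means`, and it is strictly smaller whenever a gene's CPM differs between libraries.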

Author(s): Davis McCarthy, Gordon Smyth, Yunshun Chen, Aaron Lun

See also: aveLogCPM

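library(edgeR)

# Simulate a 5 x 4 matrix of negative binomial counts
y <- matrix(rnbinom(20,size=1,mu=10),5,4)
cpm(y)

# Wrap the counts in a DGEList with explicit library sizes
d <- DGEList(counts=y, lib.size=1001:1004)
cpm(d)
cpm(d,log=TRUE)

# Store gene lengths in d$genes so that rpkm() can find them
d$genes <- data.frame(Length=c(1000,2000,500,1500,3000))
rpkm(d)

# Group-wise values for two groups of two libraries each
cpmByGroup(d, group=c(1,1,2,2))

rpkmByGroup(d, group=c(1,1,2,2))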