cpm: Counts per Million or Reads per Kilobase per Million
In edgeR: Empirical Analysis of Digital Gene Expression Data in R


Compute counts per million (CPM) or reads per kilobase per million (RPKM).

## S3 method for class 'DGEList'
cpm(y, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'SummarizedExperiment'
cpm(y, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEGLM'
cpm(y, log = FALSE, shrunk = TRUE, ...)
## Default S3 method:
cpm(y, lib.size = NULL, offset=NULL,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
rpkm(y, gene.length = NULL, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'SummarizedExperiment'
rpkm(y, gene.length = NULL, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEGLM'
rpkm(y, gene.length, log = FALSE, shrunk = TRUE, ...)
## Default S3 method:
rpkm(y, gene.length, lib.size = NULL, offset=NULL,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
cpmByGroup(y, group = NULL, dispersion = NULL, ...)
## S3 method for class 'SummarizedExperiment'
cpmByGroup(y, group = NULL, dispersion = NULL, ...)
## Default S3 method:
cpmByGroup(y, group = NULL, dispersion = 0.05,
       offset = NULL, weights = NULL, log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
rpkmByGroup(y, group = NULL, gene.length = NULL, dispersion = NULL, ...)
## S3 method for class 'SummarizedExperiment'
rpkmByGroup(y, group = NULL, gene.length = NULL, dispersion = NULL, ...)
## Default S3 method:
rpkmByGroup(y, group = NULL, gene.length, dispersion = 0.05,
       offset = NULL, weights = NULL, log = FALSE, prior.count = 2, ...)

`y`	a matrix-like object containing counts. Can be a numeric matrix, a `DGEList` object, a `SummarizedExperiment` object with a `"counts"` assay, or any object that can be coerced to a matrix by `as.matrix`. For `cpm` and `rpkm`, it can also be a `DGEGLM` or `DGELRT` object.
`normalized.lib.sizes`	logical, use normalized library sizes?
`lib.size`	library size, defaults to `colSums(y)`. Ignored if `offset` is specified.
`offset`	numeric matrix of same size as `y`, or a vector of length `ncol(y)`, representing library sizes on the log scale. Can also be a scalar for `cpmByGroup.default` and `rpkmByGroup.default`. If specified, then takes precedence over `lib.size`.
`log`	logical, if `TRUE` then `log2` values are returned.
`prior.count`	average count to be added to each observation to avoid taking log of zero. Used only if `log=TRUE`.
`shrunk`	logical, if `TRUE` then the usual coefficients from the fitted object will be used, if `FALSE` then the unshrunk coefficients will be used.
`gene.length`	vector of length `nrow(y)` giving gene lengths in bases, or the name of the column of `y$genes` containing the gene lengths.
`group`	factor giving group membership for columns of `y`. Defaults to `y$samples$group` for the `DGEList` method and to a single-level factor for the default method.
`dispersion`	numeric vector of negative binomial dispersions.
`weights`	numeric vector or matrix of non-negative quantitative weights. Can be a vector of length equal to the number of libraries, or a matrix of the same size as `y`.
`...`	other arguments are not used.

CPM or RPKM values are useful descriptive measures for the expression level of a gene. By default, normalized library sizes are used in the computation for `DGEList` objects, whereas simple column sums are used for matrices.
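As a rough illustration of the two measures for a plain count matrix, the sketch below computes them by hand in base R. The counts and the `gene_length` values are illustrative, and the variable names are not part of edgeR. For a `DGEList`, the library sizes would normally also be multiplied by the normalization factors, a step omitted here.

```r
# Counts for 5 genes (rows) in 4 samples (columns); illustrative values only.
counts <- matrix(c(11,  1, 12,  5,  0,
                   33, 25,  6,  5,  0,
                    5, 14,  7,  1, 14,
                    4,  5,  4, 19,  7), nrow = 5, ncol = 4)

# For a plain matrix, the library size defaults to the column sums.
lib_size <- colSums(counts)

# CPM: divide each column by its library size, then scale to one million.
cpm_manual <- t(t(counts) / lib_size) * 1e6

# RPKM: divide each gene's CPM by its length in kilobases.
gene_length <- c(1000, 2000, 500, 1500, 3000)  # hypothetical lengths in bases
rpkm_manual <- cpm_manual / (gene_length / 1000)

round(cpm_manual, 2)
round(rpkm_manual, 2)
```

Each column of `cpm_manual` sums to one million because the library sizes here are the column sums themselves.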

If log-values are computed, then a small count, given by `prior.count` but scaled to be proportional to the library size, is added to `y` to avoid taking the log of zero.
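The scaling of the prior count can be sketched in base R as follows. This is a minimal sketch, not edgeR's own code: it assumes that the prior is scaled by each library size relative to the mean library size, and that the library sizes are increased by twice the scaled prior. The counts, library sizes and variable names are illustrative.

```r
# Counts for 5 genes (rows) in 4 samples (columns); illustrative values only.
counts <- matrix(c(11,  1, 12,  5,  0,
                   33, 25,  6,  5,  0,
                    5, 14,  7,  1, 14,
                    4,  5,  4, 19,  7), nrow = 5, ncol = 4)
lib_size <- c(1001, 1002, 1003, 1004)  # illustrative library sizes
prior_count <- 2

# Scale the prior count so that it is proportional to each library size.
prior_scaled <- prior_count * lib_size / mean(lib_size)

# Assumption: library sizes are augmented by twice the scaled prior,
# which keeps every adjusted proportion strictly below one.
lib_adjusted <- lib_size + 2 * prior_scaled

# Add the scaled prior to each column, convert to CPM, then take log2.
log_cpm <- log2(t((t(counts) + prior_scaled) / lib_adjusted) * 1e6)

round(log_cpm, 5)
```

Because the prior is proportional to the library size, a zero count maps to the same log-CPM value in every sample, so zeros remain comparable across libraries of different depth.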

The `rpkm` methods for `DGEList`, `DGEGLM` or `DGELRT` objects will try to find the gene lengths in a column of `y$genes` called `Length` or `length`. Failing that, they will look for any column name containing "length" in any capitalization.
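The lookup order described above might be sketched like this. `find_length_column` is a hypothetical helper written for illustration, not an edgeR function, and returning the first match when several columns qualify is an assumption.

```r
# Hypothetical helper: choose the gene-length column of a gene annotation table.
find_length_column <- function(genes) {
  nms <- colnames(genes)

  # First preference: a column named exactly "Length" or "length".
  exact <- match(c("Length", "length"), nms)
  exact <- exact[!is.na(exact)]
  if (length(exact) > 0) return(nms[exact[1]])

  # Fallback: any column whose name contains "length", ignoring case.
  partial <- grep("length", nms, ignore.case = TRUE)
  if (length(partial) > 0) return(nms[partial[1]])  # assumption: first match wins

  NULL  # no suitable column found
}

genes <- data.frame(Symbol = c("A", "B"), GeneLength = c(1000, 2000))
find_length_column(genes)
```

To avoid relying on any lookup, the gene lengths can be passed explicitly through the `gene.length` argument.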

The `cpm` and `rpkm` methods for `DGEGLM` and `DGELRT` fitted model objects return fitted CPM or RPKM values. If `shrunk=TRUE`, then the CPM or RPKM values will reflect the `prior.count` input to the original linear model fit. If `shrunk=FALSE`, then the CPM or RPKM values will be computed with `prior.count=0`. Note that the latter could result in taking the log of near-zero values if `log=TRUE`.

`cpmByGroup` and `rpkmByGroup` compute group average values on the unlogged scale.
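A simplified picture of a group average on the unlogged scale is to pool counts and library sizes within each group, as in the base R sketch below. This assumes a dispersion of zero. The functions themselves take a negative binomial `dispersion` argument, so their results are presumably model-based and will generally differ a little from this pooling. The data and variable names are illustrative.

```r
# Counts for 5 genes (rows) in 4 samples (columns); illustrative values only.
counts <- matrix(c(11,  1, 12,  5,  0,
                   33, 25,  6,  5,  0,
                    5, 14,  7,  1, 14,
                    4,  5,  4, 19,  7), nrow = 5, ncol = 4)
lib_size <- c(1001, 1002, 1003, 1004)  # illustrative library sizes
group <- factor(c(1, 1, 2, 2))

# Sum counts within each group: rowsum() works on rows, hence the transposes.
pooled_counts <- t(rowsum(t(counts), group))

# Sum library sizes within each group.
pooled_lib <- as.vector(tapply(lib_size, group, sum))

# Pooled CPM (zero-dispersion assumption): one column per level of group.
cpm_by_group <- t(t(pooled_counts) / pooled_lib) * 1e6

round(cpm_by_group, 3)
```

Dividing each row of `cpm_by_group` by the gene length in kilobases would give the corresponding pooled RPKM values.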

A numeric matrix of CPM or RPKM values, on the log2 scale if `log=TRUE`. `cpm` and `rpkm` produce matrices of the same size as `y`. If `y` is a data object, then observed values are returned. If `y` is a fitted model object, then fitted values are returned.

`cpmByGroup` and `rpkmByGroup` produce matrices with a column for each level of `group`.

`aveLogCPM(y)`, `rowMeans(cpm(y,log=TRUE))` and `log2(rowMeans(cpm(y)))` all give slightly different results.

Davis McCarthy, Gordon Smyth, Yunshun Chen, Aaron Lun

aveLogCPM

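library(edgeR)

# Simulate a 5 x 4 matrix of negative binomial counts
y <- matrix(rnbinom(20,size=1,mu=10),5,4)
cpm(y)

# Wrap the counts in a DGEList with explicit library sizes
d <- DGEList(counts=y, lib.size=1001:1004)
cpm(d)
cpm(d,log=TRUE)

# Store gene lengths in a "Length" column so that rpkm() can find them
d$genes <- data.frame(Length=c(1000,2000,500,1500,3000))
rpkm(d)

# Group averages for two groups of two samples each
cpmByGroup(d, group=c(1,1,2,2))

rpkmByGroup(d, group=c(1,1,2,2))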

Loading required package: limma
          [,1]      [,2]      [,3]     [,4]
[1,] 379310.34 478260.87 121951.22 102564.1
[2,]  34482.76 362318.84 341463.41 128205.1
[3,] 413793.10  86956.52 170731.71 102564.1
[4,] 172413.79  72463.77  24390.24 487179.5
[5,]      0.00      0.00 341463.41 179487.2
    Sample1   Sample2   Sample3   Sample4
1 10989.011 32934.132  4985.045  3984.064
2   999.001 24950.100 13958.126  4980.080
3 11988.012  5988.024  6979.063  3984.064
4  4995.005  4990.020   997.009 18924.303
5     0.000     0.000 13958.126  6972.112
   Sample1  Sample2  Sample3  Sample4
1 13.65870 15.08640 12.76328 12.53996
2 11.54212 14.71199 13.95581 12.76225
3 13.76564 12.95698 13.12580 12.53996
4 12.76534 12.76431 11.54116 14.34680
5 10.95644 10.95644 13.95581 13.12468
     Sample1  Sample2    Sample3   Sample4
1 10989.0110 32934.13  4985.0449  3984.064
2   499.5005 12475.05  6979.0628  2490.040
3 23976.0240 11976.05 13958.1256  7968.127
4  3330.0033  3326.68   664.6726 12616.202
5     0.0000     0.00  4652.7085  2324.037
          1         2
1 21964.180  4484.351
2 12978.174  9467.586
3  8986.985  5480.978
4  4992.511  9963.633
5     0.000 10463.977
          1         2
1 21964.180  4484.351
2  6489.087  4733.793
3 17973.970 10961.956
4  3328.341  6642.422
5     0.000  3487.992