cpm: Counts per Million or Reads per Kilobase per Million

Description

Computes counts per million (CPM) or reads per kilobase per million (RPKM) values.

Usage

## S3 method for class 'DGEList'
cpm(x, normalized.lib.sizes=TRUE, log=FALSE, prior.count=0.25, ...)
## Default S3 method:
cpm(x, lib.size=NULL, log=FALSE, prior.count=0.25, ...)
## S3 method for class 'DGEList'
rpkm(x, gene.length=NULL, normalized.lib.sizes=TRUE, log=FALSE, prior.count=0.25, ...)
## Default S3 method:
rpkm(x, gene.length, lib.size=NULL, log=FALSE, prior.count=0.25, ...)

Arguments

x

matrix of counts or a DGEList object

normalized.lib.sizes

logical, use normalized library sizes?

lib.size

library size, defaults to colSums(x).

log

logical, if TRUE then log2 values are returned.

prior.count

average count to be added to each observation to avoid taking log of zero. Used only if log=TRUE.

gene.length

vector of length nrow(x) giving gene lengths in bases, or the name of the column of x$genes containing the gene lengths.

...

other arguments that are not currently used.

Details

CPM or RPKM values are useful descriptive measures of the expression level of a gene. By default, the computation uses the normalized library sizes for DGEList objects and simple column sums for matrices.
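The arithmetic behind the matrix default can be sketched in base R. This is an illustration of the definition, not the package source; the count values and the name cpm.manual are made up for the example. For a DGEList, the normalized library size would take the place of colSums(counts).

```r
# Counts for 5 genes (rows) in 4 samples (columns); illustrative values only.
counts <- matrix(c( 2, 10,  2, 6, 3,
                   15, 15,  5, 8, 6,
                    1, 29, 28, 5, 0,
                   13,  9, 11, 0, 6), nrow = 5, ncol = 4)

# Default for a plain matrix: the library size is the column sum.
lib.size <- colSums(counts)

# Divide each column by its library size, then scale to one million.
cpm.manual <- t(t(counts) / lib.size) * 1e6

# With lib.size = colSums(counts), every column sums to one million.
colSums(cpm.manual)
```

The double transpose is there because R recycles a vector down the rows of a matrix; transposing lets each column be divided by its own library size.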

If log-values are computed, then a small count, given by prior.count but scaled to be proportional to the library size, is added to x to avoid taking the log of zero.
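A minimal base R sketch of the scaled prior count follows. The scaling by lib.size / mean(lib.size) follows the description above; the step that enlarges each library size by twice the scaled prior count is an assumption about the implementation, made here so that the proportions stay below one. The names prior.scaled, lib.adj and logcpm.manual are hypothetical.

```r
# Illustrative counts (5 genes, 4 samples) and explicit library sizes.
counts <- matrix(c( 2, 10,  2, 6, 3,
                   15, 15,  5, 8, 6,
                    1, 29, 28, 5, 0,
                   13,  9, 11, 0, 6), nrow = 5, ncol = 4)
lib.size <- c(1001, 1002, 1003, 1004)
prior.count <- 0.25

# Scale the prior count in proportion to each library size.
prior.scaled <- prior.count * lib.size / mean(lib.size)

# Assumption: the library size is augmented by twice the scaled prior count.
lib.adj <- lib.size + 2 * prior.scaled

# Add the scaled prior to every count, convert to CPM, and take log2.
logcpm.manual <- log2(t((t(counts) + prior.scaled) / lib.adj) * 1e6)

# Zero counts now give a finite value instead of -Inf.
logcpm.manual[counts == 0]
```

Under these assumptions a zero count maps to the same log2 value in every sample, because the prior and the library size scale together.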

The rpkm method for DGEList objects will try to find the gene lengths in a column of x$genes called Length or length. Failing that, it will look for any column name containing "length" in any capitalization.
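The lookup order and the RPKM arithmetic can be sketched in base R as below. find_length_column is a hypothetical helper written for this illustration and is not part of edgeR; the genes data frame stands in for x$genes, and the counts are made up.

```r
# Illustrative counts (5 genes, 4 samples), library sizes and gene annotation.
counts <- matrix(c( 2, 10,  2, 6, 3,
                   15, 15,  5, 8, 6,
                    1, 29, 28, 5, 0,
                   13,  9, 11, 0, 6), nrow = 5, ncol = 4)
lib.size <- c(1001, 1002, 1003, 1004)
genes <- data.frame(Length = c(1000, 2000, 500, 1500, 3000))

# Hypothetical helper: prefer a column named exactly "Length" or "length",
# otherwise take the first column whose name contains "length" in any case.
find_length_column <- function(genes) {
  nm <- names(genes)
  exact <- nm[nm %in% c("Length", "length")]
  if (length(exact) > 0) return(exact[1])
  partial <- grep("length", nm, ignore.case = TRUE, value = TRUE)
  if (length(partial) > 0) return(partial[1])
  stop("no gene length column found")
}

# CPM first, then divide each gene (row) by its length in kilobases.
cpm.values <- t(t(counts) / lib.size) * 1e6
gene.length <- genes[[find_length_column(genes)]]
rpkm.manual <- cpm.values / (gene.length / 1000)
```

Dividing the matrix by a vector of length nrow(counts) recycles down the rows, so each gene is scaled by its own length without any transposing.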

Value

numeric matrix of CPM or RPKM values.

Note

aveLogCPM(x), rowMeans(cpm(x,log=TRUE)) and log2(rowMeans(cpm(x))) all give slightly different results.
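Part of the reason can be seen in base R: averaging log values is not the same as taking the log of an average. The sketch below compares only those two quantities, using a hand-computed log-CPM with the prior-count handling assumed as twice the scaled prior added to the library size; aveLogCPM is not reproduced here. All names and counts are illustrative.

```r
# Illustrative counts (5 genes, 4 samples) with column-sum library sizes.
counts <- matrix(c( 2, 10,  2, 6, 3,
                   15, 15,  5, 8, 6,
                    1, 29, 28, 5, 0,
                   13,  9, 11, 0, 6), nrow = 5, ncol = 4)
lib.size <- colSums(counts)

# Plain CPM, then the log2 of the per-gene mean.
cpm.values <- t(t(counts) / lib.size) * 1e6
log.of.mean <- log2(rowMeans(cpm.values))

# Log-CPM with a scaled prior count (assumed form), then the per-gene mean.
prior.scaled <- 0.25 * lib.size / mean(lib.size)
logcpm <- log2(t((t(counts) + prior.scaled) / (lib.size + 2 * prior.scaled)) * 1e6)
mean.of.log <- rowMeans(logcpm)

# The two summaries differ gene by gene.
cbind(mean.of.log, log.of.mean)
```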

Author(s)

Davis McCarthy, Gordon Smyth

See Also

aveLogCPM

Examples

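library(edgeR)

# CPM from a plain count matrix; library sizes default to the column sums
y <- matrix(rnbinom(20,size=1,mu=10),5,4)
cpm(y)

# CPM and log2-CPM from a DGEList with explicit library sizes
d <- DGEList(counts=y, lib.size=1001:1004)
cpm(d)
cpm(d,log=TRUE)

# RPKM using gene lengths stored in the Length column of d$genes
d$genes$Length <- c(1000,2000,500,1500,3000)
rpkm(d)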

Example output

Loading required package: limma
          [,1]     [,2]      [,3]     [,4]
[1,]  86956.52 306122.4  15873.02 333333.3
[2,] 434782.61 306122.4 460317.46 230769.2
[3,]  86956.52 102040.8 444444.44 282051.3
[4,] 260869.57 163265.3  79365.08      0.0
[5,] 130434.78 122449.0      0.00 153846.2
   Sample1   Sample2   Sample3   Sample4
1 1998.002 14970.060   997.009 12948.207
2 9990.010 14970.060 28913.260  8964.143
3 1998.002  4990.020 27916.251 10956.175
4 5994.006  7984.032  4985.045     0.000
5 2997.003  5988.024     0.000  5976.096
   Sample1  Sample2   Sample3   Sample4
1 11.13331 13.89291 10.282815 13.687267
2 13.32112 13.89291 14.831114 13.168817
3 11.13331 12.35447 14.780929 13.451207
4 12.60739 13.00655 12.353095  7.961463
5 11.66390 12.60601  7.961463 12.603248
   Sample1   Sample2   Sample3   Sample4
1 1998.002 14970.060   997.009 12948.207
2 4995.005  7485.030 14456.630  4482.072
3 3996.004  9980.040 55832.502 21912.351
4 3996.004  5322.688  3323.363     0.000
5  999.001  1996.008     0.000  1992.032

edgeR documentation built on May 31, 2017, 11:02 a.m.