cpm: Counts per Million or Reads per Kilobase per Million

Description

Compute counts per million (CPM) or reads per kilobase per million (RPKM).

Usage

## S3 method for class 'DGEList'
cpm(y, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'SummarizedExperiment'
cpm(y, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEGLM'
cpm(y, log = FALSE, shrunk = TRUE, ...)
## Default S3 method:
cpm(y, lib.size = NULL, offset=NULL,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
rpkm(y, gene.length = NULL, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'SummarizedExperiment'
rpkm(y, gene.length = NULL, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEGLM'
rpkm(y, gene.length, log = FALSE, shrunk = TRUE, ...)
## Default S3 method:
rpkm(y, gene.length, lib.size = NULL, offset=NULL,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
cpmByGroup(y, group = NULL, dispersion = NULL, ...)
## S3 method for class 'SummarizedExperiment'
cpmByGroup(y, group = NULL, dispersion = NULL, ...)
## Default S3 method:
cpmByGroup(y, group = NULL, dispersion = 0.05,
       offset = NULL, weights = NULL, log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
rpkmByGroup(y, group = NULL, gene.length = NULL, dispersion = NULL, ...)
## S3 method for class 'SummarizedExperiment'
rpkmByGroup(y, group = NULL, gene.length = NULL, dispersion = NULL, ...)
## Default S3 method:
rpkmByGroup(y, group = NULL, gene.length, dispersion = 0.05,
       offset = NULL, weights = NULL, log = FALSE, prior.count = 2, ...)

Arguments

y

a matrix-like object containing counts. Can be a numeric matrix, a DGEList object, a SummarizedExperiment object with a "counts" assay, or any object that can be coerced to a matrix by as.matrix. For cpm and rpkm, it can also be a DGEGLM or DGELRT object.

normalized.lib.sizes

logical, use normalized library sizes?

lib.size

library size, defaults to colSums(y). Ignored if offset is specified.

offset

numeric matrix of same size as y, or a vector of length ncol(y), representing library sizes on the log scale. Can also be a scalar for cpmByGroup.default and rpkmByGroup.default. If specified, then takes precedence over lib.size.

log

logical, if TRUE then log2 values are returned.

prior.count

average count to be added to each observation to avoid taking log of zero. Used only if log=TRUE.

shrunk

logical, if TRUE then the usual coefficients from the fitted object will be used, if FALSE then the unshrunk coefficients will be used.

gene.length

vector of length nrow(y) giving gene lengths in bases, or the name of the column of y$genes containing the gene lengths.

group

factor giving group membership for columns of y. Defaults to y$samples$group for the DGEList method and to a single-level factor for the default method.

dispersion

numeric vector of negative binomial dispersions.

weights

numeric vector or matrix of non-negative quantitative weights. Can be a vector of length equal to the number of libraries, or a matrix of the same size as y.

...

other arguments are not used.

Details

CPM or RPKM values are useful descriptive measures for the expression level of a gene. By default, the normalized library sizes are used in the computation for DGEList objects but simple column sums for matrices.
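
The arithmetic behind these measures can be sketched in base R. This is an illustration only, not edgeR's implementation: the helper names `cpm_sketch` and `rpkm_sketch`, the toy counts, and the normalization factors are all hypothetical, and the sketch assumes that a normalized (effective) library size is the raw library size multiplied by a per-sample normalization factor.

```r
# Toy count matrix: 3 genes (rows) by 2 samples (columns), filled column-wise.
counts <- matrix(c(10, 0, 90,
                   20, 30, 50), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Hypothetical helper: counts per million for the given library sizes.
cpm_sketch <- function(y, lib.size = colSums(y)) {
  # Divide each column by its library size expressed in millions.
  t(t(y) / (lib.size / 1e6))
}

# Hypothetical helper: reads per kilobase per million.
rpkm_sketch <- function(y, gene.length, lib.size = colSums(y)) {
  # Divide each row of the CPM matrix by the gene length in kilobases.
  cpm_sketch(y, lib.size) / (gene.length / 1000)
}

# Plain column sums, as used for a bare matrix.
cpm_plain <- cpm_sketch(counts)

# Assumed normalization factors: effective library size = lib.size * factor.
norm.factors <- c(1.25, 0.8)
cpm_norm <- cpm_sketch(counts, lib.size = colSums(counts) * norm.factors)

# Gene lengths in bases, one per row of counts.
rpkm_plain <- rpkm_sketch(counts, gene.length = c(1000, 2000, 500))
```

With plain column sums, each column of `cpm_plain` adds up to one million; with normalization factors applied, the columns of `cpm_norm` generally do not.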

If log-values are computed, then a small count, given by prior.count but scaled to be proportional to the library size, is added to y to avoid taking the log of zero.
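
One way to read "scaled to be proportional to the library size" is sketched below in base R. The helper `log_cpm_sketch` is hypothetical, and the step that enlarges each library size by twice the scaled prior count is an assumption that this page does not state; treat the block as an illustration rather than the package's code.

```r
# Hypothetical helper: log2-CPM with a library-size-scaled prior count.
log_cpm_sketch <- function(y, lib.size = colSums(y), prior.count = 2) {
  # Scale the prior count so it is proportional to each library size
  # and averages to prior.count across libraries.
  prior.scaled <- prior.count * lib.size / mean(lib.size)
  # Assumption: each library size grows by twice its scaled prior count.
  adj.lib.size <- lib.size + 2 * prior.scaled
  # Add the scaled prior to every count in the matching column, then take log2.
  log2(t((t(y) + prior.scaled) / adj.lib.size) * 1e6)
}

# Two libraries of very different depth; the first gene has zero counts in both.
counts <- matrix(c(0, 5, 95,
                   0, 50, 950), nrow = 3)
log_cpm <- log_cpm_sketch(counts)

# A single gene with zero counts and the library sizes used in the Examples section.
zero_row <- log_cpm_sketch(matrix(0, 1, 4), lib.size = 1001:1004)
```

Because the prior is proportional to library size, a zero count maps to the same log2-CPM value in every library under this sketch. That agrees with the example output on this page, where the zero counts in Sample1 and Sample2 both appear as 10.95644 in `cpm(d,log=TRUE)`.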

The rpkm methods for DGEList, DGEGLM or DGELRT objects will try to find the gene lengths in a column of y$genes called Length or length. Failing that, they will look for any column whose name contains "length" in any capitalization.
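
The lookup order described above might be sketched as follows. The function `find_length_column` is hypothetical and is not part of edgeR; in particular, which column wins when several names match is an assumption made here for illustration.

```r
# Hypothetical helper: pick the gene-length column of an annotation data frame.
find_length_column <- function(genes) {
  nms <- colnames(genes)
  # First preference: a column named exactly "Length" or "length".
  exact <- nms[nms %in% c("Length", "length")]
  if (length(exact) > 0) return(exact[1])
  # Fallback: any column whose name contains "length", ignoring case.
  partial <- grep("length", nms, ignore.case = TRUE, value = TRUE)
  if (length(partial) > 0) return(partial[1])
  stop("no gene length column found")
}

# Only a partial match is available here, so the fallback is used.
genes_a <- data.frame(Symbol = c("A", "B"), GeneLength = c(1000, 2000))
col_a <- find_length_column(genes_a)

# An exact match takes precedence over a partial one.
genes_b <- data.frame(TxLength = c(900, 1800), Length = c(1000, 2000))
col_b <- find_length_column(genes_b)
```

Passing gene.length explicitly avoids any ambiguity when the annotation holds more than one length-like column.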

The cpm and rpkm methods for DGEGLM and DGELRT fitted model objects return fitted CPM or RPKM values. If shrunk=TRUE, then the CPM or RPKM values will reflect the prior.count input to the original linear model fit. If shrunk=FALSE, then the CPM or RPKM values will be computed with prior.count=0. Note that the latter could result in taking the log of near-zero values if log=TRUE.

cpmByGroup and rpkmByGroup compute group average values on the unlogged scale.

Value

A numeric matrix of CPM or RPKM values, on the log2 scale if log=TRUE. cpm and rpkm produce matrices of the same size as y. If y was a data object, then observed values are returned. If y was a fitted model object, then fitted values are returned.

cpmByGroup and rpkmByGroup produce matrices with a column for each level of group.

Note

aveLogCPM(y), rowMeans(cpm(y,log=TRUE)) and log2(rowMeans(cpm(y))) all give slightly different results.
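
Part of the reason is that averaging and taking logs do not commute. The base-R sketch below uses hand-computed CPM values on hypothetical toy counts, ignores the prior count, and does not attempt to reproduce aveLogCPM; it only contrasts the mean of the logs with the log of the mean.

```r
# Toy counts: 2 genes by 3 samples, all positive so that no prior count is needed.
counts <- matrix(c(10, 40, 250,
                   90, 60, 750), nrow = 2, byrow = TRUE)

# Hand-computed CPM from plain column sums (not edgeR's cpm()).
cpm_values <- t(t(counts) / colSums(counts)) * 1e6

# Average the log2 values across samples ...
mean_of_logs <- rowMeans(log2(cpm_values))

# ... versus the log2 of the across-sample average.
log_of_means <- log2(rowMeans(cpm_values))

# Difference per gene; positive whenever a gene's CPM varies across samples.
gap <- log_of_means - mean_of_logs
```

The mean of the logs is the log of the geometric mean, which never exceeds the arithmetic mean, so `gap` is non-negative for every gene.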

Author(s)

Davis McCarthy, Gordon Smyth, Yunshun Chen, Aaron Lun

See Also

aveLogCPM

Examples

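library(edgeR)

# Simulate a 5 x 4 matrix of negative binomial counts
y <- matrix(rnbinom(20,size=1,mu=10),5,4)
cpm(y)

# Wrap the counts in a DGEList with explicit library sizes
d <- DGEList(counts=y, lib.size=1001:1004)
cpm(d)
cpm(d,log=TRUE)

# Supply gene lengths (in bases) in a column that rpkm() can find
d$genes <- data.frame(Length=c(1000,2000,500,1500,3000))
rpkm(d)

# Group averages for two groups of two samples each
cpmByGroup(d, group=c(1,1,2,2))

rpkmByGroup(d, group=c(1,1,2,2))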

Example output

Loading required package: limma
          [,1]      [,2]      [,3]     [,4]
[1,] 379310.34 478260.87 121951.22 102564.1
[2,]  34482.76 362318.84 341463.41 128205.1
[3,] 413793.10  86956.52 170731.71 102564.1
[4,] 172413.79  72463.77  24390.24 487179.5
[5,]      0.00      0.00 341463.41 179487.2
    Sample1   Sample2   Sample3   Sample4
1 10989.011 32934.132  4985.045  3984.064
2   999.001 24950.100 13958.126  4980.080
3 11988.012  5988.024  6979.063  3984.064
4  4995.005  4990.020   997.009 18924.303
5     0.000     0.000 13958.126  6972.112
   Sample1  Sample2  Sample3  Sample4
1 13.65870 15.08640 12.76328 12.53996
2 11.54212 14.71199 13.95581 12.76225
3 13.76564 12.95698 13.12580 12.53996
4 12.76534 12.76431 11.54116 14.34680
5 10.95644 10.95644 13.95581 13.12468
     Sample1  Sample2    Sample3   Sample4
1 10989.0110 32934.13  4985.0449  3984.064
2   499.5005 12475.05  6979.0628  2490.040
3 23976.0240 11976.05 13958.1256  7968.127
4  3330.0033  3326.68   664.6726 12616.202
5     0.0000     0.00  4652.7085  2324.037
          1         2
1 21964.180  4484.351
2 12978.174  9467.586
3  8986.985  5480.978
4  4992.511  9963.633
5     0.000 10463.977
          1         2
1 21964.180  4484.351
2  6489.087  4733.793
3 17973.970 10961.956
4  3328.341  6642.422
5     0.000  3487.992

edgeR documentation built on Jan. 16, 2021, 2:03 a.m.