convertCounts: Convert count matrix to CPM, FPKM, FPK, or TPM


View source: R/convertCounts.R

Description

Takes a count matrix as input and converts it to the desired unit. Supported units are CPM, FPKM, FPK, and TPM. Output values can be log2-transformed and/or normalized. Calculations are performed using edgeR functions, except for TPM, which is derived from FPKM using the formula provided by Harold Pimentel.
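As an illustration of how these units relate, the following base-R sketch computes each one by hand on a tiny made-up matrix. It is not the package's implementation: the names toy_counts and toy_length and all values are hypothetical, and the TPM step uses the rescaling form FPKM / sum(FPKM) * 1e6, which is assumed here to be equivalent to the formula referenced above.

```r
# Illustrative base-R sketch of the unit definitions (not the package's code).
# toy_counts and toy_length are made-up values for demonstration only.
toy_counts <- matrix(c(10, 20, 30,
                       40, 50, 60),
                     nrow = 3,
                     dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
toy_length <- c(1000, 2000, 500)  # gene lengths in bases, one per row

lib_size <- colSums(toy_counts)   # total counts per sample

# CPM: counts scaled to a library size of one million
cpm_manual <- t(t(toy_counts) / lib_size) * 1e6

# FPK: counts divided by gene length in kilobases
fpk_manual <- toy_counts / (toy_length / 1000)

# FPKM: CPM divided by gene length in kilobases
fpkm_manual <- cpm_manual / (toy_length / 1000)

# TPM from FPKM: rescale each sample so its values sum to one million
tpm_manual <- t(t(fpkm_manual) / colSums(fpkm_manual)) * 1e6

colSums(tpm_manual)
```

Because TPM is a per-sample rescaling, each column of tpm_manual sums to one million regardless of sequencing depth, which is not true of FPKM.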

Usage

convertCounts(
  countsMatrix,
  unit,
  geneLength,
  log = FALSE,
  normalize = "none",
  prior.count = NULL
)

Arguments

countsMatrix

A numeric matrix or data frame of N genes x M samples. All columns must be numeric.

unit

Required. One of CPM, FPKM, FPK or TPM.

geneLength

A vector or matrix of gene lengths. Required for length-normalized units (TPM, FPKM or FPK). If geneLength is a matrix, the rowMeans are calculated and used.

log

Default = FALSE. Set to TRUE to return log2 values. Employs edgeR functions, which use a prior.count of 0.25 scaled by the library size.

normalize

Default = "none" (no normalization). Any other value invokes edgeR::calcNormFactors() for normalization. Other options are "TMM", "RLE", "upperquartile" (uses the 75th percentile), and "TMMwzp". Values are case-insensitive.

prior.count

Average count added to each observation to avoid taking the log of zero. Used only if log = TRUE. The default (NULL) depends on the unit: 0 for TPM, 0.25 for CPM and FPKM. The prior.count is passed to the edgeR cpm and rpkm functions and applies to logTPM, logCPM, and logFPKM calculations.
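To make the normalize argument more concrete, the sketch below shows one way normalization factors from edgeR::calcNormFactors() can be folded into a CPM calculation by scaling the library sizes. It is an assumption-laden illustration rather than the code convertCounts runs: sim_counts, norm_factors, eff_lib_size and cpm_tmm are hypothetical names, and the block only executes when edgeR is installed.

```r
# Sketch only: runs when the edgeR package is installed.
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)
  # Simulated counts: 100 genes x 6 samples (made-up data)
  sim_counts <- matrix(rpois(600, lambda = 50), ncol = 6)

  # One scaling factor per sample, here using the TMM method
  norm_factors <- edgeR::calcNormFactors(sim_counts, method = "TMM")

  # Effective library size = raw library size x normalization factor
  eff_lib_size <- colSums(sim_counts) * norm_factors

  # CPM computed against the effective library sizes
  cpm_tmm <- t(t(sim_counts) / eff_lib_size) * 1e6
}
```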

Details

geneLength is normally a vector where length(geneLength) == nrow(countsMatrix). If an RSEM effectiveLength matrix is passed instead, rowMeans(effectiveLength) is used, because edgeR functions only accept a vector of gene lengths.
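The collapse from a per-sample length matrix to a single vector can be pictured as follows. The eff_length matrix is a made-up stand-in for an RSEM effective-length table, and gene_length is a hypothetical name for the resulting vector.

```r
# Hypothetical effective-length matrix: 2 genes x 3 samples
eff_length <- matrix(c( 950, 1010,  990,
                       1800, 1795, 1805),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))

# One length per gene: the mean effective length across samples
gene_length <- rowMeans(eff_length)

# The vector must line up with the rows of the count matrix
stopifnot(length(gene_length) == nrow(eff_length))
```

Averaging discards sample-to-sample differences in effective length, so the resulting FPKM or TPM values use the same length for a gene in every sample.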

Note that log2 values for CPM, TPM, and FPKM rely on edgeR's prior.count handling to avoid taking the log of zero.
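The effect of a library-size-scaled prior count can be sketched in base R. This is a rough illustration of the idea, not edgeR's actual code: the scaling of the prior by lib / mean(lib) and the addition of twice the scaled prior to the library size are assumptions about how such an offset is commonly applied, and all names and values are made up.

```r
# Made-up counts: 3 genes x 2 samples, with a zero count in each sample
log_counts <- matrix(c(0,  5,  10,
                       0, 50, 100),
                     nrow = 3)
prior <- 0.25

lib <- colSums(log_counts)

# Scale the prior so deeper libraries receive a proportionally larger offset
prior_scaled <- prior * lib / mean(lib)

# Assumed adjustment: library size grows by twice the scaled prior
lib_adj <- lib + 2 * prior_scaled

# log2 CPM with the offset; zero counts no longer produce -Inf
log2_cpm <- log2(t((t(log_counts) + prior_scaled) / lib_adj) * 1e6)

# Without the offset, the zero counts map to -Inf
naive_log2_cpm <- log2(t(t(log_counts) / lib) * 1e6)
```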

Value

A matrix of values in the requested unit, with the same dimensions as countsMatrix.

Examples

# Simulate some data
counts <- trunc(matrix(runif(6000, min=0, max=2000), ncol=6))
geneLength <- rowMeans(counts)

# TMM normalized Log2FPKM
Log2FPKM <- convertCounts(counts,
                          unit = "fpkm",
                          geneLength = geneLength,
                          log = TRUE,
                          normalize = "tmm")

# Non-normalized CPM (not logged)
RawCPM <- convertCounts(counts,
                        unit = "CPM",
                        log = FALSE,
                        normalize = "none")

DGEobj.utils documentation built on April 28, 2021, 9:06 a.m.