getTPM: Compute a TPM matrix based on a RangedSummarizedExperiment...




For some analyses you might want to transform the counts into TPMs, which you can do with this function. It uses the gene-level RPKMs to derive the TPM values (see Details).


getTPM(rse, length_var = "bp_length", mapped_var = NULL)



rse: A RangedSummarizedExperiment-class object as downloaded with download_study.


length_var: A length 1 character vector with the column name from rowData(rse) that has the coding length. For gene level objects from recount this is bp_length. If NULL, then it will use width(rowRanges(rse)), which should be used for exon RSEs.


mapped_var: A length 1 character vector with the column name from colData(rse) that has the number of reads mapped. For recount RSE objects this would be mapped_read_count. If NULL (default), then it will use the column sums of the counts matrix. The results are different because not all mapped reads are mapped to exonic segments of the genome.


For gene RSE objects, you will want to specify the length_var because otherwise you will be adjusting for the total gene length instead of the total exonic sequence length of the gene.
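To make the effect of the length choice concrete, here is a minimal base R sketch for a single sample. The gene names, counts, lengths, and the helper toy_tpm are all hypothetical and are not part of recount; the helper only mimics the length normalization followed by rescaling to one million.

```r
# Hypothetical toy values: two genes with the same exonic length but
# very different genomic spans (geneB is assumed to have long introns).
counts <- c(geneA = 500, geneB = 500)
exonic_length <- c(geneA = 2000, geneB = 2000)
gene_span <- c(geneA = 2500, geneB = 50000)

# Illustrative helper (not a recount function): normalize counts by
# length, then rescale so the values sum to one million.
toy_tpm <- function(counts, len) {
    rate <- counts / len
    rate / sum(rate) * 1e6
}

# Adjusting for the exonic sequence length
toy_tpm(counts, exonic_length)

# Adjusting for the total gene length instead
toy_tpm(counts, gene_span)
```

With the exonic lengths both toy genes receive the same TPM, while using the total gene length shifts the estimate away from the gene with long introns.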

Sonali Arora et al computed TPMs using the formula: TPM = FPKM / (sum of FPKM over all genes/transcripts) * 10^6
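The formula above can be sketched in base R on a toy matrix. This is only an illustration under stated assumptions, not the package's implementation: the counts, lengths, and mapped read totals are made up, and the RPKM step uses the usual definition of reads per kilobase per million mapped reads.

```r
# Made-up inputs for illustration: 3 genes x 2 samples.
counts <- matrix(
    c(100, 200, 300,
      50, 400, 150),
    nrow = 3,
    dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)
bp_length <- c(geneA = 1000, geneB = 2000, geneC = 500)
mapped <- c(sample1 = 1e6, sample2 = 2e6)

# RPKM: divide each row by its length in kilobases, then each column
# by its number of mapped reads in millions.
rpkm <- sweep(counts / (bp_length / 1e3), 2, mapped / 1e6, "/")

# TPM = RPKM / (sum of RPKM over all genes) * 10^6, per sample.
tpm <- sweep(rpkm, 2, colSums(rpkm), "/") * 1e6

# Each column of the TPM matrix adds up to one million.
colSums(tpm)
```

Because every column is divided by its own RPKM total, TPM values are proportions within a sample scaled to one million.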

Arora et al mention in their code where the formula comes from, specifically section 1.1.1, "Comparison to RPKM estimation", which states an important assumption: "Under the assumption of uniformly distributed reads, we note that RPKM measures are estimates of ..."

There's also a blog post by Harold Pimentel explaining the relationship between FPKM and TPM.


A matrix with the TPM values.


Sonali Arora, Leonardo Collado-Torres





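## Load recount, which provides getTPM() and the example
## rse_gene_SRP009615 object.
library("recount")

## Compute the TPM matrix from the raw gene-level base-pair counts.
tpm <- getTPM(rse_gene_SRP009615)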

recount documentation built on Dec. 20, 2020, 2:01 a.m.