transform_counts: Transform the raw counts provided by the recount3 project

View source: R/transform_counts.R

transform_countsR Documentation

Transform the raw counts provided by the recount3 project

Description

In preparation for a differential expression analysis, you will have to choose how to scale the raw counts provided by the recount3 project. These raw counts are similar to those provided by the recount2 project, except that they were generated with a different aligner and a modified counting approach. The raw coverage counts for recount2 are described with illustrative figures at https://doi.org/10.12688/f1000research.12223.1. Note that the raw counts are the sum of the base level coverage so you have to take into account the total base-pair coverage for the given sample (default option) by using the area under the coverage (AUC), or alternatively use the mapped read lengths. You might want to do some further scaling to take into account the gene or exon lengths. If you prefer to calculate read counts without scaling check the function compute_read_counts().

Usage

transform_counts(
  rse,
  by = c("auc", "mapped_reads"),
  targetSize = 4e+07,
  L = 100,
  round = TRUE,
  ...
)

Arguments

rse

A RangedSummarizedExperiment-class created by create_rse().

by

Either auc or mapped_reads. If set to auc it will compute the scaling factor by the total coverage of the sample. That is, the area under the curve (AUC) of the coverage. If set to mapped_reads it will scale the counts by the number of mapped reads (in the QC annotation), whether the library was paired-end or not, and the desired read length (L).

targetSize

A numeric(1) specifying the target library size in number of single end reads.

L

A integer(1) specifying the target read length. It is only used when by = 'mapped_reads' since it cancels out in the calculation when using by = 'auc'.

round

A logical(1) specifying whether to round the transformed counts or not.

...

Further arguments passed to compute_scale_factors().

Details

This function is similar to recount::scale_counts() but more general and with a different name to avoid NAMESPACE conflicts.

Value

A matrix() with the transformed (scaled) counts.

See Also

Other count transformation functions: compute_read_counts(), compute_scale_factors(), is_paired_end()

Examples


## Create a RSE object at the gene level
rse_gene_SRP009615 <- create_rse_manual("SRP009615")

## Scale the counts using the AUC
assays(rse_gene_SRP009615)$counts <- transform_counts(rse_gene_SRP009615)

## See that now we have two assayNames()
rse_gene_SRP009615
assayNames(rse_gene_SRP009615)

## You can compare the scaled counts against those from
## recount::scale_counts() from the recount2 project
## which used a different RNA-seq aligner
## If needed, install recount, the R/Bioconductor package for recount2:
# BiocManager::install("recount")
recount2_sizes <- colSums(assay(recount::scale_counts(
    recount::rse_gene_SRP009615,
    by = "auc"
), "counts")) / 1e6
recount3_sizes <- colSums(assay(rse_gene_SRP009615, "counts")) / 1e6
recount_sizes <- data.frame(
    recount2 = recount2_sizes[order(names(recount2_sizes))],
    recount3 = recount3_sizes[order(names(recount3_sizes))]
)
plot(recount2 ~ recount3, data = recount_sizes)
abline(a = 0, b = 1, col = "purple", lwd = 2, lty = 2)

## Compute RPKMs
assays(rse_gene_SRP009615)$RPKM <- recount::getRPKM(rse_gene_SRP009615)
colSums(assay(rse_gene_SRP009615, "RPKM"))

## Compute TPMs
assays(rse_gene_SRP009615)$TPM <- recount::getTPM(rse_gene_SRP009615)
colSums(assay(rse_gene_SRP009615, "TPM")) / 1e6 ## Should all be equal to 1

LieberInstitute/recount3 documentation built on May 4, 2024, 4:16 a.m.