compute_scale_factors: Compute count scaling factors

Description Usage Arguments Value See Also Examples

View source: R/transform_counts.R

Description

This function computes the count scaling factors used by transform_counts(). This function is similar to recount::scale_counts(factor_only = TRUE), but it is more general.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
compute_scale_factors(
  x,
  by = c("auc", "mapped_reads"),
  targetSize = 4e+07,
  L = 100,
  auc = "recount_qc.bc_auc.all_reads_all_bases",
  avg_mapped_read_length = "recount_qc.star.average_mapped_length",
  mapped_reads = "recount_qc.star.all_mapped_reads",
  paired_end = is_paired_end(x, avg_mapped_read_length)
)

Arguments

x

Either a RangedSummarizedExperiment-class created by create_rse() or the sample metadata created by read_metadata().

by

Either auc or mapped_reads. If set to auc it will compute the scaling factor by the total coverage of the sample. That is, the area under the curve (AUC) of the coverage. If set to mapped_reads it will scale the counts by the number of mapped reads (in the QC annotation), whether the library was paired-end or not, and the desired read length (L).

targetSize

A numeric(1) specifying the target library size in number of single end reads.

L

A integer(1) specifying the target read length. It is only used when by = 'mapped_reads' since it cancels out in the calculation when using by = 'auc'.

auc

A character(1) specifying the metadata column name that contains the area under the coverage (AUC). Note that there are several possible AUC columns provided in the sample metadata generated by create_rse().

avg_mapped_read_length

A character(1) specifying the metdata column name that contains the average fragment length after aligning. This is typically twice the average read length for paired-end reads.

mapped_reads

A character(1) specifying the metadata column name that contains the number of mapped reads.

paired_end

A logical() vector specifying whether each sample is paired-end or not.

Value

A numeric() with the sample scale factors that are used by transform_counts().

See Also

Other count transformation functions: compute_read_counts(), is_paired_end(), transform_counts()

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
## Download the metadata for SRP009615, a single-end study
SRP009615_meta <- read_metadata(
    metadata_files = file_retrieve(
        locate_url(
            "SRP009615",
            "data_sources/sra",
        )
    )
)

## Compute the scaling factors
compute_scale_factors(SRP009615_meta, by = "auc")
compute_scale_factors(SRP009615_meta, by = "mapped_reads")

## Download the metadata for DRP000499, a paired-end study
DRP000499_meta <- read_metadata(
    metadata_files = file_retrieve(
        locate_url(
            "DRP000499",
            "data_sources/sra",
        )
    )
)

## Compute the scaling factors
compute_scale_factors(DRP000499_meta, by = "auc")
compute_scale_factors(DRP000499_meta, by = "mapped_reads")

## You can compare the factors against those from recount::scale_counts()
## from the recount2 project which used a different RNA-seq aligner
## If needed, install recount, the R/Bioconductor package for recount2:
# BiocManager::install("recount")
recount2_factors <- recount::scale_counts(
    recount::rse_gene_SRP009615,
    by = "auc", factor_only = TRUE
)
recount3_factors <- compute_scale_factors(SRP009615_meta, by = "auc")
recount_factors <- data.frame(
    recount2 = recount2_factors[order(names(recount2_factors))],
    recount3 = recount3_factors[order(names(recount3_factors))]
)
plot(recount2 ~ recount3, data = recount_factors)
abline(a = 0, b = 1, col = "purple", lwd = 2, lty = 2)

recount3 documentation built on Feb. 13, 2021, 2 a.m.