normalize_counts: Normalize TSS counts


View source: R/normalize.R

Description

edgeR, DESeq2, or CPM normalization of TSS counts

Usage

normalize_counts(
  experiment,
  data_type = c("tss", "tss_features"),
  method = "DESeq2",
  threshold = 1,
  n_samples = 1
)

Arguments

experiment

TSRexploreR object.

data_type

Whether TSS ('tss') or gene/transcript ('tss_features', in development) counts should be normalized.

method

Either 'edgeR', 'DESeq2', or 'CPM'.

threshold

Minimum raw count. In a given sample, TSSs or features with a raw count below this value do not count as meeting the threshold.

n_samples

Filter out TSSs or features that do not meet the selected threshold in at least this number of samples.
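Together, 'threshold' and 'n_samples' act as a single filtering rule. The base R sketch below shows one way to express it. It is a minimal illustration and not TSRexploreR's internal code. The count matrix and the helper name `filter_features` are hypothetical.

```r
# Hypothetical raw counts: rows are TSSs, columns are samples.
counts <- matrix(
  c(10,  0,  5,
     0,  0,  1,
    40, 60, 20,
     3,  2,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("TSS_", 1:4), paste0("S", 1:3))
)

# Hypothetical helper: keep rows with >= threshold raw counts
# in at least n_samples samples.
filter_features <- function(counts, threshold = 1, n_samples = 1) {
  passing <- rowSums(counts >= threshold)  # samples meeting threshold, per row
  counts[passing >= n_samples, , drop = FALSE]
}

filtered <- filter_features(counts, threshold = 2, n_samples = 2)
rownames(filtered)
```

With `threshold = 2` and `n_samples = 2`, TSS_2 is dropped because it never reaches 2 counts. TSS_4 is kept because it reaches 2 counts in two of the three samples.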

Details

This function performs one of three normalizations on TSS or gene/transcript counts. The simplest is counts per million (CPM), which accounts only for sequencing depth. CPM is appropriate for comparing replicates. It is considered too simple, however, when substantial differences in RNA composition between samples are expected. For such between-sample comparisons, the trimmed mean of M-values (TMM) or median-of-ratios (MOR) approaches can be used. These are implemented in edgeR and DESeq2, respectively. Both methods are designed to correct for differences in library size and RNA composition between samples.

Before TMM or MOR normalization, it is recommended to remove features with few or no reads, because they may bias the final results. Two arguments control this filtering: 'threshold' and 'n_samples'. A feature proceeds through normalization only if it has at least 'threshold' raw counts in at least 'n_samples' samples.
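The base R sketch below contrasts CPM with median-of-ratios scaling. It is an illustration of the general techniques and not TSRexploreR's implementation. The MOR part follows the size-factor approach that DESeq2 describes, and the count matrix and the helper names `cpm` and `mor_size_factors` are hypothetical. TMM is more involved and is not sketched here; edgeR provides it through `calcNormFactors`.

```r
# Hypothetical raw counts: sample S2 was sequenced exactly twice as deeply as S1.
counts <- matrix(
  c(10, 20,
    20, 40,
    30, 60),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("TSS_", 1:3), c("S1", "S2"))
)

# Counts per million: divide each column by its library size, times 1e6.
cpm <- function(counts) {
  sweep(counts, 2, colSums(counts), "/") * 1e6
}

# Median-of-ratios size factors: compare each sample to a per-feature
# geometric mean, then take the median ratio for each sample.
mor_size_factors <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)  # -Inf for any feature with a zero
  use <- is.finite(log_geo_means)        # keep features nonzero in all samples
  apply(log_counts[use, , drop = FALSE], 2,
        function(x) exp(median(x - log_geo_means[use])))
}

cpm_counts <- cpm(counts)
size_factors <- mor_size_factors(counts)
mor_counts <- sweep(counts, 2, size_factors, "/")  # counts / size factor
```

Because S2 differs from S1 only in sequencing depth, both methods make the two columns identical here. They diverge when a few highly abundant features dominate one sample's library: CPM then shrinks every other feature in that sample, whereas the median ratio is largely unaffected by those few features.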

When clustering TSSs into TSRs using 'tss_clustering', both the raw and normalized counts will be stored in the new TSRs.

Value

TSRexploreR object with normalized counts.

Examples

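library("TSRexploreR")
library("magrittr")  # provides the %>% pipe

# Example TSSs shipped with the package
data(TSSs)

# Sample sheet: 3 diamide-treated and 3 untreated S288C samples
sample_sheet <- data.frame(
  sample_name=c(
    sprintf("S288C_D_%s", seq_len(3)),
    sprintf("S288C_WT_%s", seq_len(3))
  ),
  file_1=rep(NA, 6), file_2=rep(NA, 6),
  condition=c(
    rep("Diamide", 3),
    rep("Untreated", 3)
  )
)

# Create the TSRexploreR object and format the TSS counts
exp <- TSSs %>%
  tsr_explorer(sample_sheet=sample_sheet) %>%
  format_counts(data_type="tss")

# CPM normalization of the TSS counts
exp <- normalize_counts(exp, method="CPM")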

rpolicastro/tsrexplorer documentation built on Oct. 17, 2021, 3:02 p.m.