rrep: Calculate rate ratios with standard error of count data based...

View source: R/rrep_degrep.R

rrep    R Documentation

Calculate rate ratios with standard error of count data based on replicates

Description

rrep calculates rate ratios and corresponding standard errors of features from replicate measurements. The standard error of the rate ratio can be based on the variance between replicates, on the variance associated with read depth (i.e. counts), or on both. Rate ratio and standard error are expressed as log2-values.

Usage

rrep(
  t1,
  t0,
  paired = TRUE,
  normfun = "sum",
  normsubset,
  rstat = "summed",
  variance = "combined",
  countvar = "poisson"
)

Arguments

t1

Matrix or data frame, with rows representing features and columns representing replicates of measurements at t1 (or treated).

t0

Matrix or data frame, with rows representing features and columns representing replicates of measurements at t0 (or untreated).

paired

Logical. Are measurements paired? Default = TRUE

normfun

Character string. Specify with which function to standardize the data. Default = "sum"

normsubset

Integer vector. Specify the indices of the features to be used in standardization.

rstat

Character string. Specify whether rate ratios are calculated over the sum of the counts in replicates or as the "median" or "mean" of the log2 rate ratios. Default = "summed"

variance

Character string. Specify how to calculate variance: using only count variance, only replicate variance, or both combined. Default = "combined"

countvar

Character string. Specify how to calculate variance based on count depth. Either "poisson", "qp" (quasipoisson), or "nb" (negative binomial). Default = "poisson"
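The difference between the rstat options can be illustrated with a minimal sketch. It assumes a single feature with hypothetical paired counts and skips the normalization step entirely, so it shows the idea behind the options rather than what rrep computes internally.

```r
# Hypothetical toy counts: one feature, three paired replicates.
# Normalization (normfun, normsubset) is deliberately left out.
t1 <- c(20, 25, 15)
t0 <- c(60, 50, 70)

# "summed": pool the counts over replicates, then take one log2 ratio.
rr_summed <- log2(sum(t1) / sum(t0))

# "mean" / "median": take a log2 ratio per replicate pair, then summarize.
rr_each   <- log2(t1 / t0)
rr_mean   <- mean(rr_each)
rr_median <- median(rr_each)
```

Pooling weights each replicate by its counts, whereas the mean and median treat every replicate pair equally.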

Details

This function combines the confidence based on replicate measurements with the confidence based on counts (e.g. read depth). Using replicates to assess confidence in the point estimates of individual features is commonplace. However, in data sets with many features, some features will by chance alone have replicate measurements that lie very close together, which makes their replicate-based variance misleadingly small. Adding the variance based on the count data greatly reduces such spurious findings, especially when counts are low. If counts are expected to follow a Poisson distribution, the variance of count data on a log2-transformed scale is approximated by 1/(log(2)^2*count). For the quasipoisson and negative binomial options, available through countvar, the formulas are theta/(log(2)^2*count) and (1+theta*count)/(log(2)^2*count) respectively, where theta is an overdispersion factor fitted on the non-transformed data of the experimental arms (i.e. t0 and t1).
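The three formulas follow from the first-order (delta method) approximation Var(log2(X)) ~ Var(X)/(log(2)*mean)^2, with Var(X) equal to mean for Poisson, theta*mean for quasipoisson, and mean + theta*mean^2 for negative binomial. The sketch below evaluates them for a few counts. The helper log2_count_var and the theta values are hypothetical and only illustrate the formulas; in rrep, theta is fitted from the data.

```r
# Hypothetical helper, not part of CSSA: approximate variance of a
# log2-transformed count under the three count models named above.
# Note that log() in R is the natural logarithm.
log2_count_var <- function(count, countvar = "poisson", theta = NULL) {
  if (countvar != "poisson" && is.null(theta)) {
    stop("theta is required for 'qp' and 'nb'")
  }
  switch(countvar,
    poisson = 1 / (log(2)^2 * count),
    qp      = theta / (log(2)^2 * count),
    nb      = (1 + theta * count) / (log(2)^2 * count),
    stop("unknown countvar: ", countvar)
  )
}

counts <- c(5, 60, 1e7)
log2_count_var(counts)                      # Poisson
log2_count_var(counts, "qp", theta = 2)     # quasipoisson, theta assumed
log2_count_var(counts, "nb", theta = 0.01)  # negative binomial, theta assumed
```

Under the Poisson and quasipoisson formulas the variance shrinks towards zero as counts grow, while the negative binomial formula levels off at theta/log(2)^2.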

Value

Returns a data frame with the log2-transformed rate ratio and corresponding standard error of each feature.

Note

Counts are checked for zeros per feature (row). If a row contains any zeros, a pseudocount of 1/replicates is added to all counts in that row. These pseudocounts are not included in the normalization on the basis of total counts (of all features or of the normalization subset) in an experimental arm.
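The pseudocount rule can be sketched as follows. The helper add_pseudocount is hypothetical, and it assumes that "replicates" means the number of columns of the count matrix; the actual implementation in rrep may differ.

```r
# Hypothetical helper, not part of CSSA: add 1/replicates to every count
# in each row that contains at least one zero.
# Assumption: "replicates" is the number of columns of the matrix.
add_pseudocount <- function(counts) {
  counts <- as.matrix(counts)
  has_zero <- rowSums(counts == 0) > 0
  counts[has_zero, ] <- counts[has_zero, ] + 1 / ncol(counts)
  counts
}

m <- rbind(c(0, 4, 6), c(10, 12, 9))
add_pseudocount(m)  # only the first row is shifted, by 1/3
```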

Author(s)

Jos B. Poell

See Also

degrep, CRISPRsim, ess, noness

Examples

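  set.seed(1000)
  # Three features (rows) with three replicates (columns) each,
  # simulated at low, very low, and very high read depth
  c0 <- rbind(rpois(3, 60), rpois(3, 5), rpois(3, 10000000))
  c1 <- rbind(rpois(3, 20), rpois(3, 20), rpois(3, 10000000))
  rrep(c1, c0, paired = FALSE)
  # Identical counts in every replicate, so replicates do not differ
  c01 <- rbind(rep(60, 3), rep(5, 3), rep(10000000, 3))
  c11 <- rbind(rep(20, 3), rep(20, 3), rep(10000000, 3))
  rrep(c11, c01)
  # Four times the counts: same ratios at greater depth
  c04 <- 4 * c01
  c14 <- 4 * c11
  rrep(c14, c04)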


tgac-vumc/CSSA documentation built on Oct. 10, 2022, 7:27 p.m.