rrep: Calculate rate ratios with standard error of count data based...
In tgac-vumc/CSSA: Analysis and Simulation Tools for CRISPR-Cas9 Pooled Screens

View source: R/rrep_degrep.R

rrep	R Documentation

Calculate rate ratios with standard error of count data based on replicates

Description

rrep calculates rate ratios and corresponding standard errors for features measured in replicate. The standard error of the rate ratio can be derived from the variance between replicates, from the variance associated with read depth (i.e. counts), or from both. Rate ratios and standard errors are expressed as log2 values.

Usage

rrep(
  t1,
  t0,
  paired = TRUE,
  normfun = "sum",
  normsubset,
  rstat = "summed",
  variance = "combined",
  countvar = "poisson"
)

Arguments

`t1`	Matrix or data frame, with rows representing features and columns representing replicates of measurements at t1 (or treated).
`t0`	Matrix or data frame, with rows representing features and columns representing replicates of measurements at t0 (or untreated).
`paired`	Logical. Are measurements paired? Default = TRUE
`normfun`	Character string. Specify the function used to standardize the data. Default = "sum"
`normsubset`	Integer vector. Specify the indices of the features that are to be used in standardization.
`rstat`	Character string. Specify whether rate ratios are calculated over the sum of the counts in replicates ("summed") or as the "median" or "mean" of the log2 rate ratios. Default = "summed"
`variance`	Character string. Specify how to calculate variance: using only `count` variance, only `replicate` variance, or both `combined`. Default = "combined"
`countvar`	Character string. Specify how to calculate variance based on count depth. Either "poisson", "qp" (quasipoisson), or "nb" (negative binomial). Default = "poisson"
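
The difference between the `rstat` options can be illustrated in base R. The following is a minimal sketch, not the package's implementation: it assumes paired replicates and a "sum" standardization that divides each replicate by its column total. The matrices and variable names are made up for illustration.

```r
# Hypothetical example data: 3 features (rows) x 3 paired replicates (columns)
t0 <- rbind(c(60, 55, 70), c(5, 8, 4), c(1000, 1100, 950))
t1 <- rbind(c(20, 25, 18), c(20, 15, 22), c(1000, 1050, 990))

# Assumed "sum" standardization: scale each replicate by its total count
n0 <- sweep(t0, 2, colSums(t0), "/")
n1 <- sweep(t1, 2, colSums(t1), "/")

# "summed": pool counts over replicates first, then take one log2 rate ratio
rr_summed <- log2((rowSums(t1) / sum(t1)) / (rowSums(t0) / sum(t0)))

# "mean" / "median": take a log2 rate ratio per replicate pair, then summarize
lr <- log2(n1 / n0)
rr_mean <- rowMeans(lr)
rr_median <- apply(lr, 1, median)

cbind(rr_summed, rr_mean, rr_median)
```

Pooling gives replicates with more reads more weight, whereas the mean and median treat every replicate pair equally.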

Details

This function combines the confidence based on replicate measurements with the confidence based on counts (e.g. read depth). Using replicates to assess confidence in point estimates of individual features is commonplace in many analyses. However, in data sets with many features, some features will have replicate measurements that lie very close together purely by chance, which understates their variance. Adding the variance based on the count data greatly reduces such spurious findings, especially when counts are low. If counts are expected to follow Poisson distributions, the variance of count data on a log2-transformed scale is approximated with the formula 1/(log(2)^2*count). For the quasipoisson and negative binomial models, available through the countvar option, the formulas are theta/(log(2)^2*count) and (1+theta*count)/(log(2)^2*count) respectively, with theta being an overdispersion factor fitted on the non-transformed data of the experimental arms (i.e. t0 and t1).
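
The three count-variance formulas can be written out as a small base R function. This is a sketch for illustration only: `count_var_log2` is a hypothetical name, not a CSSA function, and `theta` is passed in by hand here rather than fitted on the data as rrep does.

```r
# Hypothetical helper (not part of CSSA): approximate variance of a count
# on the log2 scale under the three models described above.
count_var_log2 <- function(count, countvar = c("poisson", "qp", "nb"), theta = 1) {
  countvar <- match.arg(countvar)
  denom <- log(2)^2 * count
  switch(countvar,
    poisson = 1 / denom,                    # 1/(log(2)^2*count)
    qp      = theta / denom,                # theta/(log(2)^2*count)
    nb      = (1 + theta * count) / denom   # (1+theta*count)/(log(2)^2*count)
  )
}

counts <- c(5, 60, 1e7)
v_pois <- count_var_log2(counts)
v_qp   <- count_var_log2(counts, "qp", theta = 2)     # theta chosen arbitrarily
v_nb   <- count_var_log2(counts, "nb", theta = 0.01)  # theta chosen arbitrarily

# Standard errors on the log2 scale shrink as counts grow
sqrt(v_pois)
```

Under the Poisson and quasipoisson formulas the variance tends to zero as counts grow, while under the negative binomial formula it levels off at theta/log(2)^2.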

Value

Returns a data frame with the log2-transformed rate ratio and corresponding standard error of each feature.

Note

Counts are checked for zeros per feature (row). If a row contains any zeros, a pseudocount of 1/replicates is added to all counts in that row. These pseudocounts are not included in the normalization on the basis of total counts (of all features or of the normalization subset) in an experimental arm.
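
The pseudocount rule can be sketched in base R as follows. `add_pseudocount` is a hypothetical helper, not a CSSA function, and the sketch assumes that "replicates" refers to the number of columns in the count matrix.

```r
# Hypothetical helper (not part of CSSA): add 1/replicates to every count
# in rows that contain at least one zero; other rows are left unchanged.
add_pseudocount <- function(counts) {
  counts <- as.matrix(counts)
  has_zero <- apply(counts, 1, function(row) any(row == 0))
  counts[has_zero, ] <- counts[has_zero, , drop = FALSE] + 1 / ncol(counts)
  counts
}

m <- rbind(c(0, 3, 6), c(10, 12, 9))

# Totals for normalization are taken from the original counts,
# so the pseudocounts do not enter the normalization
totals <- colSums(m)

adj <- add_pseudocount(m)
adj
```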

Author(s)

Jos B. Poell

Examples

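  library(CSSA)
  set.seed(1000)
  # Three features (rows) with three replicates (columns) per arm
  c0 <- rbind(rpois(3, 60), rpois(3, 5), rpois(3, 10000000))
  c1 <- rbind(rpois(3, 20), rpois(3, 20), rpois(3, 10000000))
  rrep(c1, c0, paired = FALSE)
  # Identical replicates: the replicates show no spread, leaving the count-based variance
  c01 <- rbind(rep(60, 3), rep(5, 3), rep(10000000, 3))
  c11 <- rbind(rep(20, 3), rep(20, 3), rep(10000000, 3))
  rrep(c11, c01)
  # Fourfold higher counts: the count-based variance is expected to shrink
  c04 <- 4 * c01
  c14 <- 4 * c11
  rrep(c14, c04)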

tgac-vumc/CSSA documentation built on Oct. 10, 2022, 7:27 p.m.