estimateD0Robust: Estimate Number of Prior Degrees of Freedom in a Robust...
In MAnorm2: Tools for Normalizing and Comparing ChIP-seq Samples

estimateD0Robust

R Documentation

Estimate Number of Prior Degrees of Freedom in a Robust Manner

Description

estimateD0Robust underlies other interface functions for estimating the number of prior degrees of freedom associated with an unadjusted mean-variance curve (or a set of unadjusted mean-variance curves) in a robust manner.

Usage

estimateD0Robust(
  z,
  m,
  p_low = 0.01,
  p_up = 0.1,
  d0_low = 0.001,
  d0_up = 1e+06,
  eps = d0_low,
  nw = gauss.quad(128, kind = "legendre")
)

Arguments

`z`	A list of which each element is a vector of FZ statistics corresponding to a `bioCond` object (see also "Details").
`m`	A vector of numbers of replicates in `bioCond` objects. Must correspond to `z` one by one in the same order.
`p_low, p_up`	Lower- and upper-tail probabilities for Winsorizing the FZ statistics associated with each `bioCond`.
`d0_low, d0_up`	Positive reals specifying the lower and upper bounds of estimated d0 (i.e., number of prior degrees of freedom). `Inf` is not allowed. During the estimation process, if d0 is sure to be less than or equal to `d0_low`, it will be considered as 0, and if it is sure to be larger than or equal to `d0_up`, it will be considered as positive infinity.
`eps`	The required numeric precision for estimating d0.
`nw`	A list containing `nodes` and `weights` variables for calculating the definite integral of a function `f` over the interval `[-1, 1]`, which is approximated by `sum(nw$weights * f(nw$nodes))`. By default, a set of Gauss-Legendre nodes along with the corresponding weights calculated by `gauss.quad` is used.

Details

For each bioCond object with replicate samples, a vector of FZ statistics can be deduced from the unadjusted mean-variance curve associated with it. More specifically, for each genomic interval in a bioCond with replicate samples, its FZ statistic is defined to be log(t_hat / v0), where t_hat is the observed variance of signal intensities of the interval, and v0 is the interval's prior variance read from the corresponding mean-variance curve.

Theoretically, each FZ statistic follows a scaled Fisher's Z distribution plus a constant (since the mean-variance curve is not adjusted yet), and we derive a robust estimation of d0 (i.e., number of prior degrees of freedom) by Winsorizing the FZ statistics of each bioCond and matching the resulting sample variance with the theoretical variance of the Winsorized distribution, which is calculated by using numerical integration (see also "References"). Since the theoretical variance has no compact forms regarding d0, the matching procedure is achieved by using the method of bisection.

Inspired by the ordinary (non-robust) routine for estimating d0, we derive the final estimate of d0 by separately applying the function trigamma(x / 2) to the estimated d0 from each bioCond, taking a weighted average across the results, and applying the inverse of the function (achieved by using Newton iteration; see also trigamma). Here the weights are the numbers of genomic intervals (in the bioConds) minus 1 that are used to calculate FZ statistics.

Value

The estimated number of prior degrees of freedom. Note that the function returns NA if there are not sufficient genomic intervals for estimating it.

References

Phipson, B., et al., Robust Hyperparameter Estimation Protects against Hypervariable Genes and Improves Power to Detect Differential Expression. Annals of Applied Statistics, 2016. 10(2): p. 946-963.

Examples

## Not run: 
## Private functions involved.

# For generating random FZ statistics with outliers. Note that the argument
# scaling controls how extreme outliers are.
rFZ <- function(n, var.ratio, m, d0, p_low, p_up, scaling) {
    z <- list()
    p_low <- p_low * 0.9
    p_up <- p_up * 0.9
    for (i in 1:length(n)) {
        x <- rf(n[i], m[i] - 1, d0)
        q_low <- qf(p_low, m[i] - 1, d0, lower.tail = TRUE)
        q_up <- qf(p_up, m[i] - 1, d0, lower.tail = FALSE)
        f <- x < q_low
        x[f] <- x[f] / runif(sum(f), 1, scaling)
        f <- x > q_up
        x[f] <- x[f] * runif(sum(f), 1, scaling)
        z[[i]] <- log(var.ratio[i]) + log(x)
    }
    z
}

# Settings.
n <- c(30000, 40000)
var.ratio <- c(1.2, 2.5)
m <- c(2, 3)
d0 <- 17
p_low <- 0.01
p_up <- 0.1

# Compare estimation results from ordinary (non-robust) and robust routines.
# Case 1: no outliers.
set.seed(100)
scaling <- 1
z <- rFZ(n, var.ratio, m, d0, p_low, p_up, scaling)
res1 <- estimateD0(z, m)
res1
scaleMeanVarCurve(z[1], m[1], res1)
scaleMeanVarCurve(z[2], m[2], res1)
res2 <- estimateD0Robust(z, m, p_low, p_up)
res2
scaleMeanVarCurveRobust(z[1], m[1], res2, p_low, p_up)
scaleMeanVarCurveRobust(z[2], m[2], res2, p_low, p_up)

# Case 2: moderate outliers.
scaling <- 3
z <- rFZ(n, var.ratio, m, d0, p_low, p_up, scaling)
res1 <- estimateD0(z, m)
res1
scaleMeanVarCurve(z[1], m[1], res1)
scaleMeanVarCurve(z[2], m[2], res1)
res2 <- estimateD0Robust(z, m, p_low, p_up)
res2
scaleMeanVarCurveRobust(z[1], m[1], res2, p_low, p_up)
scaleMeanVarCurveRobust(z[2], m[2], res2, p_low, p_up)

# Case 3: extreme outliers.
scaling <- 10
z <- rFZ(n, var.ratio, m, d0, p_low, p_up, scaling)
res1 <- estimateD0(z, m)
res1
scaleMeanVarCurve(z[1], m[1], res1)
scaleMeanVarCurve(z[2], m[2], res1)
res2 <- estimateD0Robust(z, m, p_low, p_up)
res2
scaleMeanVarCurveRobust(z[1], m[1], res2, p_low, p_up)
scaleMeanVarCurveRobust(z[2], m[2], res2, p_low, p_up)

## End(Not run)

MAnorm2 documentation built on Oct. 29, 2022, 1:12 a.m.