In innager/edgee: Edgeworth Expansions and Higher-Order Inference

empEdge

R Documentation

Empirical Edgeworth expansions for high-dimensional data analysis

Description

Higher-order inference for one- and two-sample t-tests in high-dimensional data. Includes the ordinary and moderated t-statistics and the Welch t-test.

Usage

empEdge(
  dat,
  a = NULL,
  side = "two-sided",
  type = NULL,
  unbiased.mom = TRUE,
  alpha = 0.05,
  ncheck = 30,
  lim = c(1, 7)
)

Arguments

`dat`	data matrix with rows corresponding to features: the number of columns is the sample size and the number of rows is the number of tests. If there is only one test, `dat` can be a vector.
`a`	treatment vector; its length must equal the number of columns in `dat`. The treatment code is assumed to have a higher numeric value than the control code.
`side`	the test can be two-sided (`"two-sided"`, the default) or one-sided, in which case the value is `"left"` or `"right"`.
`type`	type of the test, with possible values `"one-sample"`, `"two-sample"`, and `"Welch"`. For ordinary one- and two-sample tests the value is inferred from `a`, but for the Welch t-test it must be specified.
`unbiased.mom`	a `logical` value indicating whether unbiased estimators of the third through sixth central moments should be used.
`alpha`	significance level.
`ncheck`	number of intervals used in the tail diagnostic.
`lim`	tail region for the tail diagnostic, given as the endpoints of the right tail (positive values).
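For orientation, the first-order statistics behind `type = "two-sample"` and `type = "Welch"` can be sketched in base R. This is a minimal illustration, not edgee's code: `rowwise_t` is a hypothetical helper, and it follows the convention for `a` stated above (the higher code is treatment, so each statistic is treatment mean minus control mean over its standard error). The usage lines check it against `stats::t.test`.

```r
# Hypothetical helper (not part of edgee): row-wise ordinary (pooled-variance)
# and Welch two-sample t-statistics. The higher code in `a` is treatment.
rowwise_t <- function(dat, a, welch = FALSE) {
  if (is.null(dim(dat))) dat <- matrix(dat, nrow = 1)  # single test as a vector
  if (length(a) != ncol(dat)) stop("length(a) must equal ncol(dat)")
  codes <- sort(unique(a))
  if (length(codes) != 2) stop("`a` must contain exactly two distinct codes")
  trt <- dat[, a == codes[2], drop = FALSE]  # higher code: treatment
  ctl <- dat[, a == codes[1], drop = FALSE]  # lower code: control
  n1 <- ncol(trt)
  n0 <- ncol(ctl)
  m1 <- rowMeans(trt)
  m0 <- rowMeans(ctl)
  v1 <- rowSums((trt - m1)^2) / (n1 - 1)     # per-feature sample variances
  v0 <- rowSums((ctl - m0)^2) / (n0 - 1)
  if (welch) {
    se <- sqrt(v1 / n1 + v0 / n0)            # variances not assumed equal
  } else {
    sp2 <- ((n1 - 1) * v1 + (n0 - 1) * v0) / (n1 + n0 - 2)  # pooled variance
    se <- sqrt(sp2 * (1 / n1 + 1 / n0))
  }
  (m1 - m0) / se
}

# Usage on simulated data: 3 controls (code 0) followed by 5 treated (code 1)
set.seed(1)
x <- matrix(rnorm(5 * 8), nrow = 5)
trt_code <- rep(0:1, c(3, 5))
t_pooled <- rowwise_t(x, trt_code)
t_welch  <- rowwise_t(x, trt_code, welch = TRUE)
```

Because the treatment group is passed first to `t.test`, both statistics should match its `statistic` component row by row.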

Details

Unadjusted p-values are calculated for five orders of approximation for the ordinary and moderated (empirical Bayes) t-statistics; prior information and moderated t-statistics are computed with the limma package. If the prior degrees of freedom are infinite (Inf), higher orders are provided for the ordinary t-statistic only. In a two-sample test where the variances (and distributions) are not assumed to be equal and the Welch t-test is performed, only results for the ordinary t-statistic are provided. A variance adjustment is used for all orders (see the paper), so even first-order results may differ slightly from the regular Student's t-distribution approximation. When the first-order p-value (for the moderated t-statistic, if relevant) is greater than the significance level alpha, no higher-order inference is calculated.
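The moderated t-statistic and the alpha screen can be illustrated with a one-sample sketch. It assumes limma's posterior variance formula, s2_post = (d0 * s0^2 + d * s^2) / (d0 + d) with d0 + d degrees of freedom, and, as an explicit simplification, takes the prior df `d0` and prior variance `s02` as given rather than estimating them across features as limma's `eBayes()` does. `moderated_t` is a hypothetical helper, and the variance adjustment mentioned above is not included.

```r
# Hypothetical helper (not part of edgee): one-sample moderated t using
# limma-style posterior variances, with d0 and s02 supplied by hand.
moderated_t <- function(dat, d0, s02) {
  if (is.null(dim(dat))) dat <- matrix(dat, nrow = 1)
  n <- ncol(dat)
  d <- n - 1                                   # residual df per feature
  m <- rowMeans(dat)
  s2 <- rowSums((dat - m)^2) / d               # ordinary sample variances
  s2_post <- if (is.finite(d0)) {
    (d0 * s02 + d * s2) / (d0 + d)             # shrink toward the prior variance
  } else {
    rep(s02, nrow(dat))                        # d0 = Inf: prior variance only
  }
  list(t = m / sqrt(s2_post / n), df = d0 + d) # df = Inf gives a normal reference
}

# First-order two-sided p-values, then the screen: higher orders are only
# considered for features whose first-order p-value does not exceed alpha.
set.seed(2)
y <- matrix(rnorm(20 * 6), nrow = 20)
mod <- moderated_t(y, d0 = 4, s02 = 1)
p_first <- 2 * pt(-abs(mod$t), df = mod$df)
needs_higher_order <- p_first <= 0.05
```

With `d0 = 0` the sketch reduces to the ordinary t-statistic, and with `d0 = Inf` every feature shares the prior variance.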

A tail diagnostic investigating the tail behavior of the Edgeworth expansion (EE) is performed for each relevant feature (row of data); if the EE of a particular order is not found to be helpful, the p-value of the previous order is reported in its place.
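The replacement rule can be sketched independently of the diagnostic itself. In this minimal sketch, `carry_forward_p` is a hypothetical helper and the `helpful` flags are assumed inputs (how edgee decides them is not shown). One interpretation is assumed: when several consecutive orders fail, each inherits the already-substituted value of the order before it.

```r
# Hypothetical helper (not part of edgee): `p` has one column per order,
# `helpful` flags whether each order passed the tail diagnostic. A failing
# order takes the (possibly already substituted) p-value of the previous order.
carry_forward_p <- function(p, helpful) {
  stopifnot(is.matrix(p), is.matrix(helpful), identical(dim(p), dim(helpful)))
  if (ncol(p) > 1) {
    for (k in 2:ncol(p)) {
      fail <- !helpful[, k]
      p[fail, k] <- p[fail, k - 1]
    }
  }
  p
}

p <- matrix(c(0.010, 0.020, 0.015, 0.030, 0.025), nrow = 1)
helpful <- matrix(c(TRUE, TRUE, FALSE, FALSE, TRUE), nrow = 1)
# Orders 3 and 4 both fall back to the order-2 value 0.020
carry_forward_p(p, helpful)
```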

For better performance of the second order, unbiased.mom = TRUE (the default) is recommended. For the variance estimate, the posterior variance is used for the moderated t-statistic and the unbiased/pooled variance for the ordinary t-statistic.
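To show where the moment estimates enter, here is a sketch using only the third central moment: the plug-in estimator m3 versus the unbiased n^2 / ((n - 1)(n - 2)) * m3, and the resulting skewness plugged into the standard one-term Edgeworth expansion of the one-sample Studentized mean, P(T <= q) ~ Phi(q) + gamma (2q^2 + 1) / (6 sqrt(n)) phi(q). This is a textbook normal-based expansion with no variance adjustment, not edgee's implementation; the function names are hypothetical.

```r
# Hypothetical helpers (not part of edgee).
m3_biased <- function(x) mean((x - mean(x))^3)           # plug-in estimator
m3_unbiased <- function(x) {
  n <- length(x)
  n^2 / ((n - 1) * (n - 2)) * m3_biased(x)               # unbiased for mu_3
}

# One-term Edgeworth approximation to P(T <= q) for the Studentized mean.
# The correction can push the value outside [0, 1] far in the tails.
edgeworth1_cdf <- function(q, n, gamma) {
  pnorm(q) + gamma * (2 * q^2 + 1) / (6 * sqrt(n)) * dnorm(q)
}

set.seed(3)
x <- rgamma(10, shape = 3) - 3          # right-skewed sample, true mean 0
n <- length(x)
tstat <- sqrt(n) * mean(x) / sd(x)
gamma_hat <- m3_unbiased(x) / sd(x)^3   # skewness from unbiased moment estimates
p_left <- c(normal = pnorm(tstat),
            edgeworth1 = edgeworth1_cdf(tstat, n, gamma_hat))
```

For right-skewed data (gamma > 0) the correction term is positive, so the left-tail probability of T exceeds the normal value.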

Value

A matrix with the same number of rows as dat, each row providing p-values for five orders of Edgeworth expansions (0- to 4-term expansions) for the corresponding feature (row of data). Where applicable, p-values are provided for both ordinary and moderated t-statistics (10 columns, five orders each). For the Welch t-test the matrix has five columns, and if the prior degrees of freedom are Inf, only first-order p-values are returned for the moderated t-statistic (six columns); note that the variance adjustment r^2 is 1 in that case.
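The column counts above follow from simple arithmetic on five orders per statistic. The hypothetical helper below (not an edgee function) encodes exactly the stated rules and nothing more; `d0` stands for the prior degrees of freedom, and the values 3.7 are arbitrary.

```r
# Hypothetical helper: expected number of columns of the returned matrix.
expected_ncol <- function(type, d0) {
  type <- match.arg(type, c("one-sample", "two-sample", "Welch"))
  n_orders <- 5                                # 0- to 4-term expansions
  if (type == "Welch") return(n_orders)        # ordinary t only
  if (!is.finite(d0)) return(n_orders + 1)     # ordinary t + first-order moderated t
  2 * n_orders                                 # ordinary and moderated t
}

expected_ncol("two-sample", d0 = 3.7)  # 10
expected_ncol("Welch", d0 = 3.7)       # 5
expected_ncol("one-sample", d0 = Inf)  # 6
```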

Examples

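library(edgee)
set.seed(1)        # for reproducibility

# simulate a data set
nx <- 10           # sample size
m  <- 1e4          # number of tests
ns <- 0.05*m       # number of significant features
dat <- matrix(rgamma(m*nx, shape = 3) - 3, nrow = m)   # right-skewed, mean-zero noise
shifts <- runif(ns, 1, 5)
dat[1:ns, ] <- dat[1:ns, ] - shifts                    # shift the first ns features down
# run two-sided one-sample tests for all features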
res <- empEdge(dat)
head(res, 3)

# one test (data not high-dimensional)
empEdge(dat[1, ], side = "left", unbiased.mom = FALSE, alpha = 0.1)

# Welch test: first ny columns are controls (code 0), last nx are treated (code 1)
ny <- 12
dat2 <- cbind(matrix(rnorm(m*ny), nrow = m), dat)
treat <- rep(0:1, c(ny, nx))
res <- empEdge(dat2, treat, type = "Welch", ncheck = 50, lim = c(1, 10))
head(res, 3)

# prior degrees of freedom not finite
if (requireNamespace("limma", quietly = TRUE)) {
  d0 <- 0
  # regenerate normal data until limma's estimated prior df is infinite
  while (is.finite(d0)) {
    dat <- matrix(rnorm(m*nx), nrow = m)
    dat[1:ns, ] <- dat[1:ns, ] + shifts
    fit <- limma::lmFit(dat, rep(1, nx))
    d0 <- limma::eBayes(fit)$df.prior
  }
  res <- empEdge(dat, side = "right")
  head(res, 3)
}

innager/edgee documentation built on April 24, 2024, 8:14 p.m.