| empEdge | R Documentation |
Higher-order inference for one- and two-sample t-tests in high-dimensional data. Includes ordinary and moderated t-statistics and the Welch t-test.
empEdge(
  dat,
  a = NULL,
  side = "two-sided",
  type = NULL,
  unbiased.mom = TRUE,
  alpha = 0.05,
  ncheck = 30,
  lim = c(1, 7)
)
| Argument | Description |
|---|---|
| dat | data matrix with rows corresponding to features. The number of columns is the sample size and the number of rows is the number of tests. If the number of tests is … |
| a | treatment vector; its length has to equal the number of columns in dat. … |
| side | the test can be one-sided or two-sided. For a one-sided test, the values are "left" or "right". |
| type | type of the test with possible values … |
| unbiased.mom | logical (default TRUE); see Details. |
| alpha | significance level. |
| ncheck | number of intervals for the tail diagnostic. |
| lim | tail region for the tail diagnostic. Provide the endpoints for the right tail (positive values). |
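To make the roles of dat, a, and type concrete, here is a minimal base-R sketch of row-wise two-sample t-statistics with either a pooled variance or the Welch correction. rowTwoSampleT is a hypothetical helper, not part of the package; it assumes a has exactly two distinct values and takes the difference as the second group minus the first, which may not match the sign convention empEdge uses. It computes only the ordinary statistics the expansions start from, with no Edgeworth terms.

```r
# Hypothetical helper (not part of the package): row-wise two-sample
# t-statistics for a feature-by-sample matrix `dat` and a treatment
# vector `a` with two distinct values; length(a) must equal ncol(dat).
rowTwoSampleT <- function(dat, a, welch = FALSE) {
  stopifnot(is.matrix(dat), length(a) == ncol(dat))
  g <- sort(unique(a))
  stopifnot(length(g) == 2)
  x <- dat[, a == g[1], drop = FALSE]   # first group
  y <- dat[, a == g[2], drop = FALSE]   # second group
  nx <- ncol(x)
  ny <- ncol(y)
  vx <- apply(x, 1, var)                # unbiased per-row variances
  vy <- apply(y, 1, var)
  diffm <- rowMeans(y) - rowMeans(x)    # assumed sign: second minus first
  if (welch) {
    # separate variances and Welch-Satterthwaite degrees of freedom
    se2 <- vx / nx + vy / ny
    df <- se2^2 / ((vx / nx)^2 / (nx - 1) + (vy / ny)^2 / (ny - 1))
  } else {
    # pooled variance, equal-variance Student's t
    sp2 <- ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    se2 <- sp2 * (1 / nx + 1 / ny)
    df <- rep(nx + ny - 2, nrow(dat))
  }
  list(t = diffm / sqrt(se2), df = df)
}
```

With the dat2 and treat objects from the examples, rowTwoSampleT(dat2, treat, welch = TRUE) would give the ordinary Welch statistics for every row.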
Unadjusted p-values are calculated for five orders of approximation for the
ordinary and moderated (empirical Bayes) t-statistics; prior information and
moderated t-statistics are calculated with the limma package. If the prior
degrees of freedom are Inf, higher orders are provided for the ordinary
t-statistic only. In a two-sample test, when the variances (and distributions)
are not assumed to be equal and the Welch t-test is performed, only results for
the ordinary t-statistic are provided. A variance adjustment is applied for all
orders (see the paper), so even first-order results may differ slightly from
the usual Student's t-distribution approximation. When a first-order p-value
(for the moderated t-statistic, if relevant) is greater than the significance
level alpha, no higher-order inference is calculated.
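The first-order screening step can be illustrated in base R. This sketch is a simplification under stated assumptions: rowOneSampleP is a hypothetical helper that uses the plain Student's t distribution with no variance adjustment, so its p-values will differ slightly from the package's first-order results, as noted above. Rows whose first-order p-value is at or below alpha are the ones for which higher orders would be attempted.

```r
# Hypothetical sketch: ordinary one-sample t-statistics per row and plain
# Student's t p-values (no variance adjustment, unlike the package).
rowOneSampleP <- function(dat, side = c("two-sided", "left", "right")) {
  side <- match.arg(side)
  n <- ncol(dat)
  tstat <- rowMeans(dat) / sqrt(apply(dat, 1, var) / n)
  df <- n - 1
  switch(side,
    "two-sided" = 2 * pt(-abs(tstat), df),
    "left"      = pt(tstat, df),
    "right"     = pt(tstat, df, lower.tail = FALSE))
}

# Screening on simulated data similar to the examples
set.seed(1)
x <- matrix(rgamma(100 * 10, shape = 3) - 3, nrow = 100)
p1 <- rowOneSampleP(x)
alpha <- 0.05
higher <- which(p1 <= alpha)  # rows where higher orders would be attempted
```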
A tail diagnostic of Edgeworth expansion (EE) tail behavior is performed for each relevant feature (row of data); if the EE of a particular order is not found to be helpful, the p-value of the previous order is reported in its place.
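The fallback rule can be sketched as follows, assuming that "previous order" means the p-value already reported for order k - 1 (so a run of failing orders keeps repeating the last accepted value) and that order 1 is always kept. fallbackP, pv, and ok are hypothetical names; the tail diagnostic itself (tailDiag) is not reproduced here, so ok is taken as given.

```r
# Hypothetical sketch of the fallback rule: `pv` holds p-values for orders
# 1..5 (columns); `ok` flags whether each order passed the tail diagnostic.
# Order 1 is always kept; a failing order takes the value reported for the
# previous order.
fallbackP <- function(pv, ok) {
  stopifnot(identical(dim(pv), dim(ok)))
  out <- pv
  for (k in seq_len(ncol(pv))[-1]) {
    bad <- !ok[, k]
    out[bad, k] <- out[bad, k - 1]
  }
  out
}
```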
For better performance of the second order, unbiased.mom = TRUE (the default)
is recommended. For the variance estimate, the posterior variance is used for
the moderated t-statistic and the unbiased (or pooled) sample variance for the
ordinary t-statistic.
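For reference, the limma-style posterior variance combines the per-feature variance s^2 (with d residual degrees of freedom) and the prior variance s0^2 (with d0 prior degrees of freedom) as (d0 s0^2 + d s^2) / (d0 + d), and the moderated t-statistic then has d + d0 degrees of freedom. The sketch below applies this to a one-sample moderated t-statistic with d0 and s0^2 supplied by hand; in practice limma estimates them across all features (eBayes), and moderatedT is a hypothetical helper. When d0 is Inf the posterior variance reduces to s0^2.

```r
# Hypothetical sketch of a one-sample moderated t-statistic.
# d0 (prior df) and s02 (prior variance) are assumed inputs here;
# limma estimates them from all features.
moderatedT <- function(dat, d0, s02) {
  n <- ncol(dat)
  d <- n - 1                        # residual df for a one-sample fit
  s2 <- apply(dat, 1, var)          # unbiased per-row variance
  s2post <- if (is.finite(d0)) {
    (d0 * s02 + d * s2) / (d0 + d)  # posterior (shrunken) variance
  } else {
    rep(s02, nrow(dat))             # infinite prior df: prior variance only
  }
  tmod <- rowMeans(dat) / sqrt(s2post / n)
  list(t = tmod, df = d + d0, s2.post = s2post)
}
```

With d0 = 0 the posterior variance equals the ordinary unbiased variance, so the moderated statistic reduces to the ordinary t.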
A matrix with the same number of rows as dat, each row providing p-values for
five orders of Edgeworth expansion (0- to 4-term expansions) for the
corresponding feature (row of data). Where applicable, p-values are provided
for both the ordinary and moderated t-statistics (10 columns, five orders
each); for the Welch t-test the matrix has five columns, and if the prior
degrees of freedom are Inf, only first-order p-values are returned for the
moderated t-statistic (six columns); note that the variance adjustment r^2 is
1 in that case.
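The column counts described above can be encoded in a small check, which may help when post-processing results. checkResultShape is hypothetical; it assumes a one-sample or equal-variance two-sample test gives 10 columns when the prior degrees of freedom are finite, and it does not assume any particular column order or names.

```r
# Hypothetical check of the result shape described above.
checkResultShape <- function(res, dat, welch, d0) {
  n <- if (is.null(dim(dat))) 1L else nrow(dat)  # a vector counts as one test
  expected <- if (welch) {
    5L                   # ordinary (Welch) t only, five orders
  } else if (!is.finite(d0)) {
    6L                   # five ordinary orders + first-order moderated
  } else {
    10L                  # five orders each for ordinary and moderated t
  }
  nrow(res) == n && ncol(res) == expected
}
```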
tailDiag for the tail diagnostic; makeFx, Ftshort, and Ftgen for calculating
Edgeworth expansions of orders 1 to 5; and smpStats for extracting the
statistics needed to calculate the EE from a sample.
# simulate a data set
nx <- 10 # sample size
m <- 1e4 # number of tests
ns <- 0.05*m # number of significant features
dat <- matrix(rgamma(m*nx, shape = 3) - 3, nrow = m)
shifts <- runif(ns, 1, 5)
dat[1:ns, ] <- dat[1:ns, ] - shifts
# run
res <- empEdge(dat)
head(res, 3)
# one test (data not high-dimensional)
empEdge(dat[1, ], side = "left", unbiased.mom = FALSE, alpha = 0.1)
# Welch test
ny <- 12
dat2 <- cbind(matrix(rnorm(m*ny), nrow = m), dat)
treat <- rep(0:1, c(ny, nx))
res <- empEdge(dat2, treat, type = "Welch", ncheck = 50, lim = c(1, 10))
head(res, 3)
# prior degrees of freedom not finite (requires limma)
if (require(limma)) {
  d0 <- 0
  # resimulate until limma's estimate of the prior df is infinite
  while (is.finite(d0)) {
    dat <- matrix(rnorm(m*nx), nrow = m)
    dat[1:ns, ] <- dat[1:ns, ] + shifts
    fit <- lmFit(dat, rep(1, nx))
    d0 <- eBayes(fit)$df.prior
  }
  res <- empEdge(dat, side = "right")
  head(res, 3)
}