smpStats: Sample statistics for Edgeworth expansions
In innager/edgee: Edgeworth Expansions and Higher-Order Inference

smpStats

R Documentation

Sample statistics for Edgeworth expansions

Description

Calculate sample statistics needed for Edgeworth expansions.

Usage

smpStats(
  smp,
  a = NULL,
  type = NULL,
  unbiased.mom = TRUE,
  moder = FALSE,
  d0 = NULL,
  s20 = NULL,
  varpost = NULL
)

Arguments

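`smp`	sample: a numeric vector of observations. For two-sample tests, the observations from both groups are combined in one vector and distinguished by `a`.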
`a`	vector of the same length as `smp` specifying categories of observations (should contain two unique values). Treatment code is assumed to have a higher numeric value than control (relevant for `type = "Welch"`).
`type`	type of the test, with possible values `"one-sample"`, `"two-sample"`, and `"Welch"`. For regular one- and two-sample tests the value is inferred from `a`, but for the Welch t-test it must be specified.
`unbiased.mom`	`logical` value indicating if unbiased estimators for third through sixth central moments should be used.
`moder`	`logical` value indicating if Edgeworth expansions for a moderated t-statistic will be used. If `TRUE`, prior information (`d0` and `s20`) and posterior variance (`varpost`) should be provided.
`d0`	prior degrees of freedom (needed if `moder = TRUE`).
`s20`	prior value for variance (needed if `moder = TRUE`).
`varpost`	posterior variance (needed if `moder = TRUE`).
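
As a minimal sketch of the input conventions described above (not taken from the package itself), the following base R code builds a group indicator `a` and checks it against the stated requirements. The helper `check_groups` is hypothetical, and the guarded call at the end assumes the package is installed under the name `edgee`.

```r
# Hypothetical helper: checks `a` against the requirements listed above.
check_groups <- function(smp, a) {
  stopifnot(length(a) == length(smp))   # same length as the sample
  stopifnot(length(unique(a)) == 2)     # exactly two categories
  invisible(TRUE)
}

set.seed(1)
control   <- rnorm(10)
treatment <- rnorm(8, mean = 1, sd = 2)
smp <- c(control, treatment)
# Treatment is coded with the higher value (1 > 0).
a <- rep(0:1, c(length(control), length(treatment)))
check_groups(smp, a)

# The Welch type is not inferred from `a`, so it is passed explicitly.
# Assumption: the package namespace is called "edgee".
if (requireNamespace("edgee", quietly = TRUE)) {
  edgee::smpStats(smp, a, type = "Welch")
}
```

Without `type = "Welch"`, a two-valued `a` would be treated as an ordinary two-sample test, per the description of `type`.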

Value

A named vector of sample statistics to be used in Edgeworth expansions. The calculated statistics and corresponding names are:

for ordinary one-sample t-statistic: scaled cumulants named "lam3", "lam4", "lam5", "lam6";
for moderated one-sample t-statistic: central moment estimates named "mu2", "mu3", "mu4", "mu5", "mu6", A, B, and prior degrees of freedom named "d0";
for ordinary two-sample t-statistic: central moment estimates repeated twice since the same distribution is assumed for two groups, named "mu_x2", "mu_x3", "mu_x4", "mu_x5", "mu_x6" and "mu_y2", "mu_y3", "mu_y4", "mu_y5", "mu_y6", A, B_x, B_y, b_x, and b_y;
for moderated two-sample t-statistic: estimates of the same quantities as for ordinary t (with different estimators); additionally, prior degrees of freedom, named "d0", are included;
for Welch t-test: estimates of the same quantities as for ordinary t-statistic (with different estimators). In this case, central moment estimates for treatment and control groups are different.
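
To illustrate what the "lam3" through "lam6" entries represent, here is a rough base R sketch. It assumes the usual definition of a scaled cumulant (the j-th cumulant divided by the j-th power of the standard deviation) and uses naive, biased central moment estimates. The function `scaled_cumulants` is hypothetical and is not the estimator used by `smpStats`, which can use unbiased moment estimators.

```r
# Naive (biased) scaled cumulant estimates; an illustration only,
# not the estimators used by smpStats().
scaled_cumulants <- function(x) {
  xc <- x - mean(x)
  mu <- sapply(2:6, function(j) mean(xc^j))   # central moments mu2..mu6
  names(mu) <- paste0("mu", 2:6)
  # Standard relations between cumulants and central moments
  k3 <- mu[["mu3"]]
  k4 <- mu[["mu4"]] - 3 * mu[["mu2"]]^2
  k5 <- mu[["mu5"]] - 10 * mu[["mu3"]] * mu[["mu2"]]
  k6 <- mu[["mu6"]] - 15 * mu[["mu4"]] * mu[["mu2"]] -
    10 * mu[["mu3"]]^2 + 30 * mu[["mu2"]]^3
  s <- sqrt(mu[["mu2"]])
  c(lam3 = k3 / s^3, lam4 = k4 / s^4, lam5 = k5 / s^5, lam6 = k6 / s^6)
}

set.seed(1)
x <- rlnorm(10, sdlog = 0.6)
lam <- scaled_cumulants(x)
names(lam)
```

For a sample that is symmetric about its mean, the odd-order entries ("lam3" and "lam5") computed this way are zero.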

Examples

# simulate sample - one-sample test
n <- 10
smp <- rlnorm(n, sdlog = 0.6)  
stats <- smpStats(smp)
stats
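# one-sample t-statistic (named tstat to avoid masking base::t)
tstat <- sqrt(n) * mean(smp) / sd(smp)
tailDiag(stats, n)
# Edgeworth-expansion approximation to the distribution function
Ft <- makeFx(stats, n, base = "t")
Ft(tstat)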

# two-sample test
n2 <- 8
smp2 <- c(smp, rnorm(n2))
a <- rep(0:1, c(n, n2))
smpStats(smp2, a, unbiased.mom = FALSE)

# moderated t-statistic
if (requireNamespace("limma")) {
  # simulate high-dimensional data
  m  <- 1e4          # number of tests
  ns <- 0.05*m       # number of significant features
  dat <- matrix(rgamma(m*n, shape = 3) - 3, nrow = m)
  shifts <- runif(ns, 1, 5)
  dat[1:ns, ] <- dat[1:ns, ] - shifts
  # estimate prior information
  fit <- limma::lmFit(dat, rep(1, n))
  fbay <- limma::eBayes(fit)
  # look at one feature (row of data)
  i <- 625
  smpStats(dat[i, ], moder = TRUE, d0 = fbay$df.prior, s20 = fbay$s2.prior, 
           varpost = fbay$s2.post[i])
}

innager/edgee documentation built on April 24, 2024, 8:14 p.m.