In pneuvial/sanssouci: Post Hoc Multiple Testing Inference

knitr::opts_chunk$set(echo = TRUE)

This vignettes illustrates the following points

simulation of one- and two-sample Gaussian equi-correlated observations
computation of test statistics by randomization
calibration of Joint-Family-Wise Error Rate (JER) thresholds

We start with two-sample tests because we believe they are used more frequently.

library("sanssouci")
#set.seed(0xBEEF) # for reproducibility
library("ggplot2")

Parameters:

m <- 5e2
n <- 30
pi0 <- 0.8
rho <- 0.3
SNR <- 3

We use the function SansSouciSim to generate Gaussian equi-correlated samples. We then 'fit' the resulting SansSouci object

Two-sample tests

Simulation

obj <- SansSouciSim(m, rho, n, pi0, SNR = SNR, prob = 0.5)
obj

We perform JER calibration using the linear (Simes) template $\mathfrak{R(\lambda)}=(R_1(\lambda),$ $\ldots,R_K(\lambda)$, where for $1 \leq k \leq K$

$$R_k(\lambda) = \left{i\in {1, \dots , m}\::\: \bar{\Phi}(Z_i) > \frac{\lambda k}{m} \right}\,$$

where $\bar{\Phi}$ is the cdf of the $\mathcal{N}(0,1)$ distribution. Note that $p_i = \bar{\Phi}(Z_i)$ is the one-sided $p$-value associated to the test statistics $Z_i$.

Calibration

B <- 1000
alpha <- 0.2
cal <- fit(obj, alpha = alpha, B = B, family = "Simes")
cal

The output of the calibration is the following SansSouci object. In particular:

cal$input contains the input simulated data (and the associated truth)
cal$param contains the calibration parameters
cal$output contains the output of the calibration, including
p.values: $m$ test statistics calculated by permutation
thr : A JER-controlling family of $K$ (here $K=m$) elements
lambda: the $\lambda$-calibration parameter

Because we are under positive equi-correlation, we expect $\lambda > \alpha$ for the Simes family.

cal$output$lambda
cal$output$lambda > alpha

Post hoc confidence bounds

The fitted SansSouci object contains post hoc confidence bounds:

env <- predict(cal, all = TRUE)
head(env)

We compare it to the true number of false positives among the most significant items, and to the Simes bound without calibration

cal0 <- fit(obj, alpha = alpha, B = 0, family = "Simes")
oracle <- fit(obj, alpha = alpha, family = "Oracle")

confs <- list(Simes = predict(cal0, all = TRUE),
              "Simes+calibration" = predict(cal, all = TRUE),
              "Oracle" = predict(oracle, all = TRUE))

plotConfCurve(confs, xmax = 200)

One sample tests

The code is identical, except for the line to generate the observations (where we do not specify a probability of belonging to one of the two populations using the prob argument); moreover it is not necessary to specify a vector of categories 'categ' in 'calibrateJER'.

obj <- SansSouciSim(m, rho, n, pi0, SNR = SNR)
obj

cal <- fit(obj, alpha = alpha, B = B, family = "Simes")
cal

Again we expect $\lambda > \alpha$.

Confidence curves

The associated confidence curves are displayed below:

cal0 <- fit(obj, alpha = alpha, B = 0, family = "Simes")
oracle <- fit(obj, alpha = alpha, family = "Oracle")

confs <- list(Simes = predict(cal0, all = TRUE),
              "Simes+calibration" = predict(cal, all = TRUE),
              "Oracle" = predict(oracle, all = TRUE))