In statlab/permuter: permuter: Permutation Tests

Vignette Info

This example comes from Chapter 6, page 305-307.

We'll consider an application concerning a case/control study in which the dependent variable is the haplotypical expression in a chromosome 17 marker. The goal of the study is to estimating the association between the group variable (Cases/Controls) and the marker (the categorical variable).

library(permuter)
library(knitr)
set.seed(50)
data(chrom17m)
chrom17m

The association test was performed using $\chi^2$ (asymptotic and exact) and $T_{SM}$ tests (Sham and Curtis, 1995). Moreover, the analysis was developed following the multifocus approach. Ten thousand CMC iterations were carried out for the four tests based on a permutation strategy.

This explanation makes no sense. what 4 tests?

What is the null/alternative hypothesis?

The permutation proceeds as follows: the number of cases and controls (n1 and n2) remains fixed. We permute the group labels and then recompute the $k$-by-2 table. The test statistic is something for each of the $k$ categories.

B <- 1000

X1 <- rep(chrom17m$Groups, chrom17m$Cases)
X2 <- rep(chrom17m$Groups, chrom17m$Controls)
X <- as.factor(c(X1, X2))
k <- nrow(chrom17m)
n1 <- sum(chrom17m$Cases)
n2 <- sum(chrom17m$Controls)
n <- n1 + n2
l <- rep(c(1, 2), c(n1, n2))
data <- chrom17m[, -1]
marginal <- rowSums(chrom17m[, -1])

observed <- rep(0, k)
distr <- matrix(0, nrow = B, ncol = k)
for (kk in 1:k) {
    observed[kk] <- (chrom17m[kk, 2] - marginal[kk] * n1/n)^2/(marginal[kk] * n1/n)
    for (bb in 1:B) {
        X.star <- sample(X)
        data.star <- table(X.star, l)
        marginal.star <- rowSums(data.star)
        distr[bb, kk] <- (data.star[kk, 1] - marginal.star[kk] * n1/n)^2/(marginal.star[kk] * 
            n1/n)
    }
}

We compute the p-values for each of the $k$ groups individually. Then, since we want to make inferences for individual alleles, we need to adjust for multiple tests. We use fwe.minp to do this.

partial_pvalues <- sapply(1:k, function(i) t2p(observed[i], distr[, i], alternative = "greater"))
names(partial_pvalues) <- chrom17m$Groups

adjusted_pvalues <- fwe_minp(partial_pvalues, distr)
names(adjusted_pvalues) <- chrom17m$Groups

kable(cbind(partial_pvalues, adjusted_pvalues), row.names = TRUE, col.names = c("Partial P-values", 
    "Adjusted P-values"))

The table above shows the raw p-values and the adjusted p-values for multiplicity using Tippett's combination for each category. The alleles A and C are individually significant at a level of $\alpha = 0.05$. After adjustment for multiplicity (Tippett's step-down procedure; Finos et al., 2003) only allele C remains significant.

Globals

Finally, suppose we want a global p-value for the hypothesis test. We'll use the NPC methodology.

null_pvalues <- apply(distr, 2, pvalue_distr)  # get a p-value corresponding to each null test statistic
tippett_distr <- apply(null_pvalues, 1, tippett)
fisher_distr <- apply(null_pvalues, 1, fisher)
liptak_distr <- apply(null_pvalues, 1, liptak)

global_pvalue_tippett <- t2p(tippett(partial_pvalues), tippett_distr, alternative = "greater")
global_pvalue_fisher <- t2p(fisher(partial_pvalues), fisher_distr, alternative = "greater")
global_pvalue_liptak <- t2p(liptak(partial_pvalues), liptak_distr, alternative = "greater")

globals <- c(global_pvalue_tippett, global_pvalue_fisher, global_pvalue_liptak)
kable(t(globals), row.names = F, col.names = c("Tippett", "Fisher", "Liptak"))