In statlab/permuter: permuter: Permutation Tests

Vignette Info

This example is in the textbook section 5.6.2 (p 277-281).

library(permuter)
data(washing_test)
library(dplyr)
library(ggplot2)

Let us now introduce a real case study concerned with the development of a new detergent, where an instrumental performance study modelled by a $C$ independent sample design has been carried out. The R&D division of a chemical company wishes to assess the level of performance of a set of 8 products ($C = 8$) on 25 types of stains, which may be classified into 3 main categories, i.e. general detergency, bleachable and enzymatic. A suitable experiment has been designed. For each of the 8 products, 4 washing machines (therefore 4 replicates/observations in each group) are used to wash a piece of fabric soiled with one of the 25 stains. The experimental response variable is the reflectance, i.e. the percentage of removed stain for the 25 types of stains. Hence it is of interest to compare the 8 products on the basis of their reflectance, using as stratification variable the type of stain. Here we wish to show that it is possible to include a stratification variable while controlling the FWE. Indeed, once the various strata of experimental units have been defined, the observation vector may be resampled independently within strata to simulate the complete null hypothesis of no treatment effect for any stratum. The test statistics are then recomputed by combining over strata, thus obtaining a test statistic for each stratum. Then multiplicity adjustment for multiple tests that have been combined over strata may easily be performed following the standard procedure (see Section 5.4.1). Raw and adjusted p-values are displayed.

C <- length(unique(washing_test$Product))
p <- length(unique(washing_test$Stain))
r <- nrow(washing_test)/(C * p)

B <- 1000
p_raw <- c()
distr <- matrix(, nrow = B, ncol = 0)
for (cc in unique(washing_test$Category)) {
    cat_data <- washing_test %>% dplyr::filter(Category == cc)
    for (ss in unique(cat_data$Stain)) {
        cat_data2 <- cat_data %>% dplyr::filter(Stain == ss)
        n <- table(cat_data2$Product)
        xbar <- rep(NA, C)
        for (gg in 1:C) {
            xbar[gg] <- mean(cat_data2$Reflectance[cat_data2$Product == gg])
        }
        tst <- sum(xbar^2 * n)  # The permutation ANOVA statistic: sum_{j in groups} n_j*(mean(X_j))^2
        distr <- cbind(distr, k_sample(x = cat_data2$Reflectance, group = cat_data2$Product, 
            reps = B, stat = "oneway_anova"))
        p_raw <- c(p_raw, t2p(tst, distr[, ncol(distr)], alternative = "greater"))
    }
}
names(p_raw) <- unique(washing_test$Stain)
colnames(distr) <- unique(washing_test$Stain)


p_adjusted <- fwe_minp(pvalues = p_raw, distr = distr, combine = "tippett")
dom <- c(rep(1, 6), rep(2, 14), rep(3, 5))
res <- data.frame(Category = unique(washing_test$Category)[dom], p = p_raw, adjusted_p = p_adjusted)
res

Plot

washing_test %>% dplyr::filter(Stain == "Blueberry juice") %>% ggplot(aes(x = factor(Product), 
    y = Reflectance)) + geom_boxplot()
summary(aov(washing_test[washing_test$Stain == "Blueberry juice", 4] ~ factor(rep(1:8, 
    each = 4))))

Category p-values

We'll use NPC with Fisher's combining function to get overall p-values for the three categories and a single global p-value.

p_distr <- apply(distr, 2, pvalue_distr)
fisher_observed <- c()
distr_categories <- data.frame(matrix(, nrow = B, ncol = 0))
for (cc in unique(washing_test$Category)) {
    ind <- which(unique(washing_test$Category)[dom] == cc)
    fisher_observed <- c(fisher_observed, fisher(p_raw[ind]))
    distr_categories[, cc] <- apply(p_distr[, ind], 1, fisher)
}

p_categories <- sapply(1:length(fisher_observed), function(i) t2p(fisher_observed[i], 
    p_distr[, i], alternative = "greater"))
names(p_categories) <- unique(washing_test$Category)
p_categories