knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The original significance test proposed by Caliskan et al. in their paper is based on exact inference using a permutaion approach. This package provides such an exact test. But like any permutation test (such as Fisher's exact test), the number of permuations, as well as the running time, grows factorially --- O(n!) --- with the number of samples (in the case, the total length of S and T). In the "math vs poetry" example used extensively in this package, the total length of S and T is 16. The total number of iterations needed is 16! = 20922789888000. The number grows 306x to 18! = 6402373705728000 by just adding one item each to S and T. It is unrealistic to expect an ordinary computer can do such computation within a reasonable time frame.

In the following benchmark, we studied how much time does it take to finish an exact test with the word set with a length of 2 to 4. I actually don't recommend trying anything beyond 6; it would take probably a few hours, days even. And actually, the function weat_exact will warn you about that.

library(sweater)
S1 <- c("math", "algebra", "geometry", "calculus", "equations",
 "computation", "numbers", "addition")
T1 <- c("poetry", "art", "dance", "literature", "novel", "symphony", "drama", "sculpture")
A1 <- c("male", "man", "boy", "brother", "he", "him", "his", "son")
B1 <- c("female", "woman", "girl", "sister", "she", "her", "hers", "daughter")

do_exact_weat <- function(n = 4) {
    sw <- weat(glove_math, S1[1:n], T1[1:n], A1[1:n], B1[1:n])
    weat_exact(sw)
}

require(bench)
benchmark_res <- bench::mark(do_exact_weat(2),
                             do_exact_weat(3),
                             do_exact_weat(4),
                             relative = TRUE,
                             check = FALSE)
benchmark_res

Instead, I recommend using the resampling approximation of the exact inference. Actually, in the Java code provided by Caliskan et al., the same resampling approximation is used.

library(sweater)
set.seed(11111)
S1 <- c("math", "algebra", "geometry", "calculus", "equations", "computation", "numbers", "addition")
T1 <- c("poetry", "art", "dance", "literature", "novel", "symphony", "drama", "sculpture")
A1 <- c("male", "man", "boy", "brother", "he", "him", "his", "son")
B1 <- c("female", "woman", "girl", "sister", "she", "her", "hers", "daughter")

sw <- weat(glove_math, S1, T1, A1, B1)
weat_resampling(sw)

The method is extremely accurate. In the following simulation, we repeat the same resampling test for 50 times (which is still faster than conducting one exact test). The p-values are extremely close to the 0.018 repeated by Caliskan et al.

res <- replicate(50, weat_resampling(sw)$p.value)
hist(res)

The precision can be increased by increasing the n_resampling parameter (default to 9999). In the Java code provided by Caliskan et al., this sets to 100000.

res <- replicate(50, weat_resampling(sw, n_resampling = 15999)$p.value)
hist(res)

In the following example, we compare the exact inference with the resampling inference, when n = 5. On a regular computer, the exact test would take 1 minute.

S2 <- c("math", "algebra", "geometry", "calculus", "equations")
T2 <- c("poetry", "art", "dance", "literature", "novel")
A2 <- c("male", "man", "boy", "brother", "he", "him", "his", "son")
B2 <- c("female", "woman", "girl", "sister", "she", "her", "hers", "daughter")

sw3 <- weat(glove_math, T2, S2, A2, B2)

system.time({exact_res <- weat_exact(sw3)})
res <- replicate(100, weat_resampling(sw3)$p.value)
hist(res, main = "Distribution of p-values")
abline(v = exact_res$p.value, col = "red", lwd = 3)


chainsawriot/sweater documentation built on June 29, 2024, 8:17 p.m.