rbtt is an alternative bootstrap-based t-test aiming to reduce type-I error for non-negative, zero-inflated data
Tu & Zhou (1999) showed that comparing the means of populations whose data-generating distributions are non-negative with excess zero observations is a problem of great importance in the analysis of medical cost data. In the same study, Tu & Zhou discuss that it can be difficult to control type-I error rates of general-purpose statistical tests for comparing the means of these particular data sets. This package allows users to perform a modified bootstrap-based t-test that aims to better control type-I error rates in these situations.
Let's say we have some non-negative data with clumping at zero:
x <- rbinom(50, 1, 0.5) * rlnorm(50, 0, 1)
y <- rbinom(150, 1, 0.3) * rlnorm(150, 2, 1)
Then we may compute rbtt-based t-tests to compare the means:
# Use ‘method = 1’ for a two-sample, two-sided rbtt under the equal variance assumption,
rbtt(x, y, n.boot=999, method = 1)
##
## Two-sided robust bootstrapped t-test assuming equal variance
##
## data: x and y
## t = -2.3, p-value = 0.03
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.2644 -0.1393
## sample estimates:
## mean of x mean of y
## 0.8163 5.1450
# Use ’method = 2' for a two-sample, one-sided rbtt without the equal variance assumption
rbtt(x, y, n.boot=999, method = 2)
##
## One-sided robust bootstrapped t-test not assuming equal variance
##
## data: x and y
## t = -4, p-value <2e-16
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -6.932 -1.726
## sample estimates:
## mean of x mean of y
## 0.8163 5.1450
Alternatively, you can specify method = "both"
to perform both methods
simultaneously (this is also done by default).
# Compare speed when using single-core versus multiple-core rbtt on 99999 bootstrap resamples
system.time(rbtt(x, y, n.boot = 99999, method = 1, n.cores = 1))
## user system elapsed
## 8.150 0.011 8.198
system.time(rbtt(x, y, n.boot = 99999, method = 1, n.cores = 3))
## user system elapsed
## 6.617 0.078 3.782
First, we perform some simulations.
n.sim <- 999
t.test.results <- numeric(n.sim)
rbtt.results <- numeric(n.sim)
pval.table.list <- mclapply(1:n.sim, function(i)
{
# True means are equal
x <- rbinom(50, 1, 0.5) * rlnorm(50, 1.15, 1)
y <- rbinom(150, 1, 0.5) * rlnorm(150, 1.15, 1)
t.test.result <- t.test(x, y)$p.value
rbtt.result <- rbtt(x, y, n.boot = 999, method = 1)$p.value
return(c(t.test.result, rbtt.result))
}, mc.cores = 4)
pval.table <- do.call(rbind, pval.table.list)
Now, let's evaluate the type-I error of these simulations using a significance level of 0.05.
# t.test type-I error with significance level of 0.05:
sum(pval.table[,1] < 0.05) / n.sim
## [1] 0.06006
# rbtt type-I error with significance level of 0.05:
sum(pval.table[,2] < 0.05) / n.sim
## [1] 0.05005
More accurate p-values and type-I error estimates can be obtained by
increasing n.boot
and n.sim
, respectively
Ian Waudby-Smith (University of Waterloo)
Dr. Pengfei Li (University of Waterloo)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.