
skewsamp

The goal of skewsamp is to provide access to sample size estimation methods for group comparisons where the underlying data are skewed and thus violate the assumptions for common methods of sample size estimation.

In particular, skewsamp offers an approach based on generalized linear models (GLM) as described by Cundill & Alexander (2015) and the “NECDF” (Noether Empirical Distribution Function) approach based on the nonparametric Wilcoxon-Mann-Whitney test in the location shift paradigm as described by Chakraborti, Hong, & van de Wiel (2006).

You can install the package directly from GitHub:

# install.packages("devtools") # devtools is required for the installation; install it first if you do not have it
devtools::install_github("https://github.com/jobrachem/skewsamp")

All functions are documented, so you can use R’s built-in help system. You can also refer to the online documentation, which includes a list of all functions.

We verified the correctness of our implementation through extensive simulations. The data, code, and final report are available on the Open Science Framework. Note that the report is written in German.

https://osf.io/z5vtf/ (Project)
https://osf.io/yb5xm/ (Report)

The simulations revealed that the GLM-based approach (Cundill & Alexander, 2015) works robustly. The nonparametric NECDF approach depends on pilot data and can substantially underestimate the required sample sizes. Please consult the report linked above for further details.
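
To get a feel for what such a simulation check looks like, here is a minimal base-R sketch. It is not the authors' simulation code: the function name `simulate_power`, the number of repetitions, and the seed are all assumptions made for illustration. It simulates two gamma-distributed groups, fits a gamma GLM with log link, and reports how often the group effect is significant. The group size of 44 is the per-group estimate of 43.74 from the gamma example in this README, rounded up.

```r
# Simulated power of a gamma GLM (log link) for two equally sized groups.
# Hypothetical helper for illustration; not part of skewsamp.
simulate_power <- function(n_per_group, mean0, mean1, shape,
                           alpha = 0.05, reps = 500) {
  group <- rep(c(0, 1), each = n_per_group)
  rejected <- replicate(reps, {
    # rgamma() is parameterised by shape and rate, so rate = shape / mean
    y <- c(rgamma(n_per_group, shape = shape, rate = shape / mean0),
           rgamma(n_per_group, shape = shape, rate = shape / mean1))
    fit <- glm(y ~ group, family = Gamma(link = "log"))
    # Wald p-value for the group coefficient (4th column of the coefficient table)
    summary(fit)$coefficients["group", 4] < alpha
  })
  mean(rejected)  # share of simulated studies that reject the null
}

set.seed(2024)  # arbitrary seed, only for reproducibility of this sketch
power_hat <- simulate_power(n_per_group = 44, mean0 = 1, mean1 = 0.5, shape = 1)
power_hat
```

If the sample size estimate is adequate, the simulated power should land in the neighbourhood of the 0.9 target; with only 500 repetitions, some Monte Carlo noise is to be expected.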

Sample size determination in the GLM approach for gamma-distributed data:

library(skewsamp)
skewsamp::n_gamma(mean0 = 1, effect = 0.5, shape0 = 1, alpha = 0.05, power = 0.9)
#> Estimated sample size for group difference.
#> Generalized Regression, Gamma Distribution, link: log 
#> 
#> N (total)         87.48 
#> n0 (Group 0)      43.74 
#> n1 (Group 1)      43.74 
#> 
#> Effect size       0.5 
#> Effect type       1 - (mean1/mean0) 
#> Type I error      0.05 
#> Target power      0.9 
#> Two-sided         TRUE 
#> 
#> Call: skewsamp::n_gamma(mean0 = 1, effect = 0.5, shape0 = 1, alpha = 0.05, 
#>     power = 0.9)
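
The printed per-group value can be reproduced by hand. The sketch below uses a closed-form Wald-type expression for a gamma GLM with log link and assumes that both groups share the same shape parameter. It matches the output above numerically, but whether the package computes it in exactly this way internally is an assumption; the variable names are chosen for illustration only.

```r
mean0  <- 1
effect <- 0.5
shape  <- 1      # assumed identical in both groups
alpha  <- 0.05
power  <- 0.9

# The effect is defined as 1 - (mean1 / mean0), so:
mean1 <- mean0 * (1 - effect)

# Two-sided test: sum of the standard normal quantiles for alpha/2 and power
z <- qnorm(1 - alpha / 2) + qnorm(power)

# Per-group sample size on the log-mean scale
n_per_group <- z^2 * (1 / shape + 1 / shape) / log(mean1 / mean0)^2

round(n_per_group, 2)  # 43.74
ceiling(n_per_group)   # 44
```

Because the estimates are fractional, rounding each group up to the next whole number (here 44 per group, 88 in total) is the conservative choice when planning a study.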

Sample size determination in the location shift approach. This approach requires pilot data, which we draw from an exponential distribution for the sake of the example. Because the pilot data are random, the estimate will differ between runs unless a seed is set:

library(skewsamp)
skewsamp::n_locshift(s1 = rexp(10), s2 = rexp(10), delta = 0.5, alpha = 0.05, power = 0.9)
#> Estimated sample size for group difference.
#> Wilcoxon-Mann-Whitney Test, Location shift 
#> 
#> N (total)         97.35 
#> n0 (Group 0)      48.68 
#> n1 (Group 1)      48.68 
#> 
#> Effect size       0.5 
#> Effect type       location shift 
#> Type I error      0.05 
#> Target power      0.9 
#> Two-sided         FALSE 
#> 
#> Call: skewsamp::n_locshift(s1 = rexp(10), s2 = rexp(10), delta = 0.5, 
#>     alpha = 0.05, power = 0.9)
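
To illustrate why this estimate is sensitive to the pilot data, the following base-R sketch applies Noether's approximation for the one-sided Wilcoxon-Mann-Whitney test, in which the total sample size depends on p = P(X < Y). The helper names `noether_total_n` and `estimate_p`, and in particular the simple plug-in estimate of p from a single pilot sample, are assumptions made for illustration; they are not taken from skewsamp and may differ from the estimator described by Chakraborti et al. (2006).

```r
# Noether's approximation for the total sample size of a one-sided
# Wilcoxon-Mann-Whitney test. p = P(X < Y); c = share of observations in group 0.
noether_total_n <- function(p, alpha = 0.05, power = 0.9, c = 0.5) {
  (qnorm(1 - alpha) + qnorm(power))^2 / (12 * c * (1 - c) * (p - 0.5)^2)
}

# Hypothetical plug-in estimate of p under a location shift delta:
# the share of pairs (i, j) with x_i < x_j + delta in the pilot sample.
estimate_p <- function(pilot, delta) {
  mean(outer(pilot, pilot + delta, "<"))
}

set.seed(1)          # arbitrary seed so the pilot sample is reproducible
pilot <- rexp(10)    # small pilot sample, as in the example above
p_hat <- estimate_p(pilot, delta = 0.5)

p_hat
noether_total_n(p_hat)
```

Since p enters through the squared term (p - 0.5)^2 in the denominator, small changes in the estimate from a ten-observation pilot sample translate into large changes in the resulting sample size.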

References

Cundill, B., & Alexander, N. D. E. (2015). Sample size calculations for skewed distributions. BMC Medical Research Methodology, 15(1), 1–9. https://doi.org/10.1186/s12874-015-0023-0
Chakraborti, S., Hong, B., & Van De Wiel, M. A. (2006). A note on sample size determination for a nonparametric test of location. Technometrics, 48(1), 88–94. https://doi.org/10.1198/004017005000000193