README.md
In holub008/nymph: Randomization tests for nonparametric inference

nymph

Randomization tests for noNparaMetric INFerence

devtools::install_git('https://github.com/holub008/nymph')

Randomization methods provide a powerful toolset for performing inference on a wide range of statistics with few distributional constraints on the data. I believe these methods are underutilized, partially as a result of poor library support; existing R packages (e.g. coin & perm) provide a narrow range of functionality or obfuscate the underlying simplicity of the methods from the user. nymph aims to provide the practitioner a simple & generic interface to this class of methods.

Using the iris dataset, we'd like to investigate if the ratio of length to width of Virginica flower petals are different from that of Virginica sepals. We use the mcrd function to compute ratio differences between petals and sepals.

library(dplyr)

petal_measurements <- iris %>% 
    filter(Species == 'virginica') %>% 
    mutate(len = Petal.Length, 
           width = Petal.Width) %>%
    select(len, width)
sepal_measurements <- iris %>% 
    filter(Species == 'virginica') %>% 
    mutate(len = Sepal.Length,
           width = Sepal.Width) %>%
    select(len, width)

set.seed(55414)
mcrt <- mcrd_test(petal_measurements, sepal_measurements, 
                  lw_ratio = mean(len / width),
                  length_proportion = mean(len / (len + width)))
summary(mcrt)

with results:

Permutation test of 1000 permutations against alternative of two.sided at significance 0.95 
         statistic    ci_lower   ci_upper     actual p_value
          lw_ratio -0.16575289 0.17387946 0.55020960       0
 length_proportion -0.01267495 0.01315527 0.04388741       0

Quite convincing that a difference exists! For a visualization of the observed statistic differences (blue) against their null distributions:

plot(mcrt)

iris_mcrt

Here we perform a power analysis of a contrived experiment - a two treatment experiment with a minimally impactful effect size of 1 against otherwise standard normal populations.

gen_data_s1 <- function(){ data.frame(x = rnorm(50)) }
gen_data_s2 <- function(){ data.frame(x = rnorm(50, 1)) }
mcrp <- mcrd_power(gen_data_s1, gen_data_s2, mean = mean(x), median = median(x), 
                    test_trials = 1e2)
summary(mcrp, alpha = .05, alternative = 'two.sided')

With result:

Power analysis of 100 experiments with alternative of two.sided at significance 0.05 
Group sizes:
 sample size
      1   50
      2   50
 statistic power average_effect
      mean     1     -0.9829345
    median  0.99     -0.9878372

We can also visualize the distribution of p-values (from repeated simulation of the experiment):

plot(mcrp, statistic = 'median', alternative = 'two.sided', 
     alpha = NULL)

median_p_dist

Or we can visualize how power varies across desired inferential FPRs:

plot(mcrp, statistic = 'median', alternative = 'two.sided', 
     alpha = seq(.01, .2, by = .01))

median_alpha_v_power

No dependencies
- Portability & ease of install
- Functionality should be transparent to the user
- Suggests the parallel package, which comes standard with R >= 2.14, for accelerated computations
S4 object system
- Ensure consistency & validity across a range of tests
- Fewer redundant objects for the user to comprehend
Non-standard evaluation wrappers for common use cases
- Reduce boilerplate for interactive use