knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(Sampling)

Permutation

The permutation function provides an easy way to run a permutation test of significance for the difference between observations from two groups.

Theory behind permutation

A permutation test is a non-parametric test that uses resampling to calculate a p-value as a measure of the significance of the difference between two groups. The difference can be measured in several ways; most commonly, the difference in means between the two groups is used.

The permutation test is done by a series of steps:

1. Calculate the observed test-statistic, e.g. the observed difference in means between the two groups.

2. The permutation loop:
    2a. Shuffle the group labels so that each observation gets a new label. The resampling is done without replacement, so the permuted data consist of the same observations as the original data, only with the labels shuffled. 
    2b. Calculate the permuted test-statistic, e.g. the difference in means between the two groups in the permuted data. 
    2c. Save the permuted test-statistic and repeat the permutation loop, e.g. 10^5 times.

3. After the permutation loop has finished, the permuted null distribution of the test-statistic is available and the permutation p-value can be calculated as follows:

If the observed test-statistic > 0:

$$ \text{p-value} = 2 \cdot \frac{\sum(\text{permuted test-statistic} \geq \text{observed test-statistic})}{\text{number of permutations}} $$

If the observed test-statistic < 0:

$$ \text{p-value} = 2 \cdot \frac{\sum(\text{permuted test-statistic} \leq \text{observed test-statistic})}{\text{number of permutations}} $$

The p-value is multiplied by 2 to make it a two-sided permutation test (taking both ends of the null distribution into account).
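The steps above can be sketched in base R. This is only an illustration of the procedure, not the package's implementation: it uses the built-in PlantGrowth data (control versus treatment 2), and the helper diff_in_means() and the variable names are made up for this example. Capping the p-value at 1 is an extra guard that is not part of the formulas above.

```r
# Base-R sketch of a two-sided permutation test for a difference in means.
# Assumption: illustrative only; names and data choice are not from the package.
set.seed(1)

# Keep two of the three PlantGrowth groups so we have a two-group problem.
dat <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt2")))
obs <- dat$weight
grp <- dat$group

# Test-statistic: mean of the second group minus mean of the first group.
diff_in_means <- function(x, g) {
  m <- tapply(x, g, mean)
  unname(m[2] - m[1])
}

# Step 1: observed test-statistic.
observed <- diff_in_means(obs, grp)

# Step 2: shuffle the labels (without replacement) and recompute the statistic.
n_perm <- 10^4
permuted <- replicate(n_perm, diff_in_means(obs, sample(grp)))

# Step 3: one-sided tail proportion in the direction of the observed value,
# doubled to make the test two-sided (capped at 1 as an added guard).
tail_prop <- if (observed > 0) mean(permuted >= observed) else mean(permuted <= observed)
p_value <- min(1, 2 * tail_prop)
```

Here mean() of a logical vector gives the sum of TRUE values divided by the number of permutations, which is the fraction in the formulas above.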

The permutation() function

The function provides several options for calculating the test-statistic. The most commonly used ones (i.e. difference in means and difference in medians) are available as options to the "method" argument. Furthermore, a user-specified function, "my_method()", can be passed to the "method" argument as well, which makes the function very flexible.

Given the observations and their group labels, the function returns an object of class "permutation" and of class "htest". Both summary() and plot() can be called on this object, together with all other functions that support the "htest" class, e.g. print().
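To see why carrying the "htest" class is useful, the sketch below builds a small stand-in object by hand and prints it. The field names follow the usual "htest" conventions from the stats package; the actual fields of the object returned by permutation() are not listed here, so the structure and all values below are placeholders chosen for illustration only.

```r
# Hand-built stand-in for an object with classes "permutation" and "htest".
# Assumption: field names follow the stats "htest" convention; the values are
# placeholders, not results from the package.
res <- structure(
  list(
    statistic = c("difference in means" = 0.5),  # placeholder value
    p.value   = 0.01,                            # placeholder value
    method    = "Two-sided permutation test (illustrative stand-in)",
    data.name = "observations by group"
  ),
  class = c("permutation", "htest")
)

# No print method exists for "permutation" in this sketch, so S3 dispatch
# falls through to the "htest" print method from the stats package.
print(res)
inherits(res, "htest")
```

Because S3 dispatch walks the class vector from left to right, a package can define its own summary() and plot() methods for "permutation" while still reusing the existing "htest" printing.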

The plot() outputs two plots:

These two plots help the user build intuition about the permutation output and the resulting p-value.

If the ggplot2 package is installed, the plots are made with ggplot2. If not, they are made as base R plots. Users are strongly encouraged to install ggplot2 for a better experience.
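A fallback of this kind is commonly written with requireNamespace(). The sketch below shows the general pattern on a plain histogram; the function plot_null() and its arguments are hypothetical and are not the package's plot method.

```r
# Sketch of a "ggplot2 if available, otherwise base R" plotting fallback.
# Assumption: plot_null() is a hypothetical helper, not part of the package.
plot_null <- function(permuted, observed) {
  if (requireNamespace("ggplot2", quietly = TRUE)) {
    # ggplot2 is installed: build the histogram with ggplot2.
    p <- ggplot2::ggplot(data.frame(stat = permuted), ggplot2::aes(x = stat)) +
      ggplot2::geom_histogram(bins = 30) +
      ggplot2::geom_vline(xintercept = observed, linetype = "dashed")
    print(p)
    invisible("ggplot2")
  } else {
    # ggplot2 is missing: draw the same picture with base graphics.
    hist(permuted, breaks = 30,
         main = "Permuted null distribution", xlab = "Test statistic")
    abline(v = observed, lty = 2)
    invisible("base")
  }
}

set.seed(1)
backend <- plot_null(rnorm(1000), observed = 1.5)
```

Using requireNamespace() rather than library() checks for the package without attaching it, so the fallback does not change the user's search path.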

Usage

Input

Output

An object of class "permutation" and "htest" containing the following:

Example

The BloodPressure dataset is provided in the package for trying out the permutation function. It can be loaded into your local environment by running data(BloodPressure).

Syntax

perm <- permutation(group = BloodPressure$Group, 
                    observations = BloodPressure$Blood_pressure, 
                    method = "mean", nPerm = 10^5)

Example output

summary(perm)

plot(perm)

Bootstrap

The bootstrap function provides an easy way to bootstrap a population estimate from a set of observations. The quantities obtained through the bootstrap include the standard error (SE) and confidence intervals.

Theory behind Bootstrap

The bootstrap is a non-parametric resampling method that approximates the sampling distribution of an estimate by recalculating the estimate in each iteration. This makes it possible to obtain the standard error of the population estimate.

The bootstrap is done in a series of steps:

1. Calculate the observed estimate, e.g. mean or median.

2. The bootstrap loop:
    2a. Randomly pick observations with replacement, so that the bootstrapped data consist of the same number of observations as the original data. 
    2b. Calculate the bootstrap estimate, e.g. mean or median. 
    2c. Save the bootstrap estimate and repeat the bootstrap loop, e.g. 10^5 times.

3. After the bootstrap loop has finished, the bootstrapped sampling distribution of the estimate is available and the standard error and confidence interval can be calculated as follows:

Since the standard error is equal to the standard deviation of the sampling distribution, it can be calculated by applying the sd() function from the R stats package to the bootstrap estimates.

Furthermore, the 95 % confidence interval is found by taking the 2.5th and 97.5th percentiles (the 0.025 and 0.975 quantiles) of the sampling distribution.
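The bootstrap steps above can be sketched in base R as follows. This is an illustration of the procedure on the built-in PlantGrowth data, not the package's implementation; the variable names are chosen for this example only.

```r
# Base-R sketch of a bootstrap of the mean with a percentile confidence interval.
# Assumption: illustrative only; names are not taken from the package.
set.seed(1)
x <- PlantGrowth$weight

# Step 1: observed estimate.
observed <- mean(x)

# Step 2: resample with replacement (same size as the data) and recompute.
n_boot <- 10^4
boot_means <- replicate(n_boot, mean(sample(x, size = length(x), replace = TRUE)))

# Step 3: standard error = standard deviation of the bootstrap estimates;
# 95 % percentile interval = 0.025 and 0.975 quantiles of the same estimates.
se <- sd(boot_means)
ci <- quantile(boot_means, probs = c(0.025, 0.975))
```

Replacing mean() with median() in both places gives the corresponding sketch for the median.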

The bootstrap() function

The function provides several options for calculating the bootstrap estimate. The most commonly used ones (i.e. mean and median) are available as options to the "method" argument. Furthermore, a user-specified function, "my_method()", can be passed to the "method" argument as well, which makes the function very flexible.

The output is of class "bootstrap". Both summary() and plot() can be called on this object.

The plot() outputs one plot visualizing the sampling distribution of the population estimate found through bootstrapping, including 95 % confidence intervals and the observed population estimate.

If the ggplot2 package is installed, the plot is made with ggplot2. If not, it is made as a base R plot. Users are strongly encouraged to install ggplot2 for a better experience.

Usage

Input

Output

An object of class "bootstrap" containing the following:

Example

The PlantGrowth dataset can be used to try out the bootstrap function.

Syntax

boot <- bootstrap(observations = PlantGrowth$weight, 
                  method = "mean", nboot = 10^5)

Example output

summary(boot)

plot(boot, bins = 30)



aumath-advancedr2019/Sampling documentation built on Nov. 26, 2019, 2:08 a.m.