# wasserstein.test in waddR: Statistical tests for detecting differential distributions based on the 2-Wasserstein distance

## Description

Two-sample test for differences between two distributions (conditions) based on the 2-Wasserstein distance. The test uses either a semi-parametric permutation procedure, with a generalized Pareto distribution (GPD) approximation to estimate small p-values accurately, or a procedure based on asymptotic theory.

## Usage

```
wasserstein.test(x, y, method = c("SP", "ASY"), permnum = 10000)
```

## Arguments

• `x`: univariate sample (vector) representing the distribution of condition A

• `y`: univariate sample (vector) representing the distribution of condition B

• `method`: testing procedure to be employed: "SP" for the semi-parametric permutation testing procedure with GPD approximation to estimate small p-values accurately; "ASY" for the test based on asymptotic theory. If no method is given, "SP" is used by default.

• `permnum`: number of permutations used in the permutation testing procedure (only relevant if method="SP"); default is 10000
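The default `method = c("SP", "ASY")` in the usage line resolves to "SP" when no method is given. The sketch below shows the usual R idiom that produces this behaviour. `pick_method` is a hypothetical helper for illustration only, and it is an assumption that the package resolves its default in this way.

```r
# Hypothetical helper, not part of waddR: shows how a default such as
# method = c("SP", "ASY") resolves to its first element via match.arg().
pick_method <- function(method = c("SP", "ASY")) {
  match.arg(method)
}

default_choice  <- pick_method()       # no method given: first choice is used
explicit_choice <- pick_method("ASY")  # explicit choice is returned unchanged
```

With this idiom, any value other than "SP" or "ASY" raises an error instead of silently falling back to a default.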

## Details

Details concerning the two testing procedures (i.e. the permutation testing procedure with GPD approximation to estimate small p-values accurately and the test based on asymptotic theory) can be found in Schefzik and Goncalves (2019).
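To make the idea of a permutation test on the 2-Wasserstein distance concrete, here is a minimal base-R sketch. It is not the package's implementation: the GPD approximation for small p-values is omitted, and the function names (`wass2_sq`, `perm_pvalue`), the quantile grid, and the add-one correction are assumptions chosen for illustration.

```r
# Squared 2-Wasserstein distance via a quantile approximation:
# mean squared difference of empirical quantiles on an equidistant grid.
wass2_sq <- function(x, y, n_q = 500) {
  p <- (seq_len(n_q) - 0.5) / n_q
  mean((quantile(x, p, names = FALSE) - quantile(y, p, names = FALSE))^2)
}

# Plain permutation p-value (no GPD tail approximation).
perm_pvalue <- function(x, y, permnum = 1000) {
  observed <- wass2_sq(x, y)
  pooled <- c(x, y)
  n_x <- length(x)
  perm_stats <- replicate(permnum, {
    # Randomly reassign the pooled observations to the two conditions.
    idx <- sample.int(length(pooled), n_x)
    wass2_sq(pooled[idx], pooled[-idx])
  })
  # Add-one correction keeps the estimate strictly positive.
  (sum(perm_stats >= observed) + 1) / (permnum + 1)
}

set.seed(1)
x <- rnorm(100)
y <- rnorm(100, mean = 1)
p_shift <- perm_pvalue(x, y, permnum = 200)
```

With `permnum` permutations, the smallest p-value this plain scheme can report is `1 / (permnum + 1)`, which is the limitation a tail approximation such as the GPD fit is meant to address.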

## Value

A vector containing the test results (see Schefzik and Goncalves (2019) for details):

• d.wass: 2-Wasserstein distance between the two samples computed by quantile approximation

• d.wass^2: squared 2-Wasserstein distance between the two samples computed by quantile approximation

• d.comp^2: squared 2-Wasserstein distance between the two samples computed by decomposition approximation

• d.comp: 2-Wasserstein distance between the two samples computed by decomposition approximation

• location: location term in the decomposition of the squared 2-Wasserstein distance between the two samples

• size: size term in the decomposition of the squared 2-Wasserstein distance between the two samples

• shape: shape term in the decomposition of the squared 2-Wasserstein distance between the two samples

• rho: correlation coefficient in the quantile-quantile plot

• pval: p-value of the semi-parametric 2-Wasserstein distance-based test (method="SP") or p-value determined using asymptotic theory (method="ASY"), depending on the method

• p.ad.gpd: if the GPD fitting is performed: p-value of the Anderson-Darling test that checks whether the GPD actually fits the data well (otherwise NA). This output is only returned when performing the semi-parametric test (method="SP").

• N.exc: if the GPD fitting is performed: number of exceedances (starting with 250 and iteratively decreased by 10 if necessary) required to obtain a good GPD fit, i.e. a p-value of the Anderson-Darling test greater than or equal to 0.05 (otherwise NA). This output is only returned when performing the semi-parametric test (method="SP").

• perc.loc: fraction (in %) of the location part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation

• perc.size: fraction (in %) of the size part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation

• perc.shape: fraction (in %) of the shape part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation

• decomp.error: relative error between the squared 2-Wasserstein distance computed by the quantile approximation and the squared 2-Wasserstein distance computed by the decomposition approximation
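The relationship between the quantile approximation and the location, size and shape terms can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `wass_decomp` and the element names of its result are hypothetical, the moments are taken from the quantile vectors themselves (with population standard deviations), and the percentages are assumed to be expressed in %. Under these choices the two squared distances agree up to rounding; the package may estimate the terms differently, which is presumably why it reports a `decomp.error`.

```r
# Hypothetical helper, not part of waddR.
wass_decomp <- function(x, y, n_q = 1000) {
  p  <- (seq_len(n_q) - 0.5) / n_q
  qx <- quantile(x, p, names = FALSE)
  qy <- quantile(y, p, names = FALSE)

  # Quantile approximation of the squared 2-Wasserstein distance.
  d_wass_sq <- mean((qx - qy)^2)

  # Population standard deviation (divides by n, not n - 1).
  pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

  rho      <- cor(qx, qy)                        # correlation in the Q-Q plot
  location <- (mean(qx) - mean(qy))^2            # difference in means
  size     <- (pop_sd(qx) - pop_sd(qy))^2        # difference in spread
  shape    <- 2 * pop_sd(qx) * pop_sd(qy) * (1 - rho)
  d_comp_sq <- location + size + shape           # decomposition approximation

  c(d.wass.sq = d_wass_sq, d.comp.sq = d_comp_sq,
    location = location, size = size, shape = shape, rho = rho,
    perc.loc   = 100 * location / d_comp_sq,
    perc.size  = 100 * size / d_comp_sq,
    perc.shape = 100 * shape / d_comp_sq)
}

set.seed(42)
x <- rnorm(500)
y <- rnorm(500, 4, 1.5)
res <- wass_decomp(x, y)
```

For two normal samples that differ mainly in their means, as here, the location term should account for most of the squared distance, while the shape term stays close to zero.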

## References

Schefzik, R. and Goncalves, A. (2019).

## Examples

```
library(waddR)

# generate two input distributions
x <- rnorm(500)
y <- rnorm(500, 4, 1.5)

# Run the test based on asymptotic theory
wasserstein.test(x, y, method = "ASY")

# Run with default options: method="SP", permnum=10000
wasserstein.test(x, y)

# Run with a seed for the semi-parametric test ("SP")
set.seed(42)
wasserstein.test(x, y, method = "SP")
```

waddR documentation built on Nov. 8, 2020, 8:32 p.m.