# dot-wassersteinTestSp: Semi-parametric test using the 2-Wasserstein distance to... In goncalves-lab/waddR: Statistical tests for detecting differential distributions based on the 2-Wasserstein distance

 .wassersteinTestSp R Documentation

## Semi-parametric test using the 2-Wasserstein distance to check for differential distributions

### Description

Two-sample test to check for differences between two distributions using the 2-Wasserstein distance. This is the semi-parametric implementation: a permutation test with a generalized Pareto distribution (GPD) approximation to estimate small p-values accurately.

### Usage

```r
.wassersteinTestSp(x, y, permnum = 10000)
```

### Arguments

| Argument | Description |
| --- | --- |
| `x` | sample (vector) representing the distribution of condition `A` |
| `y` | sample (vector) representing the distribution of condition `B` |
| `permnum` | number of permutations used in the permutation testing procedure |
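A call might look like the following minimal sketch. It assumes the waddR package is installed and that the function is internal, so it is reached with `:::`. The simulated samples and the reduced `permnum` are illustrative choices only; the call is skipped if the package is not available.

```r
# Illustrative data: two simulated samples (assumption, not from the package docs)
set.seed(42)
x <- rnorm(100)                        # condition A
y <- rnorm(100, mean = 0.5, sd = 1.2)  # condition B

res <- NULL
if (requireNamespace("waddR", quietly = TRUE)) {
  # ':::' reaches the function even if it is not exported from the namespace
  res <- waddR:::.wassersteinTestSp(x, y, permnum = 1000)
  print(res)
}
```

A smaller `permnum` shortens the run time at the cost of a coarser permutation distribution.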

### Details

This is the semi-parametric version of `wasserstein.test`. For the asymptotic theory-based procedure, see `.wassersteinTestAsy`.

Details concerning the permutation testing procedure with GPD approximation to estimate small p-values accurately can be found in Schefzik et al. (2020).
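The permutation idea can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the helper names `sq_wass_quantile` and `perm_pvalue`, the quantile grid size, and the add-one correction are assumptions, and the GPD tail approximation is left out entirely.

```r
# Squared 2-Wasserstein distance via a quantile approximation:
# mean squared difference of empirical quantiles on an equidistant grid.
# (hypothetical helper; grid size is an assumption)
sq_wass_quantile <- function(x, y, n_grid = 1000) {
  probs <- (seq_len(n_grid) - 0.5) / n_grid
  qx <- quantile(x, probs = probs, names = FALSE)
  qy <- quantile(y, probs = probs, names = FALSE)
  mean((qx - qy)^2)
}

# Plain permutation p-value, WITHOUT any GPD tail approximation.
perm_pvalue <- function(x, y, permnum = 1000) {
  observed <- sq_wass_quantile(x, y)
  pooled <- c(x, y)
  n_x <- length(x)
  perm_stats <- replicate(permnum, {
    # Randomly reassign the pooled observations to the two groups
    idx <- sample.int(length(pooled), n_x)
    sq_wass_quantile(pooled[idx], pooled[-idx])
  })
  # Add-one correction (an assumption here) keeps the estimate above zero
  (sum(perm_stats >= observed) + 1) / (permnum + 1)
}

set.seed(1)
x <- rnorm(100)
y <- rnorm(100, mean = 1)
p <- perm_pvalue(x, y, permnum = 500)
```

With this plain estimator the p-value cannot fall below `1 / (permnum + 1)`, which is the resolution limit that a fitted tail model such as the GPD is meant to overcome.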

### Value

A vector of length 15; see Schefzik et al. (2020) for details:

• d.wass: 2-Wasserstein distance between the two samples computed by quantile approximation

• d.wass^2: squared 2-Wasserstein distance between the two samples computed by quantile approximation

• d.comp^2: squared 2-Wasserstein distance between the two samples computed by decomposition approximation

• d.comp: 2-Wasserstein distance between the two samples computed by decomposition approximation

• location: location term in the decomposition of the squared 2-Wasserstein distance between the two samples

• size: size term in the decomposition of the squared 2-Wasserstein distance between the two samples

• shape: shape term in the decomposition of the squared 2-Wasserstein distance between the two samples

• rho: correlation coefficient in the quantile-quantile plot

• pval: p-value of the semi-parametric 2-Wasserstein distance-based test

• p.ad.gpd: in case the GPD fitting is performed: p-value of the Anderson-Darling test to check whether the GPD actually fits the data well (otherwise NA).

• N.exc: in case the GPD fitting is performed: number of exceedances (starting with 250 and iteratively decreased by 10 if necessary) that are required to obtain a good GPD fit, i.e. a p-value of the Anderson-Darling test ≥ 0.05 (otherwise NA).

• perc.loc: fraction (in %) of the location part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation

• perc.size: fraction (in %) of the size part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation

• perc.shape: fraction (in %) of the shape part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation

• decomp.error: relative error between the squared 2-Wasserstein distance obtained by the quantile approximation and the squared 2-Wasserstein distance obtained by the decomposition approximation
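The distance-related entries above can be illustrated with a short base R sketch. It assumes the usual decomposition of the squared 2-Wasserstein distance into location `(mean_x - mean_y)^2`, size `(sd_x - sd_y)^2`, and shape `2 * sd_x * sd_y * (1 - rho)`. The function name `wass_decomp_sketch`, the grid size, the use of the population standard deviation, and the exact form of the relative error are assumptions; the package's own numerical choices may differ.

```r
# Hypothetical helper mirroring the non-test entries of the result vector
wass_decomp_sketch <- function(x, y, n_grid = 1000) {
  probs <- (seq_len(n_grid) - 0.5) / n_grid
  qx <- quantile(x, probs = probs, names = FALSE)
  qy <- quantile(y, probs = probs, names = FALSE)

  # Quantile approximation of the squared distance
  d_wass_sq <- mean((qx - qy)^2)

  # Decomposition approximation (population sd is an assumption)
  sd_pop <- function(v) sqrt(mean((v - mean(v))^2))
  rho <- cor(qx, qy)  # correlation in the quantile-quantile plot
  location <- (mean(x) - mean(y))^2
  size <- (sd_pop(x) - sd_pop(y))^2
  shape <- 2 * sd_pop(x) * sd_pop(y) * (1 - rho)
  d_comp_sq <- location + size + shape

  c(d.wass = sqrt(d_wass_sq),
    `d.wass^2` = d_wass_sq,
    `d.comp^2` = d_comp_sq,
    d.comp = sqrt(d_comp_sq),
    location = location,
    size = size,
    shape = shape,
    rho = rho,
    perc.loc = 100 * location / d_comp_sq,
    perc.size = 100 * size / d_comp_sq,
    perc.shape = 100 * shape / d_comp_sq,
    # Relative error; the denominator is an assumption
    decomp.error = abs(d_wass_sq - d_comp_sq) / d_comp_sq)
}

set.seed(7)
x <- rnorm(200)
y <- rnorm(200, mean = 2, sd = 1.5)
res <- wass_decomp_sketch(x, y)
```

In this simulated example the two samples differ mainly by a mean shift, so most of the squared distance should be attributed to the location term, and the three percentages sum to 100 by construction.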

### References

Schefzik, R., Flesch, J., and Goncalves, A. (2020). waddR: Using the 2-Wasserstein distance to identify differences between distributions in two-sample testing, with application to single-cell RNA-sequencing data.

goncalves-lab/waddR documentation built on June 29, 2023, 12:18 a.m.