waddR-package | R Documentation |

The package offers statistical tests based on the 2-Wasserstein distance for detecting and characterizing differences between two distributions given in the form of samples. Functions for calculating the 2-Wasserstein distance and testing for differential distributions are provided, as well as a specifically tailored test for differential expression in single-cell RNA sequencing data.

The `waddR` package provides tools to address the following tasks:

1. Computation of the 2-Wasserstein distance
2. Two-sample tests to check for differences between two distributions
3. Detection of differential gene expression distributions in single-cell RNA sequencing data

The 2-Wasserstein distance is a metric to quantify the difference between two distributions, representing e.g. two different conditions `A` and `B`. The `waddR` package specifically considers the squared 2-Wasserstein distance, which can be decomposed into location, size, and shape terms, thus providing a characterization of potential differences. It offers three functions to calculate the (squared) 2-Wasserstein distance, which are implemented in C++ and exported to R with Rcpp for faster computation. `wasserstein_metric` is a C++ reimplementation of the `wasserstein1d` function from the R package `transport`. The functions `squared_wass_approx` and `squared_wass_decomp` compute approximations of the squared 2-Wasserstein distance, with `squared_wass_decomp` also returning the decomposition terms for location, size, and shape.
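The decomposition into location, size, and shape terms can be illustrated with a short base R sketch. This is not the package's C++ implementation: the function name `sq_wass_decomp_sketch`, the grid of 1000 equidistant quantile levels, and the use of population standard deviations of the quantile vectors are assumptions made for illustration only.

```r
# Illustrative sketch only; waddR's own functions are implemented in C++.
# sq_wass_decomp_sketch is a hypothetical helper, not part of the package.
sq_wass_decomp_sketch <- function(x, y, n_probs = 1000) {
  # Equidistant quantile levels in (0, 1) (assumed grid size)
  probs <- (seq_len(n_probs) - 0.5) / n_probs
  qx <- quantile(x, probs = probs, type = 1, names = FALSE)
  qy <- quantile(y, probs = probs, type = 1, names = FALSE)

  # Population standard deviation (divides by n, not n - 1)
  pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

  location <- (mean(qx) - mean(qy))^2                       # difference in means
  size     <- (pop_sd(qx) - pop_sd(qy))^2                   # difference in spread
  shape    <- 2 * pop_sd(qx) * pop_sd(qy) * (1 - cor(qx, qy)) # remaining shape difference

  list(
    distance = mean((qx - qy)^2),  # approximate squared 2-Wasserstein distance
    location = location,
    size     = size,
    shape    = shape
  )
}

set.seed(1)
a <- rnorm(200, mean = 0, sd = 1)
b <- rnorm(200, mean = 1, sd = 2)
res <- sq_wass_decomp_sketch(a, b)
res
```

In this sketch the three terms add up to the approximate distance exactly, because all of them are computed from the same two quantile vectors.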

See `?wasserstein_metric`, `?squared_wass_approx`, and `?squared_wass_decomp` as well as the accompanying paper Schefzik et al. (2020).

The `waddR` package provides two testing procedures using the 2-Wasserstein distance to test whether two distributions `F_A` and `F_B` given in the form of samples are different by testing the null hypothesis `H_0: F_A = F_B` against the alternative hypothesis `H_1: F_A \neq F_B`.

The first, semi-parametric (SP), procedure uses a permutation-based test combined with a generalized Pareto distribution approximation to estimate small p-values accurately.

The second procedure uses a test based on asymptotic theory (ASY) which is valid only if the samples can be assumed to come from continuous distributions.

See `?wasserstein.test` for more details.
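The permutation idea behind the semi-parametric procedure can be sketched in base R as follows. This is a simplified illustration, not what `wasserstein.test` does internally: it uses a plain permutation p-value and omits the generalized Pareto distribution approximation for small p-values. The helpers `sq_wass` and `perm_test_sketch` and the default of 999 permutations are hypothetical choices.

```r
# Illustrative sketch only: plain permutation test with a squared
# 2-Wasserstein statistic. The generalized Pareto refinement used by the
# semi-parametric procedure is deliberately left out.
sq_wass <- function(x, y, n_probs = 1000) {
  # Approximate squared 2-Wasserstein distance on an assumed quantile grid
  probs <- (seq_len(n_probs) - 0.5) / n_probs
  mean((quantile(x, probs, type = 1, names = FALSE) -
        quantile(y, probs, type = 1, names = FALSE))^2)
}

perm_test_sketch <- function(x, y, permnum = 999) {
  observed <- sq_wass(x, y)
  pooled <- c(x, y)
  n_x <- length(x)

  # Under H_0: F_A = F_B the group labels are exchangeable, so reshuffle them
  perm_stats <- replicate(permnum, {
    idx <- sample(length(pooled), n_x)
    sq_wass(pooled[idx], pooled[-idx])
  })

  # Permutation p-value with the usual +1 correction
  (1 + sum(perm_stats >= observed)) / (1 + permnum)
}

set.seed(42)
p_same <- perm_test_sketch(rnorm(100), rnorm(100))           # same distribution
p_diff <- perm_test_sketch(rnorm(100), rnorm(100, mean = 2)) # shifted mean
```

With a plain permutation test the smallest attainable p-value is `1 / (permnum + 1)`, which is the resolution limit that an approximation of the tail of the permutation distribution is meant to overcome.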

The `waddR` package provides an adaptation of the semi-parametric testing procedure based on the 2-Wasserstein distance which is specifically tailored to identify differential distributions in scRNA-seq data. In particular, a two-stage (TS) approach is implemented that takes account of the specific nature of scRNA-seq data by separately testing for differential proportions of zero gene expression (using a logistic regression model) and differences in non-zero gene expression (using the semi-parametric 2-Wasserstein distance-based test) between two conditions.

See `?wasserstein.sc` and `?testZeroes` for more details.
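The first stage of the two-stage idea, testing for differential proportions of zero expression, can be sketched for a single gene with base R. The simulated data, the variable names, and the plain likelihood ratio test on a `glm` fit are assumptions for illustration; the model actually fitted by `testZeroes` may differ, for example in its covariates or fitting method.

```r
# Illustrative sketch only, with simulated expression values for one gene.
set.seed(7)
expr_a <- ifelse(runif(150) < 0.6, 0, rexp(150))  # condition A: about 60% zeros
expr_b <- ifelse(runif(150) < 0.2, 0, rexp(150))  # condition B: about 20% zeros

cells <- data.frame(
  is_zero   = as.integer(c(expr_a, expr_b) == 0),
  condition = factor(rep(c("A", "B"),
                         times = c(length(expr_a), length(expr_b))))
)

# Stage 1: logistic regression of zero status on condition,
# compared with an intercept-only model by a likelihood ratio test
full <- glm(is_zero ~ condition, family = binomial(), data = cells)
null <- glm(is_zero ~ 1, family = binomial(), data = cells)
lrt <- anova(null, full, test = "Chisq")
p_zero <- lrt[["Pr(>Chi)"]][2]

# Stage 2 would compare only the non-zero expression values,
# using a 2-Wasserstein distance-based test
nonzero_a <- expr_a[expr_a > 0]
nonzero_b <- expr_b[expr_b > 0]
```

Separating the two stages keeps an excess of zeros in one condition from dominating the comparison of the non-zero expression distributions.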

Schefzik, R., Flesch, J., and Goncalves, A. (2020). waddR: Using the 2-Wasserstein distance to identify differences between distributions in two-sample testing, with application to single-cell RNA-sequencing data.

**Maintainer**: Julian Flesch julianflesch@gmail.com

Authors:

Roman Schefzik roman.schefzik@medma.uni-heidelberg.de
