`waddR` is an R package that provides statistical tests based on the 2-Wasserstein distance for detecting and characterizing differences between two distributions given in the form of samples. It provides functions for calculating the 2-Wasserstein distance and testing for differential distributions, as well as a specifically tailored test for differential expression in single-cell RNA sequencing data.

The package provides tools to address the following tasks:

1. Computation of the 2-Wasserstein distance
2. Two-sample tests to check for differences between two distributions
3. Detection of differential gene expression distributions in single-cell RNA sequencing data

- R >= 3.6.0

Available on Bioconductor:

```
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("waddR")
```

The latest package version can be installed from GitHub using `BiocManager`:

```
if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("goncalves-lab/waddR")
```

Tests can be run by calling `test()` from the `devtools` package.
All tests are implemented using the `testthat` package and reside in `tests/testthat`.


The 2-Wasserstein distance is a metric that describes the distance between two distributions, here representing two different conditions A and B. This package specifically considers the squared 2-Wasserstein distance d := W^2, which can be decomposed into location, size, and shape terms.
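To make the decomposition concrete, here is a minimal base-R sketch. It is not the `waddR` implementation: it assumes the squared distance is approximated by the mean squared difference of empirical quantiles on an equidistant grid, and it uses the standard identity that splits this quantity into a squared difference of means (location), a squared difference of standard deviations (size), and a correlation-based remainder (shape). The sample sizes, grid size, and the helper `pop_sd` are illustrative choices.

```r
# Illustrative base-R sketch; not the waddR implementation.
set.seed(1)
a <- rnorm(200, mean = 0, sd = 1)   # sample from condition A
b <- rnorm(150, mean = 1, sd = 2)   # sample from condition B

# Empirical quantile functions evaluated on a common grid in (0, 1)
u  <- (seq_len(1000) - 0.5) / 1000
qa <- quantile(a, probs = u, names = FALSE)
qb <- quantile(b, probs = u, names = FALSE)

# Squared 2-Wasserstein distance, approximated as the mean squared
# difference between corresponding quantiles
d <- mean((qa - qb)^2)

# Hypothetical helper: population (divide-by-n) standard deviation
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

location <- (mean(qa) - mean(qb))^2                  # difference in means
size     <- (pop_sd(qa) - pop_sd(qb))^2              # difference in spread
shape    <- 2 * pop_sd(qa) * pop_sd(qb) * (1 - cor(qa, qb))  # remaining shape difference

c(d = d, location = location, size = size, shape = shape)
```

With these definitions the three terms sum to `d` up to floating-point error, which is what makes the relative contribution of each term interpretable.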

The package `waddR` offers three functions to calculate the 2-Wasserstein
distance, all of which are implemented in C++ and exported to R with Rcpp for
better performance.
The function `wasserstein_metric` is a C++ reimplementation of the
function `wasserstein1d` from the package `transport` and offers the most exact
results.
The functions `squared_wass_approx` and `squared_wass_decomp` compute
approximations of the squared 2-Wasserstein distance, with `squared_wass_decomp`
also returning the decomposition terms for location, size, and shape.
See `?wasserstein_metric`, `?squared_wass_approx`, and `?squared_wass_decomp`.
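As a rough usage sketch, the three functions can be called on two numeric vectors as shown below. The simulated samples are placeholders, and the argument `p = 2` is assumed to select the 2-Wasserstein distance; consult the help pages for the authoritative signatures and return values.

```r
library(waddR)

set.seed(24)
x <- rnorm(100, mean = 0, sd = 1)   # sample from condition A
y <- rnorm(100, mean = 2, sd = 1)   # sample from condition B

# wasserstein_metric returns the p-Wasserstein distance itself,
# so the result for p = 2 is squared to obtain d
d_exact <- wasserstein_metric(x, y, p = 2)^2

# Approximation of the squared 2-Wasserstein distance
d_approx <- squared_wass_approx(x, y)

# Approximation together with the location, size, and shape terms
decomp <- squared_wass_decomp(x, y)

d_exact
d_approx
decomp
```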

This package provides two testing procedures using the 2-Wasserstein distance to test whether two distributions F_A and F_B, given in the form of samples, are different, by specifically testing the null hypothesis H0: F_A = F_B against the alternative hypothesis H1: F_A != F_B.

The first, semi-parametric (SP) procedure uses a permutation-based test combined with a generalized Pareto distribution approximation to estimate small p-values accurately.

The second procedure (ASY) uses a test based on asymptotic theory, which is valid only if the samples can be assumed to come from continuous distributions.

See the documentation of the function `?wasserstein.test` for more details.
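The permutation idea behind the SP procedure can be sketched in base R as follows. This is an illustration only, not the code used by `wasserstein.test`: the helper `sq_wass`, the grid size, and the number of permutations are assumptions, and the generalized Pareto refinement of small p-values is omitted.

```r
# Illustrative permutation test; the package's SP procedure additionally
# refines small p-values with a generalized Pareto approximation (not shown).
set.seed(42)
a <- rnorm(60, mean = 0)     # sample from condition A
b <- rnorm(60, mean = 1)     # sample from condition B

# Hypothetical helper: squared 2-Wasserstein distance via quantiles on a grid
sq_wass <- function(x, y, n_grid = 500) {
  u <- (seq_len(n_grid) - 0.5) / n_grid
  mean((quantile(x, u, names = FALSE) - quantile(y, u, names = FALSE))^2)
}

observed <- sq_wass(a, b)
pooled   <- c(a, b)
n_perm   <- 1000

# Recompute the statistic after randomly reassigning condition labels
perm_stats <- replicate(n_perm, {
  idx <- sample(length(pooled), length(a))
  sq_wass(pooled[idx], pooled[-idx])
})

# Permutation p-value with the usual +1 correction to avoid exactly zero
p_value <- (sum(perm_stats >= observed) + 1) / (n_perm + 1)
p_value
```

A plain permutation p-value cannot fall below 1 / (n_perm + 1), which is the limitation that a tail approximation for small p-values is meant to address.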

The `waddR` package provides an adaptation of the semi-parametric testing procedure based on the 2-Wasserstein distance which is specifically tailored to identify differential distributions in single-cell RNA-sequencing (scRNA-seq) data. In particular, a two-stage (TS) approach has been implemented that takes account of the specific nature of scRNA-seq data by separately testing for differential proportions of zero gene expression (using a logistic regression model) and differences in non-zero gene expression (using the semi-parametric 2-Wasserstein distance-based test) between two conditions.

See the documentation of the single-cell testing function `?wasserstein.sc` and the test for zero expression levels `?testZeroes` for more details.
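The split into two stages can be illustrated for a single simulated gene with base R. This is a simplified sketch under stated assumptions: it fits a plain logistic regression with `glm` and compares it to an intercept-only model, whereas `testZeroes` may use a different model specification; the zero rates and the log-normal expression values are invented for the example.

```r
# Illustrative only: testZeroes in waddR may use a different model specification.
set.seed(7)
n <- 200
condition <- factor(rep(c("A", "B"), each = n))

# Simulated expression of one gene: more zeros in condition B (assumed rates)
is_zero <- c(rbinom(n, 1, 0.2), rbinom(n, 1, 0.5))
expr    <- ifelse(is_zero == 1, 0, rlnorm(2 * n, meanlog = 1))

# Stage 1: logistic regression for the proportion of zero expression
full <- glm(is_zero ~ condition, family = binomial)
null <- glm(is_zero ~ 1, family = binomial)

# Likelihood-ratio test comparing the two nested models
p_zero <- anova(null, full, test = "Chisq")[2, "Pr(>Chi)"]

# Stage 2 input: non-zero expression values per condition
nonzero_a <- expr[condition == "A" & expr > 0]
nonzero_b <- expr[condition == "B" & expr > 0]

p_zero
```

In the two-stage approach, the non-zero values (`nonzero_a` and `nonzero_b` here) would then be compared with the semi-parametric 2-Wasserstein test.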

We have included detailed examples of how to use all functions provided with
`waddR` in our vignettes.
They are available online or from an R session with the
following command:
`browseVignettes("waddR")`

Schefzik, R., Flesch, J., and Goncalves, A. (2019). waddR: Using the 2-Wasserstein distance to identify differences between distributions in two-sample testing, with application to single-cell RNA-sequencing data.

