
Wasserstein distance-based statistical tests for detecting and describing differential distributions in one-dimensional data. The package provides functions for Wasserstein distance calculation, differential distribution testing, and a specialized test for differential expression in scRNA-seq data.

The waddR package offers utilities for three distinct use cases:

- Computation of the 2-Wasserstein distance
- Two-sample testing for differences between two distributions
- Detection of differential gene expression distributions in scRNA-seq data

The 2-Wasserstein distance is a metric that describes the distance between two distributions, representing two different conditions A and B. This package specifically considers the squared 2-Wasserstein distance d := W^2, which offers a decomposition into location, size, and shape terms. The package offers three functions to calculate the 2-Wasserstein distance, all of which are implemented in C++ and exported to R with Rcpp for better performance. `wasserstein_metric` is a C++ reimplementation of the `wasserstein1d` function from the package `transport` and offers the most exact results. The functions `squared_wass_approx` and `squared_wass_decomp` compute approximations of the squared 2-Wasserstein distance, with `squared_wass_decomp` also returning the decomposition terms for location, size, and shape. See `?wasserstein_metric`, `?squared_wass_approx`, and `?squared_wass_decomp`, as well as the accompanying paper Schefzik and Goncalves 2019.
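To make the decomposition concrete, here is a minimal base-R sketch of a quantile-based approximation of the squared 2-Wasserstein distance together with location, size, and shape terms. The function name `squared_wass_sketch`, the `n_levels` parameter, and the choice of quantile grid are illustrative assumptions; this is not the package's C++ implementation and its numbers need not match those of `squared_wass_decomp`.

```r
# Hypothetical helper (not part of waddR): approximates the squared
# 2-Wasserstein distance by comparing empirical quantiles of x and y
# on a grid of equidistant quantile levels.
squared_wass_sketch <- function(x, y, n_levels = 1000) {
  # Midpoint quantile levels in (0, 1); the grid size is an assumption.
  probs <- (seq_len(n_levels) - 0.5) / n_levels
  qx <- quantile(x, probs = probs, names = FALSE, type = 1)
  qy <- quantile(y, probs = probs, names = FALSE, type = 1)

  # Population-style standard deviation (divides by n, not n - 1), so that
  # the three terms below add up exactly to the distance on this grid.
  pop_sd <- function(v) sqrt(mean((v - mean(v))^2))
  sx <- pop_sd(qx)
  sy <- pop_sd(qy)

  # Correlation of the two quantile vectors; undefined for a constant
  # sample, in which case the shape term is zero anyway.
  rho <- if (sx > 0 && sy > 0) cor(qx, qy) else 1

  list(
    distance = mean((qx - qy)^2),          # approximate d = W^2
    location = (mean(qx) - mean(qy))^2,    # difference in means
    size     = (sx - sy)^2,                # difference in spread
    shape    = 2 * sx * sy * (1 - rho)     # remaining difference in shape
  )
}

# Example: condition B is shifted and more spread out than condition A.
set.seed(1)
a <- rnorm(200)
b <- rnorm(200, mean = 2, sd = 1.5)
res <- squared_wass_sketch(a, b)
print(res)
```

On the chosen quantile grid the three terms sum to the reported distance, which is what makes the split into location, size, and shape interpretable.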

This package provides two testing procedures that use the 2-Wasserstein distance to test whether two distributions F_A and F_B, given in the form of samples, are different, by specifically testing the null hypothesis H0: F_A = F_B against the alternative hypothesis H1: F_A != F_B.

The first, semi-parametric (SP) procedure uses a test based on permutations combined with a generalized Pareto distribution approximation to estimate small p-values accurately.

The second procedure (ASY) uses a test based on asymptotic theory which is valid only if the samples can be assumed to come from continuous distributions.

See the documentation of the function `?wasserstein.test` for more details.
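The permutation idea behind the semi-parametric procedure can be sketched in base R as follows. This is a simplified illustration under stated assumptions: the function `perm_test_sketch`, its parameters, and the quantile-based statistic are hypothetical, and the generalized Pareto approximation for small p-values is deliberately left out, so this is not a substitute for `wasserstein.test`.

```r
# Hypothetical helper (not part of waddR): plain permutation test of
# H0: F_A = F_B using a quantile-based squared 2-Wasserstein statistic.
perm_test_sketch <- function(x, y, n_perm = 1000, n_levels = 200) {
  probs <- (seq_len(n_levels) - 0.5) / n_levels

  # Test statistic: mean squared difference of empirical quantiles.
  stat <- function(a, b) {
    qa <- quantile(a, probs = probs, names = FALSE, type = 1)
    qb <- quantile(b, probs = probs, names = FALSE, type = 1)
    mean((qa - qb)^2)
  }

  observed <- stat(x, y)
  pooled <- c(x, y)
  n_x <- length(x)

  # Under H0 the group labels are exchangeable, so reshuffle them.
  perm_stats <- replicate(n_perm, {
    idx <- sample.int(length(pooled), n_x)
    stat(pooled[idx], pooled[-idx])
  })

  # Add-one correction (a choice of this sketch) keeps the p-value > 0.
  p_value <- (sum(perm_stats >= observed) + 1) / (n_perm + 1)
  list(statistic = observed, p_value = p_value)
}

# Example: two samples with clearly different locations.
set.seed(123)
res <- perm_test_sketch(rnorm(100), rnorm(100, mean = 2), n_perm = 500)
print(res)
```

With a plain permutation test the smallest attainable p-value is 1 / (n_perm + 1), which is the limitation that a tail approximation such as the generalized Pareto step is meant to address.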

The waddR package provides an adaptation of the semi-parametric testing procedure based on the 2-Wasserstein distance that is specifically tailored to identify differential distributions in single-cell RNA-sequencing (scRNA-seq) data. In particular, a two-stage (TS) approach has been implemented that takes account of the specific nature of scRNA-seq data by separately testing for differential proportions of zero gene expression (using a logistic regression model) and differences in non-zero gene expression (using the semi-parametric 2-Wasserstein distance-based test) between two conditions.

See the documentation of the single-cell testing function `?wasserstein.sc` and the test for zero expression levels `?testZeroes` for more details.
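The two-stage idea can be illustrated for a single gene with a short base-R sketch. Everything here is an assumption made for illustration: the function `two_stage_sketch`, the likelihood-ratio test on a logistic regression with condition as the only covariate, and the plain permutation test on non-zero values are stand-ins, and the actual models and outputs of `wasserstein.sc` and `testZeroes` may differ.

```r
# Hypothetical helper (not part of waddR): two-stage test for one gene.
# expr_a, expr_b: expression values of the gene in conditions A and B.
two_stage_sketch <- function(expr_a, expr_b, n_perm = 1000, n_levels = 200) {
  # Stage 1: differential proportions of zero expression, using a
  # logistic regression of the zero indicator on the condition label
  # and a likelihood-ratio test against the intercept-only model.
  is_zero <- as.integer(c(expr_a, expr_b) == 0)
  condition <- factor(rep(c("A", "B"), c(length(expr_a), length(expr_b))))
  fit <- glm(is_zero ~ condition, family = binomial())
  lr_stat <- fit$null.deviance - fit$deviance
  p_zero <- pchisq(lr_stat, df = fit$df.null - fit$df.residual,
                   lower.tail = FALSE)

  # Stage 2: differences in non-zero expression, using a permutation
  # test on a quantile-based squared 2-Wasserstein statistic.
  nz_a <- expr_a[expr_a != 0]
  nz_b <- expr_b[expr_b != 0]
  probs <- (seq_len(n_levels) - 0.5) / n_levels
  stat <- function(a, b) {
    qa <- quantile(a, probs = probs, names = FALSE, type = 1)
    qb <- quantile(b, probs = probs, names = FALSE, type = 1)
    mean((qa - qb)^2)
  }
  observed <- stat(nz_a, nz_b)
  pooled <- c(nz_a, nz_b)
  perm_stats <- replicate(n_perm, {
    idx <- sample.int(length(pooled), length(nz_a))
    stat(pooled[idx], pooled[-idx])
  })
  p_nonzero <- (sum(perm_stats >= observed) + 1) / (n_perm + 1)

  list(p_zero = p_zero, p_nonzero = p_nonzero)
}

# Example with simulated data: condition A has more zeros and lower
# non-zero expression than condition B.
set.seed(42)
gene_a <- c(rep(0, 60), rexp(40, rate = 1))
gene_b <- c(rep(0, 20), rexp(80, rate = 0.2))
print(two_stage_sketch(gene_a, gene_b, n_perm = 500))
```

The sketch assumes each condition contains both zero and non-zero values; a gene with no non-zero observations in one condition would need separate handling.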

**Maintainer**: Julian Flesch julianflesch@gmail.com

Authors:

Roman Schefzik r.schefzik@dkfz-heidelberg.de

Useful links:

Report bugs at https://github.com/goncalves-lab/waddR/issues
