The package offers statistical tests based on the 2-Wasserstein distance for detecting and characterizing differences between two distributions given in the form of samples. Functions for calculating the 2-Wasserstein distance and testing for differential distributions are provided, as well as a specifically tailored test for differential expression in single-cell RNA sequencing data.
The waddR package provides tools to address the following tasks:
Computation of the 2-Wasserstein distance
Two-sample tests to check for differences between two distributions
Detection of differential gene expression distributions in single-cell RNA sequencing data
The 2-Wasserstein distance is a metric that quantifies the difference between two distributions, representing e.g. two different conditions A and B. The waddR package specifically considers the squared 2-Wasserstein distance, which can be decomposed into location, size, and shape terms, thus providing a characterization of potential differences. It offers three functions to calculate the (squared) 2-Wasserstein distance, which are implemented in C++ and exported to R with Rcpp for faster computation.
One of these functions is a C++ reimplementation of the wasserstein1d function from the R package transport. The other two functions compute approximations of the squared 2-Wasserstein distance, with squared_wass_decomp also returning the decomposition terms for location, size, and shape.
See ?squared_wass_decomp as well as the accompanying paper Schefzik et al. (2020).
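As a rough illustration of the decomposition described above, the following sketch computes the squared 2-Wasserstein distance between two samples and splits it into location, size, and shape terms. It is written in Python only to stay self-contained and is not the package's C++ code: the name squared_wass_decomp_sketch is hypothetical, and the restriction to equally sized samples (so that the sorted values can serve directly as empirical quantiles) is a simplifying assumption.

```python
from statistics import fmean, pstdev


def squared_wass_decomp_sketch(x, y):
    """Squared 2-Wasserstein distance between two equal-size samples,
    split into location, size, and shape terms (illustrative only)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("this sketch assumes two samples of equal size >= 2")

    # With equal sample sizes, the sorted values are matching empirical quantiles.
    qx, qy = sorted(x), sorted(y)

    mx, my = fmean(qx), fmean(qy)      # sample means
    sx, sy = pstdev(qx), pstdev(qy)    # population (1/n) standard deviations

    # Pearson correlation of the paired quantiles (the Q-Q plot correlation).
    cov = fmean([(a - mx) * (b - my) for a, b in zip(qx, qy)])
    rho = cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

    location = (mx - my) ** 2           # difference in means
    size = (sx - sy) ** 2               # difference in spread
    shape = 2.0 * sx * sy * (1.0 - rho) # remaining difference in shape

    return {
        "distance": location + size + shape,
        "location": location,
        "size": size,
        "shape": shape,
    }


if __name__ == "__main__":
    # A pure shift by one unit: the distance is carried by the location term.
    print(squared_wass_decomp_sketch([1, 2, 3, 4], [2, 3, 4, 5]))
```

Under these assumptions the three terms add up to the mean squared difference between the sorted samples, so the decomposition shows how much of the distance is due to a shift in means, a change in spread, or a change in shape.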
The waddR package provides two testing procedures using the 2-Wasserstein distance to test whether two distributions F_A and F_B, given in the form of samples, are different by testing the null hypothesis H_0: F_A = F_B against the alternative hypothesis H_1: F_A ≠ F_B.
The first, semi-parametric (SP), procedure uses a permutation-based test combined with a generalized Pareto distribution approximation to estimate small p-values accurately.
The second procedure uses a test based on asymptotic theory (ASY) which is valid only if the samples can be assumed to come from continuous distributions.
See ?wasserstein.test for more details.
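To make the permutation idea behind the semi-parametric procedure more concrete, here is a minimal Python sketch of a plain permutation test that uses the squared 2-Wasserstein distance as its test statistic. It deliberately omits the generalized Pareto distribution approximation for small p-values, assumes equally sized samples, and uses hypothetical names (squared_wass, permutation_pvalue, n_perm, seed); it does not reproduce the interface of wasserstein.test.

```python
import random


def squared_wass(x, y):
    """Squared 2-Wasserstein distance for two equal-size samples."""
    qx, qy = sorted(x), sorted(y)
    return sum((a - b) ** 2 for a, b in zip(qx, qy)) / len(qx)


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Permutation p-value for H_0: F_A = F_B (illustrative, no GPD step)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("this sketch assumes two non-empty samples of equal size")

    rng = random.Random(seed)           # fixed seed for reproducibility
    observed = squared_wass(x, y)
    pooled = list(x) + list(y)
    n = len(x)

    hits = 0
    for _ in range(n_perm):
        # Under H_0 the condition labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        if squared_wass(pooled[:n], pooled[n:]) >= observed:
            hits += 1

    # The +1 correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    a = [float(i) for i in range(20)]
    b = [float(i) + 100.0 for i in range(20)]
    print(permutation_pvalue(a, b))
```

With this plain scheme the smallest attainable p-value is 1 / (n_perm + 1), which illustrates why an additional tail approximation is useful when very small p-values have to be estimated.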
The waddR package provides an adaptation of the
semi-parametric testing procedure based on the 2-Wasserstein distance
which is specifically tailored to identify differential distributions in scRNA-seq data. In particular, a two-stage
(TS) approach is implemented that takes account of the specific
nature of scRNA-seq data by separately testing for differential
proportions of zero gene expression (using a logistic regression model)
and differences in non-zero gene expression (using the semi-parametric
2-Wasserstein distance-based test) between two conditions.
See ?testZeroes for more details.
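The first stage of the two-stage idea can be sketched as follows. This Python example splits each condition into zero and non-zero expression values and tests the zero proportions with a likelihood-ratio (G) test on the 2x2 table of counts. The G-test is a simplified stand-in for the logistic regression model mentioned above, not the method used by testZeroes, and the function names zero_proportion_pvalue and split_two_stage are hypothetical. The non-zero values returned here would then be passed to a distribution test such as the semi-parametric 2-Wasserstein test.

```python
import math


def zero_proportion_pvalue(a, b):
    """Likelihood-ratio (G) test for equal proportions of zeros in a and b.
    Simplified stand-in for a logistic regression on the condition label."""
    table = [
        [sum(1 for v in a if v == 0), sum(1 for v in a if v != 0)],
        [sum(1 for v in b if v == 0), sum(1 for v in b if v != 0)],
    ]
    rows = [len(a), len(b)]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    total = rows[0] + rows[1]

    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:
                expected = rows[i] * cols[j] / total
                g += 2.0 * obs * math.log(obs / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding errors

    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(g / 2.0))


def split_two_stage(a, b):
    """Stage 1: p-value for zero proportions. Stage 2 input: non-zero values."""
    return {
        "p_zero": zero_proportion_pvalue(a, b),
        "nonzero_a": [v for v in a if v != 0],
        "nonzero_b": [v for v in b if v != 0],
    }


if __name__ == "__main__":
    cond_a = [0.0] * 18 + [1.5, 2.0]
    cond_b = [0.0] * 2 + [float(i) for i in range(1, 19)]
    print(split_two_stage(cond_a, cond_b)["p_zero"])
```

Keeping the two stages separate means that a gene can be flagged because the share of cells with zero expression differs, because the non-zero expression distributions differ, or both.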
Schefzik, R., Flesch, J., and Goncalves, A. (2020). waddR: Using the 2-Wasserstein distance to identify differences between distributions in two-sample testing, with application to single-cell RNA-sequencing data.
Maintainer: Julian Flesch firstname.lastname@example.org
Roman Schefzik email@example.com