Wasserstein | R Documentation |
Performs a permutation two-sample test based on the Wasserstein distance. The test is computed with the wasserstein_permut function from the Ecume package.
Wasserstein(X1, X2, n.perm = 0, fast = (nrow(X1) + nrow(X2)) > 1000,
S = max(1000, (nrow(X1) + nrow(X2))/2), seed = 42, ...)
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
n.perm |
Number of permutations for permutation test (default: 0, no test is performed). |
fast |
Should the |
S |
Number of samples to use for approximation if |
seed |
Random seed (default: 42) |
... |
Other parameters passed to |
A permutation test for the p-Wasserstein distance is performed. By default, the 1-Wasserstein distance is calculated using Euclidean distances. The p-Wasserstein distance between two probability measures \mu and \nu on a Euclidean space M is defined as
W_p(\mu, \nu) = \left(\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{M\times M} \|x - y\|^p \,\text{d}\gamma(x, y)\right)^{\frac{1}{p}},
where \Gamma(\mu,\nu) is the set of probability measures on M\times M such that \mu and \nu are the marginal distributions.
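To make the definition concrete, the following is a minimal base R sketch for the univariate special case with equal sample sizes, where the optimal coupling pairs the sorted observations. The helper name wasserstein_1d is hypothetical and is not part of Ecume or this package; the multivariate case handled by the package requires solving an optimal transport problem instead.

```r
# Hypothetical helper (not part of Ecume or this package):
# p-Wasserstein distance between two univariate samples of equal size.
wasserstein_1d <- function(x, y, p = 1) {
  stopifnot(length(x) == length(y), p >= 1)
  # With equal sample sizes, the optimal coupling pairs the order statistics.
  mean(abs(sort(x) - sort(y))^p)^(1 / p)
}

set.seed(42)
x <- rnorm(200)
y <- rnorm(200, mean = 0.5)

wasserstein_1d(x, x)        # identical samples: distance is 0
wasserstein_1d(x, x + 0.5)  # pure shift by 0.5: distance is 0.5 (up to floating point)
wasserstein_1d(x, y)        # distinct samples: positive distance
```

In this special case a pure location shift is recovered exactly, which illustrates why large values of the statistic point to dissimilar datasets.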
As the Wasserstein distance is a metric, it is zero if and only if the two distributions coincide. Therefore, low values of the statistic indicate similarity of the datasets, and the test rejects for high values.
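The permutation logic can be sketched in base R as follows. This is an illustration under simplifying assumptions (univariate samples of equal size, 1-Wasserstein distance), not the wasserstein_permut implementation. The helper names w1 and perm_test are hypothetical, and the "+ 1" correction in the p value is one common convention assumed here.

```r
# Hypothetical helper: 1-Wasserstein distance for univariate samples of equal size.
w1 <- function(x, y) mean(abs(sort(x) - sort(y)))

# Hypothetical permutation test; NOT the wasserstein_permut implementation.
perm_test <- function(x, y, n.perm = 200, seed = 42) {
  stopifnot(length(x) == length(y))
  set.seed(seed)
  observed <- w1(x, y)
  pooled <- c(x, y)
  n1 <- length(x)
  # Recompute the statistic after randomly reassigning the pooled
  # observations to two groups of the original sizes.
  perm_stats <- replicate(n.perm, {
    idx <- sample(length(pooled), n1)
    w1(pooled[idx], pooled[-idx])
  })
  # Reject for large values; the "+ 1" counts the observed split itself
  # (assumed convention).
  p.value <- (1 + sum(perm_stats >= observed)) / (1 + n.perm)
  list(statistic = observed, p.value = p.value)
}

set.seed(1)
x <- rnorm(100)
y <- rnorm(100, mean = 1)
res <- perm_test(x, y)
res$statistic
res$p.value
```

Under the null hypothesis the group labels are exchangeable, so the observed statistic should look like a typical draw from the permutation distribution; a small p value indicates that it is unusually large.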
This implementation is a wrapper around the function wasserstein_permut that adapts its input and output to match the other functions provided in this package. For more details, see the documentation of wasserstein_permut.
An object of class htest
with the following components:
statistic |
Observed value of the test statistic |
p.value |
Permutation p value (only computed if n.perm > 0) |
alternative |
The alternative hypothesis |
method |
Description of the test |
data.name |
The dataset names |
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | No |
Rachev, S. T. (1991). Probability metrics and the stability of stochastic models. John Wiley & Sons, Chichester.
Roux de Bezieux, H. (2021). Ecume: Equality of 2 (or k) Continuous Univariate and Multivariate Distributions. R package version 0.9.1, https://CRAN.R-project.org/package=Ecume
Schuhmacher, D., Bähre, B., Gottschlich, C., Hartmann, V., Heinemann, F., Schmitzer, B. and Schrieber, J. (2019). transport: Computation of Optimal Transport Plans and Wasserstein Distances. R package version 0.15-0. https://cran.r-project.org/package=transport
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Wasserstein distance based test
if (requireNamespace("Ecume", quietly = TRUE)) {
  Wasserstein(X1, X2, n.perm = 100)
}