MMD | R Documentation |
Performs a two-sample test based on the maximum mean discrepancy (MMD), using either the Rademacher bound, the asymptotic bound, or a permutation testing procedure. The implementation adds a permutation test to the kmmd implementation from the kernlab package.
MMD(X1, X2, n.perm = 0, alpha = 0.05, asymptotic = FALSE, replace = TRUE,
n.times = 150, frac = 1, seed = 42, ...)
Argument | Description |
X1 | First dataset as matrix or data.frame |
X2 | Second dataset as matrix or data.frame |
n.perm | Number of permutations for the permutation test (default: 0, i.e. no permutation test is performed and the test relies on the Rademacher and, if requested, asymptotic bounds) |
alpha | Significance level of the test (default: 0.05), used to calculate the asymptotic and Rademacher bounds |
asymptotic | Should the asymptotic bound be calculated? (default: FALSE) |
replace | Should sampling with replacement be used in the computation of the asymptotic bound? (default: TRUE) |
n.times | Number of repetitions for the sampling procedure (default: 150) |
frac | Fraction of points to sample (default: 1) |
seed | Random seed (default: 42) |
... | Further arguments passed to |
For a given kernel function k and samples X_{11}, ..., X_{1n_1} and X_{21}, ..., X_{2n_2} of sizes n_1 and n_2, an unbiased estimator of MMD^2 is defined as
\begin{aligned}
\widehat{\text{MMD}}^2(\mathcal{H}, X_1, X_2)_{U} ={} & \frac{1}{n_1(n_1-1)}\sum_{i=1}^{n_1}\sum_{\substack{j=1 \\ j\neq i}}^{n_1} k\left(X_{1i}, X_{1j}\right) \\
& + \frac{1}{n_2(n_2-1)}\sum_{i=1}^{n_2}\sum_{\substack{j=1 \\ j\neq i}}^{n_2} k\left(X_{2i}, X_{2j}\right) \\
& - \frac{2}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} k\left(X_{1i}, X_{2j}\right).
\end{aligned}
The square root of this estimate is returned as the test statistic.
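As a concrete illustration of the estimator, here is a minimal base-R sketch, assuming a Gaussian kernel k(x, y) = exp(-sigma ||x - y||^2) (the parameterisation used by kernlab's rbfdot) with a fixed, arbitrarily chosen sigma = 1. The names rbf_gram, mmd2_u and mmd_stat are hypothetical helpers, not part of this package or of kernlab, and kmmd chooses its kernel and computes its statistics internally, so its numbers need not match. Within-sample sums skip the diagonal (j != i), while the cross term averages over all n_1 n_2 pairs.

```r
# Hypothetical helper: Gaussian (RBF) Gram matrix, k(x, y) = exp(-sigma * ||x - y||^2).
# sigma = 1 is an arbitrary assumption, not the bandwidth kmmd would pick.
rbf_gram <- function(A, B, sigma = 1) {
  # Squared Euclidean distances between all rows of A and all rows of B
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  exp(-sigma * pmax(d2, 0))  # pmax guards against tiny negative rounding errors
}

# Hypothetical helper: unbiased estimate of MMD^2
mmd2_u <- function(X1, X2, sigma = 1) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  K11 <- rbf_gram(X1, X1, sigma)
  K22 <- rbf_gram(X2, X2, sigma)
  K12 <- rbf_gram(X1, X2, sigma)
  # Within-sample terms exclude the diagonal (j != i)
  term1 <- (sum(K11) - sum(diag(K11))) / (n1 * (n1 - 1))
  term2 <- (sum(K22) - sum(diag(K22))) / (n2 * (n2 - 1))
  # Cross term averages over all n1 * n2 pairs
  term3 <- 2 * sum(K12) / (n1 * n2)
  term1 + term2 - term3
}

# Hypothetical helper: square root of the estimate, clamped at zero
mmd_stat <- function(X1, X2, sigma = 1) sqrt(max(0, mmd2_u(X1, X2, sigma)))

# Usage
set.seed(1)
X1 <- matrix(rnorm(200), ncol = 2)
X2 <- matrix(rnorm(200, mean = 0.5), ncol = 2)
print(mmd_stat(X1, X2))
```

Because the unbiased estimate can be slightly negative when the samples are very similar, the sketch clamps it at zero before taking the square root.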
For a characteristic kernel, such as the Gaussian kernel, the theoretical MMD of two distributions is zero if and only if the two distributions coincide. Therefore, low values indicate similarity of the datasets, and the test rejects for large values.
The original proposal of the test is based on critical values calculated either asymptotically or from Rademacher bounds. Here, the option of calculating a permutation p value is added. The Rademacher bound is always returned; the asymptotic bound is additionally returned if asymptotic = TRUE.
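The permutation p value can be sketched as follows, assuming the same Gaussian kernel with a fixed sigma = 1 as a stand-in for the kernel actually used by kmmd. Under H0 the two samples are exchangeable, so randomly reassigning the pooled observations to groups of sizes n_1 and n_2 yields draws from the null distribution of the statistic. The names mmd2_u and mmd_perm_test are hypothetical, and the (1 + count) / (1 + n.perm) formula is one common convention; the package may compute its p value differently. A compact version of the unbiased estimator is included so that the block runs on its own.

```r
# Hypothetical helper: unbiased MMD^2 with Gaussian kernel exp(-sigma * ||x - y||^2)
mmd2_u <- function(X1, X2, sigma = 1) {
  gram <- function(A, B) {
    d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
    exp(-sigma * pmax(d2, 0))
  }
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  K11 <- gram(X1, X1)
  K22 <- gram(X2, X2)
  K12 <- gram(X1, X2)
  (sum(K11) - sum(diag(K11))) / (n1 * (n1 - 1)) +
    (sum(K22) - sum(diag(K22))) / (n2 * (n2 - 1)) -
    2 * sum(K12) / (n1 * n2)
}

# Hypothetical permutation test: reshuffle group labels and recompute MMD^2
mmd_perm_test <- function(X1, X2, n.perm = 100, sigma = 1) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  n1 <- nrow(X1)
  pooled <- rbind(X1, X2)
  N <- nrow(pooled)
  stat_obs <- mmd2_u(X1, X2, sigma)
  stat_perm <- replicate(n.perm, {
    idx <- sample.int(N, n1)  # random group of size n1; the rest form group 2
    mmd2_u(pooled[idx, , drop = FALSE], pooled[-idx, , drop = FALSE], sigma)
  })
  # Large values speak against H0; the "+ 1" counts the observed labelling itself
  p <- (1 + sum(stat_perm >= stat_obs)) / (1 + n.perm)
  list(statistic = sqrt(max(0, stat_obs)), p.value = p)
}

# Usage
set.seed(1)
X1 <- matrix(rnorm(100), ncol = 2)
X2 <- matrix(rnorm(100, mean = 0.5), ncol = 2)
print(mmd_perm_test(X1, X2, n.perm = 99))
```

Recomputing the three Gram matrices for every permutation keeps the sketch short; a faster variant computes the pooled kernel matrix once and only permutes its row and column indices.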
This implementation is a wrapper around the function kmmd that adapts its input and output to match the other functions provided in this package. Moreover, a permutation test is added. For more details, see the documentation of kmmd.
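Because MMD builds on kmmd, it can be useful to inspect kmmd's output directly. The sketch below assumes that kernlab is installed; the data mirror the example at the end of this page, and the argument values are illustrative. mmdstats, Radbound, H0 and Asymbound are the accessor functions kernlab documents for kmmd objects; how MMD maps them onto its own htest components is not shown here.

```r
if (requireNamespace("kernlab", quietly = TRUE)) {
  set.seed(42)
  X1 <- matrix(rnorm(1000), ncol = 10)
  X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
  # Gaussian kernel with its bandwidth chosen automatically by kernlab
  fit <- kernlab::kmmd(X1, X2, kernel = "rbfdot", kpar = "automatic",
                       alpha = 0.05, asymptotic = TRUE)
  print(kernlab::mmdstats(fit))   # MMD estimates computed by kmmd
  print(kernlab::Radbound(fit))   # Rademacher bound
  print(kernlab::H0(fit))         # is H0 rejected? (logical)
  print(kernlab::Asymbound(fit))  # asymptotic bound (computed since asymptotic = TRUE)
}
```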
An object of class htest with the following components:
Component | Description |
statistic | Observed value of the test statistic |
p.value | Permutation p value |
method | Description of the test |
data.name | The dataset names |
alternative | The alternative hypothesis |
H0 | Is |
asymp.H0 | Is |
kernel.fun | Kernel function used |
Rademacher.bound | The Rademacher bound |
asymp.bound | The asymptotic bound |
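For orientation, the Rademacher-based bound has a simple closed form when both samples have the same size. In the journal version of the kernel two-sample test (Gretton et al., 2012, Journal of Machine Learning Research), for m observations per sample and a kernel bounded by 0 <= k(x, y) <= K (K = 1 for the Gaussian kernel), a level-alpha test accepts H0 when the biased MMD estimate is below sqrt(2K/m)(1 + sqrt(2 log(1/alpha))). The function rademacher_threshold is a hypothetical helper evaluating this textbook expression; kmmd's internal computation, particularly for unequal sample sizes, may differ.

```r
# Hypothetical helper: Rademacher-based acceptance threshold for equal sample
# sizes m, significance level alpha and kernel bound K (0 <= k(x, y) <= K)
rademacher_threshold <- function(m, alpha = 0.05, K = 1) {
  sqrt(2 * K / m) * (1 + sqrt(2 * log(1 / alpha)))
}

# Usage: threshold for 100 observations per sample at alpha = 0.05
print(rademacher_threshold(m = 100, alpha = 0.05))
```

The threshold shrinks like 1 / sqrt(m), so quadrupling the sample size halves it; being distribution-free, it tends to be conservative compared with the asymptotic bound or a permutation test.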
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | When suitable kernel function is passed | No |
Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B. and Smola, A. (2006). A Kernel Method for the Two-Sample-Problem. Neural Information Processing Systems 2006, Vancouver. https://papers.neurips.cc/paper/3110-a-kernel-method-for-the-two-sample-problem.pdf
Muandet, K., Fukumizu, K., Sriperumbudur, B. and Schölkopf, B. (2017). Kernel Mean Embedding of Distributions: A Review and Beyond. Foundations and Trends® in Machine Learning, 10(1-2), 1-141. doi:10.1561/2200000060
Stolte, M., Kappenberg, F., Rahnenführer, J. and Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149
# Draw some data (seed set for reproducibility)
set.seed(42)
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform MMD test with 100 permutations (requires the kernlab package)
if (requireNamespace("kernlab", quietly = TRUE)) {
  MMD(X1, X2, n.perm = 100)
}