BF | R Documentation |
Performs the Baringhaus and Franz (2010) multivariate two-sample test. The implementation relies on cramer.test from the cramer package.
BF(X1, X2, n.perm = 0, just.statistic = n.perm <= 0, kernel = "phiLog",
sim = "ordinary", maxM = 2^14, K = 160, seed = 42)
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
n.perm |
Number of permutations for permutation or Bootstrap test, respectively (default: 0, no permutation test performed) |
just.statistic |
Should only the test statistic be calculated without performing any test (default: |
kernel |
Name of the kernel function as character. Possible options are |
sim |
Type of Bootstrap or eigenvalue method for testing. Possible options are |
maxM |
Maximum number of points used for fast Fourier transform involved in eigenvalue method for approximating the null distribution (default: 2^14). Ignored if |
K |
Upper value up to which the integral for calculating the distribution function from the characteristic function is evaluated (default: 160). Note: when |
seed |
Random seed (default: 42) |
The Baringhaus and Franz (2010) test statistic
T_{n_1, n_2} = \frac{n_1 n_2}{n_1+n_2}\left(\frac{2}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} \phi(||X_{1i} - X_{2j}||^2) - \frac{1}{n_1^2}\sum_{i,j=1}^{n_1} \phi(||X_{1i} - X_{1j}||^2) - \frac{1}{n_2^2}\sum_{i,j=1}^{n_2} \phi(||X_{2i} - X_{2j}||^2)\right)
is defined using a kernel function \phi. A choice recommended for location alternatives is
\phi_{\text{log}}(x) = \log(1 + x),
two choices recommended for dispersion alternatives are
\phi_{\text{FracA}}(x) = 1 - \frac{1}{1+x}
and
\phi_{\text{FracB}}(x) = 1 - \frac{1}{(1+x)^2}.
The theoretical quantity underlying this test statistic is zero if and only if the distributions coincide. Therefore, low values of the test statistic indicate similarity of the datasets, while high values indicate differences between them.
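As a rough illustration of the formula above, the following base R sketch evaluates T_{n_1, n_2} directly from the pairwise squared distances of the pooled sample. The helper name bf_statistic and its phi argument are hypothetical and are not part of this package or of cramer. The result is not guaranteed to agree numerically with the output of BF, for example if the underlying implementation scales the kernel differently.

```r
# Hypothetical helper (not part of any package): direct evaluation of T_{n1, n2}.
# phi defaults to the logarithmic kernel phi_log(x) = log(1 + x).
bf_statistic <- function(X1, X2, phi = function(x) log(1 + x)) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  # Squared Euclidean distances between all rows of the pooled sample
  D2 <- as.matrix(dist(rbind(X1, X2)))^2
  i1 <- seq_len(n1)
  i2 <- n1 + seq_len(n2)
  between <- sum(phi(D2[i1, i2]))  # pairs (X_1i, X_2j)
  within1 <- sum(phi(D2[i1, i1]))  # pairs within the first dataset
  within2 <- sum(phi(D2[i2, i2]))  # pairs within the second dataset
  n1 * n2 / (n1 + n2) *
    (2 * between / (n1 * n2) - within1 / n1^2 - within2 / n2^2)
}

# The two kernels suggested for dispersion alternatives
phi_frac_a <- function(x) 1 - 1 / (1 + x)
phi_frac_b <- function(x) 1 - 1 / (1 + x)^2

set.seed(1)
A <- matrix(rnorm(200), ncol = 4)
B <- matrix(rnorm(200, mean = 0.5), ncol = 4)
bf_statistic(A, B)                    # location shift, logarithmic kernel
bf_statistic(A, B, phi = phi_frac_a)  # same data, phi_FracA kernel
```

Passing the same dataset twice gives a value of zero, since the between-sample and within-sample sums then coincide.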
This implementation is a wrapper around the function cramer.test that adapts the input and output of that function to match the other functions provided in this package. For more details, see cramer.test.
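To make the role of n.perm more concrete, here is a minimal base R sketch of a label-permutation p value for the statistic defined above. The helper perm_pvalue is hypothetical. The p value convention used here (adding one to both numerator and denominator) is an assumption, and the resampling actually performed by cramer.test may differ in its details.

```r
# Hypothetical helper (not part of any package): permutation p value for T_{n1, n2}.
perm_pvalue <- function(X1, X2, n.perm = 200, phi = function(x) log(1 + x)) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  # Kernel values for all pairs of the pooled sample, computed once
  K <- phi(as.matrix(dist(rbind(X1, X2)))^2)
  # Statistic for a given assignment of pooled rows to the two groups
  stat <- function(idx) {
    i1 <- idx[seq_len(n1)]
    i2 <- idx[n1 + seq_len(n2)]
    n1 * n2 / (n1 + n2) *
      (2 * mean(K[i1, i2]) - mean(K[i1, i1]) - mean(K[i2, i2]))
  }
  observed <- stat(seq_len(n1 + n2))
  # Recompute the statistic under random relabelling of the pooled sample
  permuted <- replicate(n.perm, stat(sample(n1 + n2)))
  list(statistic = observed,
       p.value = (1 + sum(permuted >= observed)) / (n.perm + 1))
}

set.seed(42)
A <- matrix(rnorm(90), ncol = 3)
B <- matrix(rnorm(90, mean = 2), ncol = 3)
perm_pvalue(A, B, n.perm = 200)
```

Computing the kernel matrix once and permuting only the row indices avoids recomputing distances for every permutation.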
An object of class htest with the following components:
method |
Description of the test |
d |
Number of variables in each dataset |
m |
Sample size of first dataset |
n |
Sample size of second dataset |
statistic |
Observed value of the test statistic |
p.value |
Bootstrap/permutation p value (only if |
sim |
Type of Bootstrap or eigenvalue method (only if |
n.perm |
Number of permutations for permutation or Bootstrap test |
hypdist |
Distribution function under the null hypothesis reconstructed via fast Fourier transform. |
ev |
Eigenvalues and eigenfunctions when using the eigenvalue method (only if |
data.name |
The dataset names |
alternative |
The alternative hypothesis |
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | No |
Baringhaus, L. and Franz, C. (2010). Rigid motion invariant two-sample tests. Statistica Sinica 20, 1333-1361.
Franz, C. (2024). cramer: Multivariate Nonparametric Cramer-Test for the Two-Sample-Problem. R package version 0.9-4, https://CRAN.R-project.org/package=cramer.
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149
Bahr, Cramer, Energy
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Baringhaus and Franz test
if (requireNamespace("cramer", quietly = TRUE)) {
  BF(X1, X2, n.perm = 100)
  BF(X1, X2, n.perm = 100, kernel = "phiFracA")
  BF(X1, X2, n.perm = 100, kernel = "phiFracB")
}