BF: Baringhaus and Franz (2010) rigid motion invariant...

View source: R/BF.R

BF R Documentation

Baringhaus and Franz (2010) rigid motion invariant multivariate two-sample test

Description

The function performs the Baringhaus and Franz (2010) multivariate two-sample test. The computation is delegated to cramer.test from the cramer package.

Usage

BF(X1, X2, n.perm = 0, just.statistic = n.perm <= 0, kernel = "phiLog", 
    sim = "ordinary", maxM = 2^14, K = 160, seed = 42)

Arguments

X1

First dataset as matrix or data.frame

X2

Second dataset as matrix or data.frame

n.perm

Number of permutations for the permutation test or of replicates for the Bootstrap test (default: 0, no test is performed)

just.statistic

Should only the test statistic be calculated without performing any test (default: TRUE if number of permutations is set to 0 and FALSE if number of permutations is set to any positive number)

kernel

Name of the kernel function as character. Possible options are "phiLog" (default), "phiFracA", and "phiFracB". Alternatively, a user-defined function can be supplied. The function should accept a matrix as input and fulfill the following properties: the output should be non-negative, the value 0 should be mapped to 0, and the first derivative should be non-constant and completely monotone.

sim

Type of Bootstrap or eigenvalue method for testing. Possible options are "ordinary" (default) for ordinary Bootstrap, "permutation" for permutation testing, or "eigenvalue" for bootstrapping the limit distribution (especially suited to datasets too large for Bootstrapping). For more details see cramer.test.

maxM

Maximum number of points used for fast Fourier transform involved in eigenvalue method for approximating the null distribution (default: 2^14). Ignored if sim is either "ordinary" or "permutation". For more details see cramer.test.

K

Upper value up to which the integral for calculating the distribution function from the characteristic function is evaluated (default: 160). Note: when K is increased, it is necessary to also increase maxM. Ignored if sim is either "ordinary" or "permutation". For more details see cramer.test.

seed

Random seed (default: 42)
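As a minimal sketch of the requirements listed for the kernel argument, the following defines a hypothetical user-defined kernel, phiExp (an illustrative name, not a kernel shipped with this package), and checks the properties that are easy to verify numerically. Whether BF expects the function object itself or its name as a character string is not stated here, so the call to BF is deliberately left out.

```r
# Hypothetical user-defined kernel (the name phiExp is an assumption,
# not part of the package): phi(x) = 1 - exp(-x / 2).
# Its first derivative, exp(-x / 2) / 2, is non-constant and completely monotone.
phiExp <- function(x) {
  1 - exp(-x / 2)
}

# The kernel is applied to squared distances, so it should accept a matrix
# of non-negative values and return a matrix of the same shape.
x <- matrix(c(0, 0.5, 1, 2, 4, 8), nrow = 2)
y <- phiExp(x)

stopifnot(
  is.matrix(y), identical(dim(y), dim(x)),  # matrix in, matrix out
  all(y >= 0),                              # non-negative output
  phiExp(0) == 0                            # 0 is mapped to 0
)
```

Complete monotonicity of the derivative cannot be confirmed by a finite numerical check; it has to be established analytically for each candidate kernel.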

Details

The Baringhaus and Franz (2010) test statistic

T_{n_1, n_2} = \frac{n_1 n_2}{n_1+n_2}\left(\frac{2}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} \phi(||X_{1i} - X_{2j}||^2) - \frac{1}{n_1^2}\sum_{i,j=1}^{n_1} \phi(||X_{1i} - X_{1j}||^2) - \frac{1}{n_2^2}\sum_{i,j=1}^{n_2} \phi(||X_{2i} - X_{2j}||^2)\right)

is defined using a kernel function \phi. A kernel recommended primarily for location alternatives is

\phi_{\text{log}}(x) = \log(1 + x),

two kernels recommended primarily for dispersion alternatives are

\phi_{\text{FracA}}(x) = 1 - \frac{1}{1+x}

and

\phi_{\text{FracB}}(x) = 1 - \frac{1}{(1+x)^2}.

The theoretical statistic underlying this test statistic is zero if and only if the distributions coincide. Therefore, low values of the test statistic indicate similarity of the datasets while high values indicate differences between the datasets.
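The formula for T_{n_1, n_2} can be evaluated directly in base R. The sketch below is an illustration of the formula only and is not the code used by cramer.test; the helper names sq_dists and bf_statistic are hypothetical. Each double sum divided by its number of terms is a mean over a kernel matrix, which keeps the code short.

```r
# Squared Euclidean distances between all rows of A and all rows of B,
# using ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a'b.
sq_dists <- function(A, B) {
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  pmax(d2, 0)  # clip tiny negative values caused by rounding
}

# Direct evaluation of the test statistic for a kernel phi
# (default: the phiLog kernel, phi(x) = log(1 + x)).
bf_statistic <- function(X1, X2, phi = function(x) log(1 + x)) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  between <- mean(phi(sq_dists(X1, X2)))  # (1 / (n1 n2)) * sum over i, j
  within1 <- mean(phi(sq_dists(X1, X1)))  # (1 / n1^2)    * sum over i, j
  within2 <- mean(phi(sq_dists(X2, X2)))  # (1 / n2^2)    * sum over i, j
  n1 * n2 / (n1 + n2) * (2 * between - within1 - within2)
}

set.seed(1)
X1 <- matrix(rnorm(150), ncol = 3)
X2 <- matrix(rnorm(150, mean = 2), ncol = 3)
bf_statistic(X1, X1)  # identical samples: the three terms cancel
bf_statistic(X1, X2)  # shifted second sample: a clearly positive value
```

Because the statistic depends on the data only through pairwise distances, applying the same rotation and translation to both datasets leaves it unchanged, which is the rigid motion invariance referred to in the title.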

This implementation is a wrapper around the function cramer.test that modifies the input and output of that function to match the other functions provided in this package. For more details see cramer.test.
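To illustrate what the "permutation" option of sim does conceptually, here is a rough base R sketch. It is not the procedure implemented in cramer.test: the function name perm_pvalue, the seed handling, and the p-value convention (1 + number of exceedances) / (n.perm + 1) are all assumptions made for this example. The kernel matrix of the pooled sample is computed once, and each permutation only reassigns rows to the two groups.

```r
# Hypothetical helper: permutation p-value for the statistic with the
# phiLog kernel. Names and p-value convention are assumptions of this sketch.
perm_pvalue <- function(X1, X2, n.perm = 100, seed = 42) {
  set.seed(seed)
  Z <- rbind(as.matrix(X1), as.matrix(X2))  # pooled sample
  n1 <- nrow(X1)
  n <- nrow(Z)
  n2 <- n - n1
  # Kernel values phi(||z_i - z_j||^2) for all pairs, computed once
  Phi <- log(1 + as.matrix(dist(Z))^2)
  # Statistic when the rows in i1 form the first group
  stat <- function(i1) {
    i2 <- setdiff(seq_len(n), i1)
    n1 * n2 / (n1 + n2) *
      (2 * mean(Phi[i1, i2]) - mean(Phi[i1, i1]) - mean(Phi[i2, i2]))
  }
  observed <- stat(seq_len(n1))
  # Randomly reassign group labels and recompute the statistic
  permuted <- replicate(n.perm, stat(sample(n, n1)))
  list(statistic = observed,
       p.value = (1 + sum(permuted >= observed)) / (n.perm + 1))
}

A <- matrix(rnorm(200), ncol = 5)
B <- matrix(rnorm(200, mean = 1), ncol = 5)
perm_pvalue(A, B, n.perm = 99)
```

Large observed values relative to the permuted ones lead to a small p-value, in line with the interpretation that high values of the statistic indicate differences between the datasets.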

Value

An object of class htest with the following components:

method

Description of the test

d

Number of variables in each dataset

m

Sample size of first dataset

n

Sample size of second dataset

statistic

Observed value of the test statistic

p.value

Bootstrap/permutation p-value (only if n.perm > 0)

sim

Type of Bootstrap or eigenvalue method (only if n.perm > 0)

n.perm

Number of permutations for permutation or Bootstrap test

hypdist

Distribution function under the null hypothesis reconstructed via fast Fourier transform. $x contains the x-values, $Fx contains the corresponding distribution function values. (only if n.perm > 0)

ev

Eigenvalues and eigenfunctions when using the eigenvalue method (only if n.perm > 0)

data.name

The dataset names

alternative

The alternative hypothesis

Applicability

Target variable?   Numeric?   Categorical?   K-sample?
No                 Yes        No             No

References

Baringhaus, L. and Franz, C. (2010). Rigid motion invariant two-sample tests, Statistica Sinica 20, 1333-1361.

Franz, C. (2024). cramer: Multivariate Nonparametric Cramer-Test for the Two-Sample-Problem. R package version 0.9-4, https://CRAN.R-project.org/package=cramer.

Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149

See Also

Bahr, Cramer, Energy

Examples

# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Baringhaus and Franz test (requires the cramer package)
if (requireNamespace("cramer", quietly = TRUE)) {
  # Default kernel "phiLog", recommended for location alternatives
  BF(X1, X2, n.perm = 100)
  # Kernels recommended for dispersion alternatives
  BF(X1, X2, n.perm = 100, kernel = "phiFracA")
  BF(X1, X2, n.perm = 100, kernel = "phiFracB")
}

DataSimilarity documentation built on April 3, 2025, 9:39 p.m.