BallDivergence: Ball Divergence Based Two- or k-sample Test
In DataSimilarity: Quantifying Similarity of Datasets and Multivariate Two- And k-Sample Testing

BallDivergence

R Documentation


Description

The function implements the multivariate two- or k-sample test of Pan et al. (2018), which is based on the Ball Divergence. It relies on the bd.test implementation from the Ball package.

Usage

BallDivergence(X1, X2, ..., n.perm = 0, seed = NULL, num.threads = 0, 
                kbd.type = "sum", weight = c("constant", "variance"), 
                args.bd.test = NULL)

Arguments

`X1`	First dataset as matrix or data.frame
`X2`	Second dataset as matrix or data.frame
`...`	Optionally more datasets as matrices or data.frames
`n.perm`	Number of permutations for the permutation test (default: 0, i.e. no permutation test is performed). Note that for more than two samples, no test is performed.
`seed`	Random seed (default: NULL). A random seed will only be set if one is provided.
`num.threads`	Number of threads (default: 0, all available cores are used)
`kbd.type`	Character specifying which k-sample test statistic will be used. Must be one of `"sum"` (default), `"maxsum"`, or `"max"`.
`weight`	Character specifying the weight form of the Ball Divergence test statistic. Must be one of `"constant"` (default) or `"variance"`.
`args.bd.test`	Further arguments passed to `bd.test` as a named list.

Details

For n.perm = 0, the asymptotic test is performed. For n.perm > 0, a permutation test is performed.
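
The following base-R sketch shows how a generic two-sample permutation test (the n.perm > 0 case) is put together: the observed statistic is compared with its values after randomly relabelling the pooled data. The statistic used here (squared distance between the sample means) is a hypothetical stand-in, not the Ball Divergence, and the helper names `mean_diff_stat` and `perm_test` are illustrative. The p-value convention (1 + number of exceedances) / (n.perm + 1) is one common choice and is not taken from bd.test.

```r
# Hypothetical stand-in statistic: squared Euclidean distance between
# the two sample mean vectors (NOT the Ball Divergence).
mean_diff_stat <- function(X, Y) {
  sum((colMeans(X) - colMeans(Y))^2)
}

# Generic two-sample permutation test (illustrative sketch only).
perm_test <- function(X, Y, stat, n.perm = 99) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  Z <- rbind(X, Y)                        # pooled sample
  n <- nrow(X); N <- nrow(Z)
  observed <- stat(X, Y)
  permuted <- vapply(seq_len(n.perm), function(b) {
    idx <- sample.int(N, n)               # random relabelling of the pooled rows
    stat(Z[idx, , drop = FALSE], Z[-idx, , drop = FALSE])
  }, numeric(1))
  # Large values indicate a difference, so count permuted values >= observed
  p.value <- (1 + sum(permuted >= observed)) / (n.perm + 1)
  list(statistic = observed, p.value = p.value)
}

set.seed(1234)
X <- matrix(rnorm(250), ncol = 5)
Y <- matrix(rnorm(250, mean = 1), ncol = 5)   # shifted second sample
res_perm <- perm_test(X, Y, mean_diff_stat, n.perm = 99)
res_perm$p.value
```

With a clear mean shift, essentially no relabelled split reaches the observed value, so the p-value sits at its minimum of 1 / (n.perm + 1).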

The Ball Divergence is defined as the squared difference between the two probability measures, integrated over a collection of closed balls. The empirical test statistic is based on differences between averages of metric ranks. The test is robust to outliers and heavy-tailed data and is suitable for imbalanced sample sizes.
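
For reference, the population Ball Divergence of two probability measures mu and nu on a metric space (V, rho), and the empirical statistic built from samples X_1, ..., X_n ~ mu and Y_1, ..., Y_m ~ nu, can be written as below. This is a transcription of the definitions in Pan et al. (2018); B-bar(u, r) denotes the closed ball with center u and radius r, and I(.) is the indicator function. We assume that this unweighted form corresponds to weight = "constant"; the "variance" weighting is not shown.

```latex
\begin{align*}
\mathrm{BD}(\mu,\nu) &= \iint_{V \times V}
  \Big(\mu\big(\bar{B}(u,\rho(u,v))\big) - \nu\big(\bar{B}(u,\rho(u,v))\big)\Big)^2
  \big(\mu(du)\,\mu(dv) + \nu(du)\,\nu(dv)\big), \\
\delta(x,y,z) &= I\big(z \in \bar{B}(x,\rho(x,y))\big), \\
A^X_{ij} &= \frac{1}{n}\sum_{u=1}^{n}\delta(X_i,X_j,X_u), \qquad
A^Y_{ij} = \frac{1}{m}\sum_{v=1}^{m}\delta(X_i,X_j,Y_v), \\
C^X_{st} &= \frac{1}{n}\sum_{u=1}^{n}\delta(Y_s,Y_t,X_u), \qquad
C^Y_{st} = \frac{1}{m}\sum_{v=1}^{m}\delta(Y_s,Y_t,Y_v), \\
D_{n,m} &= \frac{1}{n^2}\sum_{i,j=1}^{n}\big(A^X_{ij} - A^Y_{ij}\big)^2
        + \frac{1}{m^2}\sum_{s,t=1}^{m}\big(C^X_{st} - C^Y_{st}\big)^2 .
\end{align*}
```

Each A and C term is an average of indicators, i.e. a normalized metric rank, which is the sense in which the statistic compares averages of metric ranks.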

The Ball Divergence of two distributions is zero if and only if the distributions coincide. Therefore, low values of the test statistic indicate similarity, and the test rejects for large values of the test statistic.
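
To make "low values indicate similarity" concrete, the following naive base-R sketch computes the two-sample statistic of Pan et al. (2018): for every ordered pair of points from the same sample, it compares the fractions of each sample lying in the closed ball centered at the first point with radius equal to the pair's distance, and averages the squared differences. The function name `ball_divergence_naive` is hypothetical. It assumes Euclidean distances, runs in roughly O((n + m)^3) time, and is not the Ball package's implementation, so its values are not guaranteed to match bd.test exactly.

```r
# Naive two-sample Ball Divergence statistic (illustrative sketch,
# Euclidean distances, closed balls, unweighted form).
ball_divergence_naive <- function(X, Y) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  n <- nrow(X); m <- nrow(Y)
  D <- as.matrix(dist(rbind(X, Y)))   # distances on the pooled data
  ix <- seq_len(n)                    # rows of the first sample
  iy <- n + seq_len(m)                # rows of the second sample
  # Mean squared difference of ball proportions, with balls centered
  # at points of one sample (first sample: A term, second sample: C term)
  term <- function(centers) {
    total <- 0
    for (i in centers) {
      for (j in centers) {
        r <- D[i, j]                  # radius rho(Z_i, Z_j)
        px <- mean(D[i, ix] <= r)     # share of X inside the closed ball
        py <- mean(D[i, iy] <= r)     # share of Y inside the closed ball
        total <- total + (px - py)^2
      }
    }
    total / length(centers)^2
  }
  term(ix) + term(iy)
}

set.seed(1234)
X  <- matrix(rnorm(100), ncol = 2)             # 50 points
Y1 <- matrix(rnorm(100), ncol = 2)             # same distribution as X
Y2 <- matrix(rnorm(100, mean = 2), ncol = 2)   # shifted distribution
bd_same  <- ball_divergence_naive(X, Y1)
bd_shift <- ball_divergence_naive(X, Y2)
c(same = bd_same, shifted = bd_shift)
```

Comparing a sample with itself gives exactly zero, and the shifted pair yields a clearly larger value than the pair drawn from the same distribution.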

For the k-sample problem, the pairwise Ball Divergences can be summarized in different ways. First, one can simply sum all pairwise Ball Divergences (kbd.type = "sum"). Second, one can identify the sample that differs most from the others, i.e. take the maximum, over all samples, of the sum of the Ball Divergences between that sample and all other samples (kbd.type = "maxsum"). Finally, one can sum the k-1 largest pairwise Ball Divergences (kbd.type = "max").
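
A minimal base-R sketch of the three summaries, applied to a hypothetical symmetric matrix of pairwise Ball Divergences with zero diagonal. The matrix `B` and the helper `summarise_kbd` are illustrative and not part of DataSimilarity or Ball; the sketch follows the description of kbd.type given here, and the Ball package may compute or scale these summaries differently internally.

```r
# Summaries of a symmetric k x k matrix of pairwise Ball Divergences
# (zero diagonal), as described for kbd.type.
summarise_kbd <- function(B) {
  stopifnot(is.matrix(B), nrow(B) == ncol(B), isSymmetric(B))
  k <- nrow(B)
  pairwise <- B[upper.tri(B)]                       # each unordered pair once
  c(
    sum    = sum(pairwise),                         # all pairs
    maxsum = max(rowSums(B) - diag(B)),             # sample differing most from the others
    max    = sum(sort(pairwise, decreasing = TRUE)[seq_len(k - 1)])  # k-1 largest pairs
  )
}

# Hypothetical pairwise values for k = 3 samples
B <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3, byrow = TRUE)
summarise_kbd(B)
# sum = 6, maxsum = 5, max = 5
```

Here the third sample has the largest total divergence (2 + 3 = 5), and the two largest pairwise values (3 and 2) also sum to 5.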

This function is a wrapper around bd.test that adapts the input and output of that function to match the other functions provided in this package. For more details, see bd.test and bd.

Value

An object of class htest with the following components:

`statistic`	Observed value of the test statistic
`p.value`	Permutation p-value (only if `n.perm` > 0 and for two datasets)
`n.perm`	Number of permutations for permutation test
`size`	Number of observations for each dataset
`method`	Description of the test
`data.name`	The dataset names
`alternative`	The alternative hypothesis
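
A hedged usage sketch for the k-sample case: three datasets are passed through `...` and the "maxsum" summary is requested. Following the argument description, no test is performed for more than two samples, so only the statistic and the sample sizes are inspected here. The call is guarded in the same way as the Examples section because it needs both DataSimilarity and Ball to be installed; the data are simulated for illustration.

```r
set.seed(1234)
X1 <- matrix(rnorm(300), ncol = 3)
X2 <- matrix(rnorm(300), ncol = 3)
X3 <- matrix(rnorm(300, mean = 1), ncol = 3)  # shifted third sample
res <- NULL
if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("Ball", quietly = TRUE)) {
  res <- DataSimilarity::BallDivergence(X1, X2, X3, kbd.type = "maxsum")
  print(res$statistic)  # observed k-sample statistic
  print(res$size)       # number of observations for each dataset
}
```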

Applicability

Target variable?   Numeric?   Categorical?   K-sample?
No                 Yes        No             Yes

References

Pan, W., Tian, T. Y., Wang, X., Zhang, H. (2018). Ball Divergence: Nonparametric two sample test. Annals of Statistics 46(3), 1109-1137. doi:10.1214/17-AOS1579.

Zhu, J., Pan, W., Zheng, W., Wang, X. (2021). Ball: An R Package for Detecting Distribution Difference and Association in Metric Spaces. Journal of Statistical Software 97(6). doi:10.18637/jss.v097.i06.

Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statistics Surveys 18, 163-298. doi:10.1214/24-SS149.

Examples

set.seed(1234)
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Calculate Ball Divergence and perform test 
if(requireNamespace("Ball", quietly = TRUE)) {
  BallDivergence(X1, X2, n.perm = 100)
}

DataSimilarity documentation built on June 16, 2025, 5:08 p.m.