BallDivergence | R Documentation |
k-sample test
The function implements the multivariate two- or k-sample test of Pan et al. (2018), which is based on the Ball Divergence. It uses the bd.test implementation from the Ball package.
BallDivergence(X1, X2, ..., n.perm = 0, seed = 42, num.threads = 0,
kbd.type = "sum", weight = c("constant", "variance"),
args.bd.test = NULL)
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
... |
Optionally more datasets as matrices or data.frames |
n.perm |
Number of permutations for permutation test (default: 0, no permutation test performed). Note that for more than two samples, no test is performed. |
seed |
Random seed (default: 42) |
num.threads |
Number of threads (default: 0, all available cores are used) |
kbd.type |
Character specifying which k-sample test statistic will be used. Must be one of "sum" (default), "maxsum", or "max". |
weight |
Character specifying the weight form of the Ball Divergence test statistic. Must be one of "constant" (default) or "variance". |
args.bd.test |
Further arguments passed to bd.test |
For n.perm = 0, the asymptotic test is performed. For n.perm > 0, a permutation test is performed.
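The permutation principle used for n.perm > 0 can be sketched in base R as follows. This is only an illustration under stated assumptions: perm_pvalue and mean_dist are hypothetical helper names, the statistic is a simple stand-in (not the Ball Divergence), and the "+1" p-value convention is a common choice that may differ from what bd.test does internally.

```r
# Generic two-sample permutation test (illustrative sketch).
# `stat_fun` is a stand-in statistic, NOT the Ball Divergence.
perm_pvalue <- function(X1, X2, stat_fun, n.perm = 100, seed = 42) {
  set.seed(seed)
  pooled <- rbind(X1, X2)
  n1 <- nrow(X1)
  n <- nrow(pooled)
  observed <- stat_fun(X1, X2)
  perm_stats <- replicate(n.perm, {
    # Randomly reassign the pooled observations to the two groups
    idx <- sample.int(n)
    stat_fun(pooled[idx[seq_len(n1)], , drop = FALSE],
             pooled[idx[-seq_len(n1)], , drop = FALSE])
  })
  # Large values speak against the null hypothesis; the observed
  # statistic is counted as one of the permutations (assumed convention)
  (sum(perm_stats >= observed) + 1) / (n.perm + 1)
}

# Stand-in statistic: squared Euclidean distance between column means
mean_dist <- function(A, B) sum((colMeans(A) - colMeans(B))^2)

set.seed(1)
X1 <- matrix(rnorm(200), ncol = 2)
X2 <- matrix(rnorm(200, mean = 1), ncol = 2)
p <- perm_pvalue(X1, X2, mean_dist, n.perm = 199)
```

Because the test rejects for large values of the statistic, the p value is the share of permuted statistics at least as large as the observed one.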
The Ball Divergence is defined as the square of the measure difference over a given closed ball collection. The empirical test performed here is based on the difference between averages of metric ranks. It is robust to outliers and heavy-tailed data and suitable for imbalanced sample sizes.
The Ball Divergence of two distributions is zero if and only if the distributions coincide. Therefore, low values of the test statistic indicate similarity and the test rejects for large values of the test statistic.
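For orientation, the population version of the Ball Divergence can be written as below. This is a sketch of the definition as given in Pan et al. (2018); the notation is an assumption introduced here and does not appear elsewhere on this page: \mu and \nu are the two probability measures on a space V with metric \rho, \bar{B}(u, r) is the closed ball with center u and radius r, and [\mu - \nu]^2(A) stands for (\mu(A) - \nu(A))^2.

```latex
D(\mu, \nu) = \iint_{V \times V}
  \big[\mu - \nu\big]^2\big(\bar{B}(u, \rho(u, v))\big)\,
  \big(\mu(du)\,\mu(dv) + \nu(du)\,\nu(dv)\big)
```

The integrand is non-negative, so D(\mu, \nu) \ge 0, which matches the statement that small values indicate similarity.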
For the k-sample problem the pairwise Ball divergences can be summarized in different ways. First, one can simply sum up all pairwise Ball divergences (kbd.type = "sum"). Next, one can find the sample with the largest difference to the others, i.e. take the maximum of the sums of all Ball divergences for each sample with all other samples (kbd.type = "maxsum"). Last, one can sum up the largest k-1 pairwise Ball divergences (kbd.type = "max").
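The three summaries can be illustrated on a small matrix of pairwise divergences. The sketch below uses made-up values for k = 4 samples and hypothetical variable names; it only mirrors the verbal description above and is not the code used inside the Ball package.

```r
# Hypothetical pairwise Ball divergences for k = 4 samples (made-up values)
k <- 4
D <- matrix(0, k, k)
# Filled column-wise: D12, D13, D23, D14, D24, D34
D[upper.tri(D)] <- c(0.10, 0.40, 0.05, 0.30, 0.20, 0.15)
D <- D + t(D)  # symmetric matrix with zero diagonal

pairwise <- D[upper.tri(D)]

# "sum": sum of all pairwise divergences
kbd_sum <- sum(pairwise)

# "maxsum": for each sample, sum its divergences to all other samples,
# then take the maximum over samples
kbd_maxsum <- max(rowSums(D))

# "max": sum of the k - 1 largest pairwise divergences
kbd_max <- sum(sort(pairwise, decreasing = TRUE)[seq_len(k - 1)])
```

With these values the first sample has the largest summed divergence to the others, so it determines the "maxsum" summary.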
This implementation is a wrapper around the function bd.test that modifies the input and output of that function to match the other functions provided in this package. For more details see bd.test and bd.
An object of class htest with the following components:
statistic |
Observed value of the test statistic |
p.value |
Permutation p value (only if n.perm > 0) |
n.perm |
Number of permutations for permutation test |
size |
Number of observations for each dataset |
method |
Description of the test |
data.name |
The dataset names |
alternative |
The alternative hypothesis |
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | Yes |
Pan, W., Tian, Y., Wang, X., Zhang, H. (2018). Ball Divergence: Nonparametric two sample test. Annals of Statistics 46(3), 1109-1137. doi:10.1214/17-AOS1579
Zhu, J., Pan, W., Zheng, W., Wang, X. (2021). Ball: An R Package for Detecting Distribution Difference and Association in Metric Spaces. Journal of Statistical Software 97(6). doi:10.18637/jss.v097.i06
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statistics Surveys 18, 163-298. doi:10.1214/24-SS149
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Calculate Ball Divergence and perform test
if (requireNamespace("Ball", quietly = TRUE)) {
  BallDivergence(X1, X2, n.perm = 100)
}