BG | R Documentation |
The function implements the Biau and Gyorfi (2005) two-sample homogeneity test. This test uses the $L_1$-distance between two empirical distribution functions restricted to a finite partition.
BG(X1, X2, partition = rectPartition, exponent = 0.8, eps = 0.01, seed = 42, ...)
Argument | Description |
X1 | First dataset as matrix or data.frame |
X2 | Second dataset as matrix or data.frame of the same sample size as X1 |
partition | Function that creates a finite partition for the subspace spanned by the two datasets (default: rectPartition) |
exponent | Exponent used in the partition function, should be between 0 and 1 (default: 0.8) |
eps | Small threshold to guarantee edge points are included (default: 0.01) |
seed | Random seed (default: 42) |
... | Further arguments to be passed to the partition function |
The Biau and Gyorfi (2005) two-sample homogeneity test is defined for two datasets of the same sample size.
By default, a rectangular partition (rectPartition) is calculated under the assumption of approximately equal cell probabilities. Use the exponent argument to choose the number of elements $m_n$ of the partition according to the convergence criteria in Biau and Gyorfi (2005); by default, $m_n = n^{0.8}$. For each of the $p$ variables of the datasets, $m_n^{1/p} + 1$ cutpoints are created along the range of both datasets to define the partition, ensuring at least three cutpoints per variable (minimum, maximum, and one point splitting the data into two bins).
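The cutpoint rule above can be sketched in base R as follows. `cutpoints_sketch()` is a hypothetical helper, not the package's rectPartition. It assumes $m_n = n^{\text{exponent}}$, places cutpoints at equally spaced quantiles of the pooled data (one way to obtain approximately equal cell probabilities), rounds $m_n^{1/p} + 1$ to the nearest integer, and widens the outer cutpoints by `eps`. The actual rectPartition may place or round the cutpoints differently.

```r
# Hypothetical helper (NOT the package's rectPartition): per-variable cutpoints
# at equally spaced quantiles of the pooled data, assuming quantile-based cuts
# give approximately equal cell probabilities.
cutpoints_sketch <- function(X1, X2, exponent = 0.8, eps = 0.01) {
  X <- rbind(as.matrix(X1), as.matrix(X2))
  n <- nrow(X1)            # common sample size of the two datasets
  p <- ncol(X)
  m_n <- n^exponent        # target number of partition cells (assumed rule)
  # m_n^(1/p) + 1 cutpoints per variable, rounded (assumption), but at least 3
  k <- max(3, round(m_n^(1 / p) + 1))
  lapply(seq_len(p), function(j) {
    cuts <- quantile(X[, j], probs = seq(0, 1, length.out = k), names = FALSE)
    # Widen the outer cutpoints by eps so the edge points fall inside the range
    cuts[1] <- cuts[1] - eps
    cuts[k] <- cuts[k] + eps
    cuts
  })
}

set.seed(42)
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
cuts <- cutpoints_sketch(X1, X2)
lengths(cuts)   # number of cutpoints per variable
```

For 100 observations in 10 variables, $m_n = 100^{0.8} \approx 39.8$ and $m_n^{1/10} + 1 \approx 2.4$, so the minimum of three cutpoints applies to every variable.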
The test statistic is the $L_1$-distance between the vectors of the proportions of points falling into each cell of the partition for each dataset.
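Given per-variable cutpoints, the statistic can be computed by assigning each observation to a cell and comparing the cell proportions of the two datasets. The following is a minimal base-R sketch; `l1_statistic_sketch()` is a hypothetical helper, and `cuts` is assumed to be a list with one increasing vector of cutpoints per variable covering both datasets. Because each proportion vector sums to one, the statistic lies between 0 and 2.

```r
# Hypothetical helper: L_1 distance between the cell proportions of two samples
# for a rectangular partition given by per-variable cutpoints `cuts`.
l1_statistic_sketch <- function(X1, X2, cuts) {
  # Label each row by the combination of bins it falls into (one bin per variable)
  cell_id <- function(X) {
    X <- as.matrix(X)
    bins <- matrix(
      vapply(seq_along(cuts), function(j)
        findInterval(X[, j], cuts[[j]], rightmost.closed = TRUE),
        integer(nrow(X))),
      nrow = nrow(X))
    apply(bins, 1, paste, collapse = "-")
  }
  id1 <- cell_id(X1)
  id2 <- cell_id(X2)
  cells <- union(id1, id2)   # cells empty in both samples contribute 0 anyway
  prop1 <- table(factor(id1, levels = cells)) / length(id1)
  prop2 <- table(factor(id2, levels = cells)) / length(id2)
  sum(abs(prop1 - prop2))
}

# One variable, two bins (0, 0.5] and (0.5, 1]:
# sample 1 has proportions (0.5, 0.5), sample 2 has (0.25, 0.75)
x1 <- matrix(c(0.1, 0.2, 0.6, 0.7))
x2 <- matrix(c(0.1, 0.6, 0.7, 0.8))
l1_statistic_sketch(x1, x2, cuts = list(c(0, 0.5, 1)))   # 0.5
```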
An asymptotic test is performed using a standardized version of the $L_1$-distance that is approximately standard normally distributed (Corollary to Theorem 2 in Biau and Gyorfi (2005)).
Low values of the test statistic indicate similarity. Therefore, the test rejects for large values of the test statistic.
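The standardization and the one-sided p value can be sketched as follows. This is our reading of the asymptotics, not the package code: we assume the form $\sqrt{n}\,(T_n - C_n)/\sigma$ with $\sigma^2 = 2(1 - 2/\pi)$ and, for $m_n$ cells of equal probability, the centering $C_n = 2\sqrt{m_n/(\pi n)}$, where $T_n$ is the observed $L_1$-distance and $n$ the common sample size. The exact centering and scaling used by BG may differ; `bg_pvalue_sketch()` is a hypothetical helper.

```r
# Hypothetical helper: asymptotic p value for the L_1 statistic T_n, ASSUMING
# the standardization sqrt(n) * (T_n - C_n) / sigma with
#   C_n   = 2 * sqrt(m_n / (pi * n))   (approx. E[T_n] under H0, equal cell probabilities)
#   sigma = sqrt(2 * (1 - 2 / pi))
bg_pvalue_sketch <- function(T_n, n, m_n) {
  C_n <- 2 * sqrt(m_n / (pi * n))
  sigma <- sqrt(2 * (1 - 2 / pi))
  z <- sqrt(n) * (T_n - C_n) / sigma
  # Large values indicate dissimilarity, so the p value is the upper normal tail
  c(statistic = z, p.value = pnorm(z, lower.tail = FALSE))
}

bg_pvalue_sketch(T_n = 0.9, n = 100, m_n = 100^0.8)
```

Using the upper tail only mirrors the rule above: the test rejects for large values of the statistic.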
An object of class htest with the following components:
Component | Description |
statistic | Observed value of the (asymptotic) test statistic |
p.value | p value |
method | Description of the test |
data.name | The dataset names |
alternative | The alternative hypothesis |
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | No |
Biau, G. and Gyorfi, L. (2005). On the asymptotic properties of a nonparametric $L_1$-test statistic of homogeneity. IEEE Transactions on Information Theory, 51(11), 3965-3973. doi:10.1109/TIT.2005.856979
Stolte, M., Kappenberg, F., Rahnenführer, J. and Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statistics Surveys, 18, 163-298. doi:10.1214/24-SS149
rectPartition
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform BG test
BG(X1, X2)