BG: Biau and Gyorfi (2005) Two-sample Homogeneity Test
In DataSimilarity: Quantifying Similarity of Datasets and Multivariate Two- And k-Sample Testing

View source: R/BG.R

BG	R Documentation

Biau and Gyorfi (2005) Two-sample Homogeneity Test

Description

The function implements the Biau and Gyorfi (2005) two-sample homogeneity test. This test uses the L_1-distance between the two empirical distributions restricted to a finite partition.

Usage

BG(X1, X2, partition = rectPartition, exponent = 0.8, eps = 0.01, seed = NULL, ...)

Arguments

`X1`	First dataset as matrix or data.frame
`X2`	Second dataset as matrix or data.frame of the same sample size as `X1`
`partition`	Function that creates a finite partition for the subspace spanned by the two datasets (default: `rectPartition`, see Details)
`exponent`	Exponent used in the partition function, should be between 0 and 1 (default: 0.8)
`eps`	Small threshold to guarantee edge points are included (default: 0.01)
`seed`	Random seed (default: NULL). A random seed will only be set if one is provided.
`...`	Further arguments to be passed to the partition function

Details

The Biau and Gyorfi (2005) two-sample homogeneity test is defined for two datasets of the same sample size.

By default, a rectangular partition (rectPartition) is calculated under the assumption of approximately equal cell probabilities. Use the exponent argument to choose the number of elements of the partition m_n according to the convergence criteria in Biau and Gyorfi (2005); the default is m_n = n^{0.8}. For each of the p variables of the datasets, m_n^{1/p} + 1 cutpoints are created along the range of both datasets to define the partition, with at least three cutpoints per variable (min, max, and one point splitting the data into two bins).
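The cutpoint rule above can be sketched in base R. This is only an illustration of the arithmetic, not the package's code: the helper name n_cutpoints is hypothetical, and rounding m_n^{1/p} up with ceiling() is an assumption, since the text does not say how non-integer values are handled.

```r
# Hypothetical helper (not part of DataSimilarity): number of cutpoints per
# variable for sample size n and p variables, following the rule in Details.
n_cutpoints <- function(n, p, exponent = 0.8) {
  m_n <- n^exponent               # target number of partition cells
  k <- ceiling(m_n^(1 / p)) + 1   # ASSUMPTION: round up before adding 1
  max(k, 3)                       # at least min, max, and one split point
}

n_cutpoints(n = 100, p = 1)
n_cutpoints(n = 100, p = 2)
n_cutpoints(n = 100, p = 10)
```

For n = 100 and p = 10, as in the example data, m_n^{1/p} is below 2, so the minimum of three cutpoints per variable applies however the value is rounded.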

The test statistic is the L_1-distance between the vectors of the proportions of points falling into each cell of the partition for each dataset. An asymptotic test is performed using a standardized version of the L_1 distance that is approximately standard normally distributed (Corollary to Theorem 2 in Biau and Gyorfi (2005)). Low values of the test statistic indicate similarity. Therefore, the test rejects for large values of the test statistic.
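A minimal base-R sketch of the unstandardized L_1 statistic may help make the definition concrete. It is not the implementation in BG: the helper name l1_cell_distance is hypothetical, equally spaced cutpoints over the pooled range are an assumption, edge points are handled with include.lowest = TRUE rather than the eps argument, and the standardization used for the asymptotic test is omitted.

```r
# Hypothetical helper (not part of DataSimilarity): L_1 distance between the
# cell proportions of two datasets on a rectangular grid.
l1_cell_distance <- function(X1, X2, n_cut = 3) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  pooled <- rbind(X1, X2)

  # Label each row by the bin it falls into along every variable.
  # ASSUMPTION: equally spaced cutpoints over the pooled range.
  cell_id <- function(X) {
    bins <- sapply(seq_len(ncol(X)), function(j) {
      breaks <- seq(min(pooled[, j]), max(pooled[, j]), length.out = n_cut)
      cut(X[, j], breaks = breaks, include.lowest = TRUE, labels = FALSE)
    })
    apply(matrix(bins, nrow = nrow(X)), 1, paste, collapse = "-")
  }

  ids1 <- cell_id(X1)
  ids2 <- cell_id(X2)
  cells <- union(ids1, ids2)  # cells empty in both samples contribute zero

  # Proportion of points per cell in each dataset
  p1 <- table(factor(ids1, levels = cells)) / nrow(X1)
  p2 <- table(factor(ids2, levels = cells)) / nrow(X2)
  sum(abs(p1 - p2))
}

set.seed(1234)
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
l1_cell_distance(X1, X2)
```

The value lies between 0 (identical cell proportions) and 2 (no occupied cell in common), which is why large values count as evidence against homogeneity.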

Value

An object of class htest with the following components:

`statistic`	Observed value of the (asymptotic) test statistic
`p.value`	p value
`method`	Description of the test
`data.name`	The dataset names
`alternative`	The alternative hypothesis
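As a sketch of how the returned components might be used, the snippet below stores the result and reads the fields listed above. It assumes the DataSimilarity package is installed and skips the call otherwise; the 0.05 significance level is an arbitrary choice for illustration.

```r
res <- NULL
if (requireNamespace("DataSimilarity", quietly = TRUE)) {
  set.seed(1234)
  X1 <- matrix(rnorm(1000), ncol = 10)
  X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)

  res <- DataSimilarity::BG(X1, X2)

  res$statistic                 # standardized L_1 test statistic
  res$p.value                   # asymptotic p value
  reject <- res$p.value < 0.05  # ASSUMPTION: 5% level chosen for illustration
}
```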

Applicability

Target variable?	Numeric?	Categorical?	K-sample?
No	Yes	No	No

References

Biau, G. and Gyorfi, L. (2005). On the asymptotic properties of a nonparametric L_1-test statistic of homogeneity. IEEE Transactions on Information Theory, 51(11), 3965-3973. doi:10.1109/TIT.2005.856979

Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv., 18, 163-298. doi:10.1214/24-SS149

Examples

set.seed(1234)
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform BG test 
BG(X1, X2)

DataSimilarity documentation built on June 16, 2025, 5:08 p.m.
