BMG: Biswas et al. (2014) Two-sample Runs Test
In DataSimilarity: Quantifying Similarity of Datasets and Multivariate Two- And k-Sample Testing

View source: R/BMG.R

BMG	R Documentation

Biswas et al. (2014) Two-sample Runs Test

Description

The function implements the distribution-free two-sample runs test of Biswas, Mukhopadhyay and Ghosh (2014). The test uses a heuristic to approximate the shortest Hamilton path through the pooled sample of the two datasets, computed with the HamiltonPath function. By default, the asymptotic version of the test is performed.

Usage

BMG(X1, X2, seed = NULL, asymptotic = TRUE)

Arguments

`X1`	First dataset as matrix or data.frame
`X2`	Second dataset as matrix or data.frame
`seed`	Random seed (default: NULL). A random seed will only be set if one is provided.
`asymptotic`	Should the asymptotic version of the test be performed? (default: `TRUE`)

Details

The test statistic is one plus the number of edges in the shortest Hamilton path on the pooled sample that connect points from different samples, which equals the number of runs along the path, i.e.

T_{m,n} = 1 + \sum_{i = 1}^{N-1} U_i,

where U_i is an indicator function with U_i = 1 if the ith edge connects points from different samples and U_i = 0 otherwise.
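The definition of T_{m,n} can be illustrated with a minimal sketch. It assumes the Hamilton path has already been found and that the sample labels of the points are given in path order. The helper `count_runs` and the label vector are hypothetical and are not part of DataSimilarity.

```r
# Hypothetical helper (not part of DataSimilarity): number of runs along a path.
# path_labels: sample membership (e.g. 1 or 2) of the pooled points, in path order.
count_runs <- function(path_labels) {
  N <- length(path_labels)
  # U_i = TRUE if the i-th edge connects points from different samples
  u <- path_labels[-1] != path_labels[-N]
  1 + sum(u)
}

# Illustrative labels along a path of N = 8 points
labels_on_path <- c(1, 1, 2, 2, 2, 1, 2, 2)
t_obs <- count_runs(labels_on_path)
t_obs  # 4: three edges join different samples, plus one
```

Few runs mean that points from the same sample tend to be neighbours on the path, whereas well-mixed samples produce many runs.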

For a combined sample size N less than or equal to 1030, the exact version of the Biswas, Mukhopadhyay and Ghosh (2014) test can be calculated. It uses the univariate run statistic (Wald and Wolfowitz, 1940) to calculate the test statistic. For N larger than 1030, the calculation of the exact version fails.
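As an illustration of the univariate run statistic, the following sketch evaluates the classical exact null distribution of the number of runs for sample sizes m and n. The function `runs_pmf` is hypothetical. The formulas are the standard textbook ones, and the internal computation in BMG may differ. Which tail BMG uses for its exact p value is not stated here, so the lower-tail probability at the end is only an example.

```r
# Hypothetical helper (not part of DataSimilarity): exact null pmf of the
# number of runs R when all arrangements of m and n labels are equally likely.
runs_pmf <- function(m, n) {
  N <- m + n
  r <- 2:N                      # candidate run counts (impossible ones get 0)
  denom <- choose(N, m)         # number of label arrangements
  p <- sapply(r, function(x) {
    if (x %% 2 == 0) {
      k <- x / 2                # even: k runs from each sample
      2 * choose(m - 1, k - 1) * choose(n - 1, k - 1) / denom
    } else {
      k <- (x - 1) / 2          # odd: k + 1 runs from one sample, k from the other
      (choose(m - 1, k) * choose(n - 1, k - 1) +
         choose(m - 1, k - 1) * choose(n - 1, k)) / denom
    }
  })
  names(p) <- r
  p
}

pmf <- runs_pmf(4, 5)
sum(pmf)                                  # probabilities sum to 1
# Example: lower-tail probability P(R <= 3)
p_lower <- sum(pmf[as.integer(names(pmf)) <= 3])
```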

If an asymptotic test is performed, the asymptotic null distribution is given by

T_{m, n}^{*} \sim \mathcal{N}(0, 4\lambda^2(1-\lambda)^2)

where T_{m, n}^{*} = \sqrt{N} (T_{m, n} / N - 2 \lambda (1 - \lambda)) is the asymptotic test statistic, N = m + n is the combined sample size, \lambda = m/N, and m and n are the sample sizes of the first and second dataset. Therefore, low absolute values of the asymptotic test statistic indicate similarity of the two datasets, whereas high absolute values indicate differences between the datasets.
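The asymptotic statistic can be computed by hand from an observed run count, as in the following minimal sketch. The helper `asymptotic_stat` and the numbers used are hypothetical. Note that BMG itself reports the raw statistic T_{m,n}, not T_{m,n}^{*}.

```r
# Hypothetical helper (not part of DataSimilarity): asymptotic statistic
# T* = sqrt(N) * (T / N - 2 * lambda * (1 - lambda)) and its standardized value.
asymptotic_stat <- function(t_obs, m, n) {
  N <- m + n
  lambda <- m / N
  centre <- 2 * lambda * (1 - lambda)
  t_star <- sqrt(N) * (t_obs / N - centre)
  # The null standard deviation is sqrt(4 * lambda^2 * (1 - lambda)^2),
  # i.e. the same quantity as the centring term
  sd_null <- 2 * lambda * (1 - lambda)
  list(t_star = t_star, z = t_star / sd_null)
}

# Illustrative values: 80 runs observed with m = n = 100
res <- asymptotic_stat(t_obs = 80, m = 100, n = 100)
res$t_star  # about -1.41
res$z       # about -2.83, to be compared with a standard normal
```

Dividing by the null standard deviation puts the statistic on the standard normal scale, which makes the magnitude easier to interpret.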

Value

An object of class htest with the following components:

`statistic`	Observed value of the test statistic (note: this is not the asymptotic test statistic)
`p.value`	(asymptotic) p value
`method`	Description of the test
`data.name`	The dataset names
`alternative`	The alternative hypothesis

Applicability

Target variable?	Numeric?	Categorical?	K-sample?
No	Yes	No	No

References

Biswas, M., Mukhopadhyay, M. and Ghosh, A. K. (2014). A distribution-free two-sample run test applicable to high-dimensional data, Biometrika 101 (4), 913-926, doi:10.1093/biomet/asu045

Wald, A. and Wolfowitz, J. (1940). On a test whether two samples are from the same distribution, Annals of Mathematical Statistics 11, 147-162

Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298, doi:10.1214/24-SS149

Examples

set.seed(1234)
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform BMG test 
BMG(X1, X2)

DataSimilarity documentation built on June 16, 2025, 5:08 p.m.
