BQS: Barakat et al. (1996) Two-Sample Test

BQS    R Documentation

Barakat et al. (1996) Two-Sample Test

Description

Performs the nearest-neighbor-based multivariate two-sample test of Barakat et al. (1996).

Usage

BQS(X1, X2, dist.fun = stats::dist, n.perm = 0, dist.args = NULL, seed = 42)

Arguments

X1

First dataset as matrix or data.frame

X2

Second dataset as matrix or data.frame

dist.fun

Function for calculating a distance matrix on the pooled dataset (default: stats::dist, Euclidean distance).

n.perm

Number of permutations for permutation test (default: 0, no test is performed).

dist.args

Named list of further arguments passed to dist.fun (default: NULL).

seed

Random seed (default: 42)

Details

The test is an extension of the Schilling (1986) and Henze (1988) neighbor test that bypasses choosing the number of nearest neighbors to consider. The Schilling-Henze test statistic is the proportion of edges connecting points from the same dataset in a K-nearest neighbor graph calculated on the pooled sample (standardized with expectation and SD under the null). Barakat et al. (1996) take the weighted sum of the Schilling-Henze test statistics for K = 1, ..., N - 1, where N denotes the pooled sample size.
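
To make the neighbor-graph description concrete, the following base R sketch computes the unstandardized proportion of same-sample edges in a K-nearest neighbor graph for a single K. The helper name same_sample_prop is hypothetical and not part of DataSimilarity; the standardization and the weighting across K used by Barakat et al. (1996) are deliberately not reproduced, so this is an illustration of the idea only.

```r
# Illustrative helper (hypothetical name, not part of DataSimilarity):
# proportion of K-nearest-neighbor edges that join two points of the same sample.
same_sample_prop <- function(X1, X2, K) {
  pooled <- rbind(X1, X2)
  labels <- rep(c(1L, 2L), times = c(nrow(X1), nrow(X2)))
  D <- as.matrix(stats::dist(pooled))  # Euclidean distances on the pooled sample
  diag(D) <- Inf                       # a point is not its own neighbor
  same <- vapply(seq_len(nrow(D)), function(i) {
    nn <- order(D[i, ])[seq_len(K)]    # indices of the K nearest neighbors of point i
    sum(labels[nn] == labels[i])       # how many of them share the sample of point i
  }, integer(1))
  sum(same) / (nrow(D) * K)
}

set.seed(1)
X1 <- matrix(rnorm(200), ncol = 2)
X2 <- matrix(rnorm(200, mean = 3), ncol = 2)
# The proportion depends on K, which is the choice the Barakat et al. test avoids
props <- sapply(c(1, 3, 5), function(K) same_sample_prop(X1, X2, K))
print(props)
```

Values near 1 indicate that neighbors mostly come from the same sample, which is what one would expect when the two distributions differ.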

As for the Schilling-Henze test, low values of the test statistic indicate similarity of the datasets. Thus, the null hypothesis of equal distributions is rejected for high values. A permutation test is performed if n.perm is set to a positive number.
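
The permutation logic can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: the helpers nn1_stat and perm_pvalue are hypothetical, a simple 1-nearest-neighbor same-sample proportion stands in for the Barakat et al. statistic, and the "+1" correction in the p value is one common convention that the package may or may not use.

```r
# Hypothetical stand-in statistic: proportion of points whose nearest
# neighbor in the pooled sample belongs to the same sample.
nn1_stat <- function(D, labels) {
  diag(D) <- Inf                       # exclude self-distances
  nn <- apply(D, 1, which.min)         # index of each point's nearest neighbor
  mean(labels[nn] == labels)
}

# Hypothetical permutation test: reject for high values of the statistic.
perm_pvalue <- function(X1, X2, n.perm = 200, seed = 42) {
  set.seed(seed)
  labels <- rep(c(1L, 2L), times = c(nrow(X1), nrow(X2)))
  # Distances are computed once; permuting labels does not change them
  D <- as.matrix(stats::dist(rbind(X1, X2)))
  observed <- nn1_stat(D, labels)
  permuted <- replicate(n.perm, nn1_stat(D, sample(labels)))
  # Share of permuted statistics at least as large as the observed one
  (sum(permuted >= observed) + 1) / (n.perm + 1)
}

set.seed(1)
X1 <- matrix(rnorm(100), ncol = 2)
X2 <- matrix(rnorm(100, mean = 3), ncol = 2)
p_shift <- perm_pvalue(X1, X2)
print(p_shift)
```

Because only the labels are shuffled, the distance matrix is reused across permutations, which keeps the cost of each permutation low.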

Value

An object of class htest with the following components:

statistic

Observed value of the test statistic

p.value

Permutation p value (if n.perm > 0)

alternative

The alternative hypothesis

method

Description of the test

data.name

The dataset names

Applicability

Target variable?   Numeric?   Categorical?   K-sample?
No                 Yes        No             No

References

Barakat, A.S., Quade, D. and Salama, I.A. (1996), Multivariate Homogeneity Testing Using an Extended Concept of Nearest Neighbors. Biom. J., 38: 605-612. doi:10.1002/bimj.4710380509

Schilling, M. F. (1986). Multivariate Two-Sample Tests Based on Nearest Neighbors. Journal of the American Statistical Association, 81(395), 799-806. doi:10.2307/2289012

Henze, N. (1988). A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences. The Annals of Statistics, 16(2), 772-783.

Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149

See Also

SH, FR, CF, CCS, and ZC for other graph-based tests; FR_cat, CF_cat, CCS_cat, and ZC_cat for versions of these tests for categorical data.

Examples

# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Barakat et al. test (n.perm = 0 by default, so no permutation p value is computed)
BQS(X1, X2)
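
A possible extension of the example, assuming the DataSimilarity package is installed. It requests a permutation p value via n.perm and passes an extra argument to the default dist.fun through dist.args, as described under Arguments; method = "manhattan" is an argument of stats::dist. The smaller sample size and n.perm = 100 are arbitrary choices to keep the run short.

```r
# Smaller data than above so that the permutation test runs quickly
set.seed(1)
X1 <- matrix(rnorm(200), ncol = 10)
X2 <- matrix(rnorm(200, mean = 0.5), ncol = 10)

# Run only if the package is available (assumption: it is installed from CRAN)
if (requireNamespace("DataSimilarity", quietly = TRUE)) {
  # Permutation test with 100 permutations; seed as documented under Arguments
  res_perm <- DataSimilarity::BQS(X1, X2, n.perm = 100, seed = 42)
  print(res_perm$statistic)
  print(res_perm$p.value)

  # Manhattan instead of Euclidean distance, passed on to stats::dist
  res_manh <- DataSimilarity::BQS(X1, X2, n.perm = 100,
                                  dist.args = list(method = "manhattan"))
  print(res_manh$p.value)
}
```

With the default n.perm = 0 only the statistic is reported, so setting n.perm to a positive number is needed whenever a p value is required.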

DataSimilarity documentation built on April 3, 2025, 9:39 p.m.