Rosenbaum: Rosenbaum Crossmatch Test


Rosenbaum R Documentation

Rosenbaum Crossmatch Test

Description

Performs the Rosenbaum (2005) crossmatch two-sample test. This implementation calls the function crossmatchtest from the crossmatch package.

Usage

Rosenbaum(X1, X2, exact = FALSE, dist.fun = stats::dist, dist.args = NULL, seed = 42)

Arguments

X1

First dataset as matrix or data.frame

X2

Second dataset as matrix or data.frame

exact

Should the exact null distribution be used? (default: FALSE). For numerical reasons, the exact distribution can only be calculated for a pooled sample size of less than 340. If exact = FALSE or the pooled sample size reaches this limit, an asymptotic test is performed.

dist.fun

Function for calculating a distance matrix on the pooled dataset (default: stats::dist, Euclidean distance).

dist.args

Named list of further arguments passed to dist.fun (default: NULL).

seed

Random seed (default: 42)
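
As a rough illustration of how dist.fun and dist.args fit together, the following sketch builds a Manhattan distance matrix on a pooled sample using base R only. It assumes that the wrapper applies dist.fun to the row-bound data with dist.args appended as named arguments, which this page does not confirm; the object names pooled and D are illustrative.

```r
# Illustrative only: assumes dist.fun is applied to the row-bound (pooled)
# data with the entries of dist.args passed as additional named arguments.
set.seed(42)
X1 <- matrix(rnorm(40), ncol = 4)
X2 <- matrix(rnorm(40, mean = 0.5), ncol = 4)

dist.fun  <- stats::dist
dist.args <- list(method = "manhattan")  # 'method' is an argument of stats::dist

pooled <- rbind(X1, X2)
D <- do.call(dist.fun, c(list(pooled), dist.args))

# Number of pooled observations covered by the distance object
attr(D, "Size")
```

Under that assumption, the corresponding call would be Rosenbaum(X1, X2, dist.args = list(method = "manhattan")).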

Details

The test statistic is calculated as the standardized number of edges connecting points from different samples in a non-bipartite matching of the pooled sample. The non-bipartite matching is calculated using the implementation from the nbpMatching package. High values of the crossmatch statistic indicate similarity between the datasets, so the null hypothesis of equal distributions is rejected for small values of the test statistic.

This implementation is a wrapper function around the function crossmatchtest that modifies the in- and output of that function to match the other functions provided in this package. For more details see crossmatchtest.
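
To make the standardization concrete, the sketch below computes the null mean and variance of the crossmatch count given in Rosenbaum (2005) and standardizes a made-up observed count. The function names and the observed count of 38 are hypothetical, an even pooled sample size is assumed, and the package may differ in details such as the handling of odd pooled sample sizes.

```r
# Null mean and standard deviation of the crossmatch count (Rosenbaum, 2005)
# for samples of size n1 and n2 with an even pooled size N = n1 + n2.
# Function names are hypothetical and not part of the package.
crossmatch_null_moments <- function(n1, n2) {
  N <- n1 + n2
  stopifnot(N %% 2 == 0, N > 3)
  mu0 <- n1 * n2 / (N - 1)
  v0  <- 2 * n1 * (n1 - 1) * n2 * (n2 - 1) / ((N - 3) * (N - 1)^2)
  list(mu0 = mu0, stderr = sqrt(v0))
}

# Standardize an observed number of cross-matched pairs a1
standardize_crossmatch <- function(a1, n1, n2) {
  m <- crossmatch_null_moments(n1, n2)
  (a1 - m$mu0) / m$stderr
}

# Made-up count: 38 of the 100 matched pairs join the two samples
z <- standardize_crossmatch(a1 = 38, n1 = 100, n2 = 100)

# Small counts are evidence against equal distributions, so the
# asymptotic p value is taken from the lower tail of the normal
p_lower <- pnorm(z)
```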

Value

An object of class htest with the following components:

statistic

Observed value of the test statistic

p.value

Asymptotic p value

estimate

Unstandardized crossmatch count

alternative

The alternative hypothesis

method

Description of the test

data.name

The dataset names

stderr

Standard deviation of the test statistic under the null

mu0

Expectation of the test statistic under the null
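
The components listed above can be inspected like those of any htest object. The following is a minimal sketch, assuming the DataSimilarity and crossmatch packages are installed; the simulated data and the object name res are illustrative.

```r
# Guarded so that nothing runs when the required packages are missing
if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("crossmatch", quietly = TRUE)) {
  set.seed(1)
  X1 <- matrix(rnorm(200), ncol = 4)
  X2 <- matrix(rnorm(200, mean = 0.5), ncol = 4)

  res <- DataSimilarity::Rosenbaum(X1, X2)

  # Standard htest print method
  print(res)

  # Individual components documented in the Value section
  str(unclass(res)[c("statistic", "p.value", "estimate", "mu0", "stderr")])
}
```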

Applicability

Target variable?   Numeric?   Categorical?   K-sample?
No                 Yes        No             No

References

Rosenbaum, P.R. (2005), An exact distribution-free test comparing two multivariate distributions based on adjacency, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 4, 515-530.

Heller, R., Small, D., Rosenbaum, P. (2024). crossmatch: The Cross-match Test. R package version 1.4, https://CRAN.R-project.org/package=crossmatch

Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149

See Also

FR, CF, CCS, ZC

Petrie, MMCM for multi-sample versions of the test

Examples

# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform crossmatch test
if(requireNamespace("crossmatch", quietly = TRUE)) {
  Rosenbaum(X1, X2)
}

DataSimilarity documentation built on April 3, 2025, 9:39 p.m.