DS: Rank-Based Energy Test (Deb and Sen, 2021)


DS    R Documentation

Rank-Based Energy Test (Deb and Sen, 2021)

Description

Performs the multivariate rank-based two-sample test using measure transportation by Deb and Sen (2021).

Usage

DS(X1, X2, n.perm = 0, rand.gen = NULL, seed = 42)

Arguments

X1

First dataset as matrix or data.frame

X2

Second dataset as matrix or data.frame

n.perm

Number of permutations for the permutation test (default: 0, no permutation test performed)

rand.gen

Function that generates a grid of (random) numbers in (0,1) of dimension n x k (n and k are the inputs of this function). Default is NULL, in which case randtoolbox::halton is used.

seed

Random seed (default: 42)
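
As an illustration of the rand.gen argument, the following is a minimal sketch of a custom generator. The name unif_grid is hypothetical, and the (n, k) argument order is an assumption based on the argument description above; the call to DS is guarded because it needs packages that may not be installed.

```r
# Hypothetical custom generator: an n x k matrix of i.i.d. uniform
# numbers in (0, 1). The (n, k) argument order is assumed from the
# description of rand.gen.
unif_grid <- function(n, k) {
  matrix(runif(n * k), nrow = n, ncol = k)
}

# Check the shape of the generated grid
g <- unif_grid(5, 3)
dim(g)

# Assumed usage with DS; skipped if the required packages are missing
if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("clue", quietly = TRUE) &&
    requireNamespace("randtoolbox", quietly = TRUE)) {
  X1 <- matrix(rnorm(200), ncol = 2)
  X2 <- matrix(rnorm(200, mean = 0.5), ncol = 2)
  DataSimilarity::DS(X1, X2, n.perm = 100, rand.gen = unif_grid)
}
```

A random uniform grid is only one possible choice; the default Halton sequence is a deterministic low-discrepancy alternative.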

Details

The test proposed by Deb and Sen (2021) is a rank-based version of the Energy statistic (Székely and Rizzo, 2004) that does not rely on any moment assumptions. Its test statistic is the Energy statistic applied to the rank map of both samples. The multivariate ranks are computed using optimal transport with a multivariate uniform distribution as the reference distribution.

The rank version of the Energy statistic still attains the value zero if and only if the two distributions coincide. Therefore, low values of the empirical test statistic indicate similarity between the datasets, and the null hypothesis of equal distributions is rejected for large values.
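
The idea can be sketched in the univariate special case, where matching the pooled sample to a sorted reference grid by optimal transport reduces to ordinary ranking. This is not the package implementation: the function name rank_energy_1d, the midpoint grid, and the unscaled form of the Energy statistic are assumptions made for illustration, and the normalization used by DS may differ.

```r
# Univariate sketch: for d = 1 the optimal-transport assignment of the
# pooled sample to a sorted grid in (0, 1) is the ordinary rank map.
rank_energy_1d <- function(x, y) {
  m <- length(x)
  n <- length(y)
  N <- m + n
  grid <- (seq_len(N) - 0.5) / N  # assumed reference grid in (0, 1)
  # Rank map of the pooled sample (ties broken by order of appearance)
  r <- grid[rank(c(x, y), ties.method = "first")]
  rx <- r[seq_len(m)]
  ry <- r[m + seq_len(n)]
  # Energy statistic applied to the ranks (unscaled, all pairs averaged)
  2 * mean(abs(outer(rx, ry, "-"))) -
    mean(abs(outer(rx, rx, "-"))) -
    mean(abs(outer(ry, ry, "-")))
}

set.seed(1)
x <- rnorm(50)
same  <- rank_energy_1d(x, rnorm(50))            # equal distributions
shift <- rank_energy_1d(x, rnorm(50, mean = 2))  # shifted distribution
c(same = same, shift = shift)
```

The shifted sample should give a clearly larger value than the sample drawn from the same distribution, in line with rejecting for large values.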

Value

An object of class htest with the following components:

statistic

Observed value of the test statistic

p.value

Permutation p-value

alternative

The alternative hypothesis

method

Description of the test

data.name

The dataset names
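
Because the result is an htest object, its components can be read with the usual list accessors. The following is a minimal sketch, guarded because it assumes the DataSimilarity, randtoolbox and clue packages are installed; the variable name res is arbitrary.

```r
# Guarded sketch: run the test and read components of the htest result
if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("randtoolbox", quietly = TRUE) &&
    requireNamespace("clue", quietly = TRUE)) {
  X1 <- matrix(rnorm(200), ncol = 2)
  X2 <- matrix(rnorm(200, mean = 0.5), ncol = 2)
  res <- DataSimilarity::DS(X1, X2, n.perm = 100)
  res$statistic  # observed value of the test statistic
  res$p.value    # permutation p-value
  res$method     # description of the test
}
```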

Applicability

Target variable?    Numeric?    Categorical?    K-sample?
No                  Yes         No              No

Note

The implementation is a modification of the code supplied by Deb and Sen (2021) for the simulation study presented in the original article. It generalizes the implementation and includes small modifications for computation speed.

Author(s)

Original implementation by Nabarun Deb, Bodhisattva Sen

Minor modifications by Marieke Stolte

References

Original implementation: https://github.com/NabarunD/MultiDistFree

Deb, N. and Sen, B. (2021). Multivariate Rank-Based Distribution-Free Nonparametric Testing Using Measure Transportation, Journal of the American Statistical Association. doi:10.1080/01621459.2021.1923508.

Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149.

See Also

Energy

Examples

# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Deb and Sen test (only if the required packages are installed)
if (requireNamespace("randtoolbox", quietly = TRUE) &&
    requireNamespace("clue", quietly = TRUE)) {
  DS(X1, X2, n.perm = 100)
}

DataSimilarity documentation built on April 3, 2025, 9:39 p.m.