LHZ: Li et al. (2022) empirical characteristic distance

View source: R/LHZ.R

LHZ R Documentation

Li et al. (2022) empirical characteristic distance

Description

The function implements the Li et al. (2022) empirical characteristic distance between two datasets.

Usage

LHZ(X1, X2, n.perm = 0, seed = 42)

Arguments

X1

First dataset as matrix or data.frame

X2

Second dataset as matrix or data.frame

n.perm

Number of permutations for permutation test (default: 0, no permutation test performed)

seed

Random seed (default: 42)

Details

The test statistic

T_{n, m} = \frac{1}{n^2} \sum_{j, q = 1}^n \left( \left\Vert \frac{1}{n} \sum_{k=1}^n e^{i\langle X_k, X_j-X_q \rangle} - \frac{1}{m} \sum_{l=1}^m e^{i\langle Y_l, X_j-X_q\rangle} \right\Vert^2 \right) + \frac{1}{m^2} \sum_{j, q = 1}^m \left( \left\Vert \frac{1}{n} \sum_{k=1}^n e^{i\langle X_k, Y_j-Y_q \rangle} - \frac{1}{m} \sum_{l=1}^m e^{i\langle Y_l, Y_j-Y_q\rangle} \right\Vert^2 \right)

is calculated according to Li et al. (2022). The datasets X1 and X2 are denoted by X and Y, with respective sample sizes n and m, and X_j denotes the j-th row of dataset X. Furthermore, \Vert \cdot \Vert denotes the Euclidean norm (here applied to a complex number, i.e. its modulus) and \langle X_i, X_j \rangle denotes the inner product of X_i and X_j.
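As a rough illustration of the formula, the statistic can be evaluated naively in R. This is a minimal sketch under stated assumptions, not the package's implementation: lhz_sketch is a hypothetical helper name, and the cost grows as O((n^2 + m^2)(n + m)), so it is only practical for small samples.

```r
# Naive evaluation of T_{n,m}; lhz_sketch is a hypothetical helper written
# for illustration only, not the function used inside the package.
lhz_sketch <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  # One of the two outer sums: average over all pairs (j, q) of rows of Z of
  # the squared modulus of the difference between the empirical
  # characteristic functions of X and Y, evaluated at Z_j - Z_q.
  part <- function(Z) {
    k <- nrow(Z)
    idx <- expand.grid(j = seq_len(k), q = seq_len(k))
    # Rows of D are all pairwise differences Z_j - Z_q (k^2 rows)
    D <- Z[idx$j, , drop = FALSE] - Z[idx$q, , drop = FALSE]
    # Column r holds exp(i <X_k, D_r>) over k; its mean is the empirical
    # characteristic function of X at D_r (likewise for Y)
    ecf_X <- colMeans(exp(1i * (X %*% t(D))))
    ecf_Y <- colMeans(exp(1i * (Y %*% t(D))))
    mean(Mod(ecf_X - ecf_Y)^2)
  }
  part(X) + part(Y)
}

set.seed(1)
X <- matrix(rnorm(90), ncol = 3)
Y <- matrix(rnorm(90, mean = 0.5), ncol = 3)
lhz_sketch(X, X)  # identical samples: both characteristic functions coincide
lhz_sketch(X, Y)  # shifted sample: a positive value
```

The sketch trades memory for brevity by materialising all pairwise differences at once; a loop over j and q would use less memory at the cost of speed.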

Low values of the test statistic indicate similarity. Therefore, the permutation test rejects for large values of the test statistic.
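The rejection rule can be illustrated with a generic permutation p value. The sketch below assumes the common (1 + count) / (1 + n.perm) convention, which may differ from what LHZ computes internally. perm_pvalue and mean_shift are hypothetical names, and mean_shift is a simple stand-in statistic used only to keep the example short.

```r
# Generic one-sided permutation p value that rejects for LARGE statistics.
# perm_pvalue is a hypothetical helper; the p value convention is an assumption.
perm_pvalue <- function(X, Y, stat, n.perm = 200, seed = 42) {
  set.seed(seed)
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  n <- nrow(X)
  pooled <- rbind(X, Y)
  observed <- stat(X, Y)
  permuted <- replicate(n.perm, {
    # Randomly reassign the pooled rows to two groups of the original sizes
    idx <- sample(nrow(pooled))
    stat(pooled[idx[seq_len(n)], , drop = FALSE],
         pooled[idx[-seq_len(n)], , drop = FALSE])
  })
  # Share of permuted statistics at least as large as the observed one
  (1 + sum(permuted >= observed)) / (1 + n.perm)
}

# Stand-in statistic (squared distance between column means), for the demo only
mean_shift <- function(X, Y) sum((colMeans(X) - colMeans(Y))^2)

X1 <- matrix(rnorm(200), ncol = 4)
X2 <- matrix(rnorm(200, mean = 1), ncol = 4)
perm_pvalue(X1, X2, mean_shift)
```

Any function of two matrices that returns a single number, where larger means less similar, could be passed as stat in this sketch.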

Value

An object of class htest with the following components:

method

Description of the test

statistic

Observed value of the test statistic

p.value

Permutation p value (only if n.perm > 0)

data.name

The dataset names

alternative

The alternative hypothesis

Applicability

Target variable?   Numeric?   Categorical?   K-sample?
No                 Yes        No             No

References

Li, X., Hu, W. and Zhang, B. (2022). Measuring and testing homogeneity of distributions by characteristic distance, Statistical Papers 64 (2), 529-556, doi:10.1007/s00362-022-01327-7

Stolte, M., Kappenberg, F., Rahnenführer, J. and Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison, Statist. Surv. 18, 163-298, doi:10.1214/24-SS149

See Also

LHZStatistic

Examples

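# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Calculate LHZ statistic only (n.perm = 0, so no p value is returned)
LHZ(X1, X2)
# Additionally perform a permutation test with 100 permutations
LHZ(X1, X2, n.perm = 100)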

DataSimilarity documentation built on April 3, 2025, 9:39 p.m.