LHZ | R Documentation |
The function implements the Li et al. (2022) empirical characteristic distance between two datasets.
LHZ(X1, X2, n.perm = 0, seed = 42)
X1 | First dataset as matrix or data.frame |
X2 | Second dataset as matrix or data.frame |
n.perm | Number of permutations for permutation test (default: 0, no permutation test performed) |
seed | Random seed (default: 42) |
The test statistic
T_{n, m} = \frac{1}{n^2} \sum_{j, q = 1}^n \left( \left\Vert \frac{1}{n} \sum_{k=1}^n e^{i\langle X_k, X_j-X_q \rangle} - \frac{1}{m} \sum_{l=1}^m e^{i\langle Y_l, X_j-X_q\rangle} \right\Vert^2 \right) + \frac{1}{m^2} \sum_{j, q = 1}^m \left( \left\Vert \frac{1}{n} \sum_{k=1}^n e^{i\langle X_k, Y_j-Y_q \rangle} - \frac{1}{m} \sum_{l=1}^m e^{i\langle Y_l, Y_j-Y_q\rangle} \right\Vert^2 \right)
is calculated according to Li et al. (2022). The datasets are denoted by X and Y with respective sample sizes n and m. X_i denotes the i-th row of dataset X. Furthermore, \Vert \cdot \Vert indicates the Euclidean norm and \langle X_i, X_j \rangle indicates the inner product between X_i and X_j.
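As a rough illustration of the formula, the following R sketch evaluates the statistic directly from its definition. The helper name `lhz_naive` is hypothetical and not part of the package. The sketch assumes that the norm applied to the complex-valued difference is the complex modulus, and it makes no claim to match the internal implementation or any scaling used by `LHZ`.

```r
# Hypothetical helper (not part of the package): evaluates T_{n,m} directly
# from the formula. Memory and time grow with (n^2 + m^2) * (n + m), so this
# is only suitable for small datasets.
lhz_naive <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)

  # For a dataset Z with k rows: average, over all k^2 pairwise row
  # differences Z_j - Z_q, of the squared modulus of the difference between
  # the empirical characteristic functions of X and Y at that point.
  ecf_term <- function(Z) {
    k <- nrow(Z)
    idx <- expand.grid(j = seq_len(k), q = seq_len(k))
    D <- Z[idx$j, , drop = FALSE] - Z[idx$q, , drop = FALSE]  # k^2 x p
    ecf_X <- colMeans(exp(1i * (X %*% t(D))))  # (1/n) sum_k exp(i <X_k, d>)
    ecf_Y <- colMeans(exp(1i * (Y %*% t(D))))  # (1/m) sum_l exp(i <Y_l, d>)
    mean(Mod(ecf_X - ecf_Y)^2)
  }

  # First sum uses differences of rows of X, second sum those of Y
  ecf_term(X) + ecf_term(Y)
}

# Small example with a mean shift in the second dataset
set.seed(1)
A <- matrix(rnorm(60), ncol = 3)
B <- matrix(rnorm(60, mean = 0.5), ncol = 3)
lhz_naive(A, B)
```

By construction the value is non-negative, equals zero when both arguments are the same dataset, and is unchanged when the two datasets are swapped.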
Low values of the test statistic indicate similarity. Therefore, the permutation test rejects for large values of the test statistic.
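The idea of rejecting for large values can be sketched with a generic permutation scheme. Everything below is an assumption for illustration: `perm_pvalue` and `mean_shift_stat` are hypothetical names, the stand-in statistic is a simple squared distance between column means rather than the characteristic distance, and the "+ 1" correction in the p value may differ from what the package computes.

```r
# Stand-in two-sample statistic (NOT the LHZ statistic): squared Euclidean
# distance between the column means. Large values indicate dissimilarity.
mean_shift_stat <- function(X, Y) {
  sum((colMeans(X) - colMeans(Y))^2)
}

# Hypothetical helper: permutation p value for a statistic that is large
# when the two datasets differ.
perm_pvalue <- function(X, Y, stat_fun, n.perm = 199, seed = 42) {
  set.seed(seed)
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  n <- nrow(X)
  Z <- rbind(X, Y)                      # pooled sample
  observed <- stat_fun(X, Y)

  permuted <- replicate(n.perm, {
    idx <- sample(nrow(Z))              # random relabeling of the rows
    stat_fun(Z[idx[seq_len(n)], , drop = FALSE],
             Z[idx[-seq_len(n)], , drop = FALSE])
  })

  # Share of permuted statistics at least as large as the observed one,
  # counting the observed value itself (assumed convention)
  (sum(permuted >= observed) + 1) / (n.perm + 1)
}

# Example: a clear mean shift should give a small p value
set.seed(1)
A <- matrix(rnorm(60), ncol = 2)
B <- matrix(rnorm(60, mean = 3), ncol = 2)
perm_pvalue(A, B, mean_shift_stat)
```

Any function of two matrices that grows with dissimilarity can be passed as `stat_fun`, which is why the statistic is kept separate from the resampling loop.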
An object of class htest
with the following components:
method | Description of the test |
statistic | Observed value of the test statistic |
p.value | Permutation p value (only if n.perm > 0) |
data.name | The dataset names |
alternative | The alternative hypothesis |
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | No |
Li, X., Hu, W. and Zhang, B. (2022). Measuring and testing homogeneity of distributions by characteristic distance, Statistical Papers 64 (2), 529-556, doi:10.1007/s00362-022-01327-7
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149
LHZStatistic
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Calculate LHZ statistic
LHZ(X1, X2)