| SH | R Documentation |
Performs the Schilling-Henze two-sample test for multivariate data (Schilling, 1986; Henze, 1988).
SH(X1, X2, K = 5, graph.fun = knn.bf, dist.fun = stats::dist, n.perm = 0,
dist.args = NULL, seed = NULL)
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
K |
Number of nearest neighbors to consider (default: 5) |
graph.fun |
Function for calculating a similarity graph using the distance matrix on the pooled sample (default: knn.bf) |
dist.fun |
Function for calculating a distance matrix on the pooled dataset (default: stats::dist) |
n.perm |
Number of permutations for permutation test (default: 0, asymptotic test is performed). |
dist.args |
Named list of further arguments passed to dist.fun (default: NULL) |
seed |
Random seed (default: NULL). A random seed will only be set if one is provided. |
The test statistic is the proportion of edges connecting points from the same dataset in a K-nearest neighbor graph calculated on the pooled sample (standardized with expectation and SD under the null).
Low values of the test statistic indicate similarity of the datasets. Thus, the null hypothesis of equal distributions is rejected for high values.
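As a rough illustration of the quantity described above, the following base R sketch computes the raw (unstandardized) proportion of within-sample edges in a K-nearest neighbor graph on the pooled sample. The helper name knn_within_prop is hypothetical and not part of the package; the standardization with the null expectation and SD that SH applies is deliberately left out, and the handling of distance ties is an assumption.

```r
# Hypothetical helper (not part of the package): raw proportion of
# K-nearest-neighbor edges that connect points from the same dataset.
knn_within_prop <- function(X1, X2, K = 5) {
  pooled <- rbind(as.matrix(X1), as.matrix(X2))
  labels <- rep(c(1L, 2L), times = c(nrow(X1), nrow(X2)))
  D <- as.matrix(stats::dist(pooled))  # Euclidean distances on the pooled sample
  diag(D) <- Inf                       # a point is never its own neighbor
  # Column i holds the indices of the K nearest neighbors of point i
  nn <- matrix(apply(D, 1, function(d) order(d)[seq_len(K)]), nrow = K)
  # Compare each point's label with the labels of its neighbors
  mean(labels[nn] == rep(labels, each = K))
}

set.seed(1234)
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)  # shifted sample
X3 <- matrix(rnorm(1000), ncol = 10)              # same distribution as X1
knn_within_prop(X1, X2)  # shifted samples: more within-sample edges expected
knn_within_prop(X1, X3)  # equal distributions: roughly one half for equal sizes
```

Larger proportions mean that neighbors tend to come from the same dataset, which matches the statement that high values speak against equal distributions.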
For n.perm = 0, an asymptotic test using the asymptotic normal approximation of the conditional null distribution is performed. For n.perm > 0, a permutation test is performed.
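The permutation variant can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper perm_pvalue is hypothetical, it uses the raw proportion of within-sample edges instead of the standardized statistic, and the "+1" correction in the p value is one common convention that SH may or may not use.

```r
# Hypothetical helper (not part of the package): permutation p value for the
# proportion of within-sample edges in the K-nearest-neighbor graph.
perm_pvalue <- function(X1, X2, K = 5, n.perm = 200) {
  pooled <- rbind(as.matrix(X1), as.matrix(X2))
  labels <- rep(c(1L, 2L), times = c(nrow(X1), nrow(X2)))
  D <- as.matrix(stats::dist(pooled))
  diag(D) <- Inf  # exclude self-neighbors
  # The graph depends only on the pooled points, so it is built once
  nn <- matrix(apply(D, 1, function(d) order(d)[seq_len(K)]), nrow = K)
  stat <- function(lab) mean(lab[nn] == rep(lab, each = K))
  observed <- stat(labels)
  # Permuting the sample labels mimics the null hypothesis of equal distributions
  permuted <- replicate(n.perm, stat(sample(labels)))
  # High values speak against the null; "+1" is an assumed convention
  (sum(permuted >= observed) + 1) / (n.perm + 1)
}

set.seed(1234)
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
perm_pvalue(X1, X2, n.perm = 200)
```

Because only the labels are permuted, the distance matrix and the neighbor graph are computed a single time, which keeps the cost of each permutation low.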
An object of class htest with the following components:
statistic |
Observed value of the test statistic |
p.value |
Asymptotic or permutation p value |
estimate |
The number of within-sample edges |
alternative |
The alternative hypothesis |
method |
Description of the test |
data.name |
The dataset names |
| Target variable? | Numeric? | Categorical? | K-sample? |
| No | Yes | No | No |
The default for K is chosen based on simulation results of Stolte et al. (2026). Note that there is still little guidance on the choice of K.
Typical values for K chosen in the literature are 1 and 5.
Because this method cannot handle missing data, any missing values are removed automatically and a warning is issued.
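To avoid the warning and to control which observations are dropped, missing values can be removed before calling the test. The sketch below uses stats::complete.cases from base R and assumes that whole rows containing an NA should be discarded; whether SH removes missing values in exactly this row-wise way is an assumption.

```r
set.seed(1234)
X1 <- matrix(rnorm(50), ncol = 5)
X1[3, 2] <- NA  # introduce a missing value for illustration

# Keep only rows without any missing value; drop = FALSE preserves the matrix
X1_complete <- X1[stats::complete.cases(X1), , drop = FALSE]
nrow(X1_complete)
```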
Schilling, M. F. (1986). Multivariate Two-Sample Tests Based on Nearest Neighbors. Journal of the American Statistical Association, 81(395), 799-806. doi:10.2307/2289012
Henze, N. (1988). A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences. The Annals of Statistics, 16(2), 772-783.
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statistics Surveys, 18, 163-298. doi:10.1214/24-SS149
Stolte, M., Rahnenführer, J., Bommert, A. (2026). An Empirical Comparison of Methods for Quantifying the Similarity of Numeric Datasets. doi:10.48550/arXiv.2604.12327
knn, BQS, FR, CF, CCS, ZC for other graph-based tests,
FR_cat, CF_cat, CCS_cat, and ZC_cat for versions of the test for categorical data
set.seed(1234)
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Schilling-Henze test
SH(X1, X2)