SH | R Documentation |
Performs the Schilling-Henze two-sample test for multivariate data (Schilling, 1986; Henze, 1988).
SH(X1, X2, K = 1, graph.fun = knn.bf, dist.fun = stats::dist, n.perm = 0,
dist.args = NULL, seed = 42)
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
K |
Number of nearest neighbors to consider (default: 1) |
graph.fun |
Function for calculating a similarity graph using the distance matrix on the pooled sample (default: knn.bf) |
dist.fun |
Function for calculating a distance matrix on the pooled dataset (default: stats::dist) |
n.perm |
Number of permutations for permutation test (default: 0, asymptotic test is performed). |
dist.args |
Named list of further arguments passed to dist.fun (default: NULL) |
seed |
Random seed (default: 42) |
The test statistic is the proportion of edges connecting points from the same dataset in a K-nearest neighbor graph calculated on the pooled sample (standardized with expectation and SD under the null).
Low values of the test statistic indicate similarity of the datasets. Thus, the null hypothesis of equal distributions is rejected for high values.
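The unstandardized quantity behind the statistic can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: same_sample_prop is a hypothetical helper, Euclidean distances from stats::dist are assumed, and the standardization with the null expectation and SD is omitted.

```r
# Share of K-nearest-neighbor edges in the pooled sample that join two
# points from the same dataset (unstandardized; hypothetical helper).
same_sample_prop <- function(X1, X2, K = 1) {
  pooled <- rbind(as.matrix(X1), as.matrix(X2))
  labels <- rep(c(1L, 2L), times = c(nrow(X1), nrow(X2)))
  D <- as.matrix(stats::dist(pooled))   # pairwise Euclidean distances
  diag(D) <- Inf                        # a point is not its own neighbor
  # Indices of the K nearest neighbors of each point, one column per point
  nn <- apply(D, 1, function(d) order(d)[seq_len(K)])
  nn <- matrix(nn, nrow = K)            # keep a K x n matrix even for K = 1
  # Compare each point's label with the labels of its K neighbors
  mean(labels[nn] == rep(labels, each = K))
}

set.seed(1)
X1 <- matrix(rnorm(200), ncol = 2)
X2 <- matrix(rnorm(200, mean = 3), ncol = 2)
same_sample_prop(X1, X2, K = 1)
same_sample_prop(X1, X2, K = 5)
```

For equal sample sizes and equal distributions the proportion should lie near one half; it moves toward 1 as the two samples separate.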
For n.perm = 0, an asymptotic test using the asymptotic normal approximation of the conditional null distribution is performed. For n.perm > 0, a permutation test is performed.
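The permutation variant can be illustrated with a short base R sketch. It rests on several assumptions: perm_pvalue is a hypothetical helper, the unstandardized same-sample proportion stands in for the statistic, and the "+ 1" correction in the p value is a common convention that may differ from what SH does internally.

```r
# Permutation test on the same-sample proportion of K-nearest-neighbor
# edges (hypothetical helper, not the package's code).
perm_pvalue <- function(X1, X2, K = 1, n.perm = 200, seed = 42) {
  set.seed(seed)
  pooled <- rbind(as.matrix(X1), as.matrix(X2))
  n1 <- nrow(X1)
  n <- nrow(pooled)
  D <- as.matrix(stats::dist(pooled))
  diag(D) <- Inf
  # The neighbor graph depends only on the pooled points, so it is built
  # once; only the dataset labels are permuted afterwards.
  nn <- matrix(apply(D, 1, function(d) order(d)[seq_len(K)]), nrow = K)
  stat <- function(labels) mean(labels[nn] == rep(labels, each = K))
  labels <- rep(c(1L, 2L), times = c(n1, n - n1))
  observed <- stat(labels)
  permuted <- replicate(n.perm, stat(sample(labels)))
  # One-sided: high values speak against equal distributions
  p <- (sum(permuted >= observed) + 1) / (n.perm + 1)
  list(statistic = observed, p.value = p)
}

set.seed(1)
X1 <- matrix(rnorm(200), ncol = 2)
X2 <- matrix(rnorm(200, mean = 3), ncol = 2)
perm_pvalue(X1, X2, K = 1, n.perm = 200)
```

Because permuting labels leaves the neighbor graph unchanged, the cost of each permutation is small compared with computing the distance matrix.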
An object of class htest with the following components:
statistic |
Observed value of the test statistic |
p.value |
Asymptotic or permutation p value |
estimate |
The number of within-sample edges |
alternative |
The alternative hypothesis |
method |
Description of the test |
data.name |
The dataset names |
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | No |
The default of K = 1 is chosen rather arbitrarily, based on computational speed, as no good rule for choosing K has been proposed in the literature so far. Typical values for K chosen in the literature are 1 and 5.
Schilling, M. F. (1986). Multivariate Two-Sample Tests Based on Nearest Neighbors. Journal of the American Statistical Association, 81(395), 799-806. doi:10.2307/2289012
Henze, N. (1988). A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences. The Annals of Statistics, 16(2), 772-783.
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statistics Surveys, 18, 163-298. doi:10.1214/24-SS149
knn, BQS, FR, CF, CCS, ZC for other graph-based tests, FR_cat, CF_cat, CCS_cat, and ZC_cat for versions of the test for categorical data
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Schilling-Henze test
SH(X1, X2)