ZC | R Documentation |
Performs the maxtype edge-count two-sample test for multivariate data proposed by Zhang and Chen (2017). This implementation uses the g.tests function from the gTests package.
ZC(X1, X2, dist.fun = stats::dist, graph.fun = MST, n.perm = 0,
dist.args = NULL, graph.args = NULL, maxtype.kappa = 1.14, seed = 42)
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
dist.fun |
Function for calculating a distance matrix on the pooled dataset (default: stats::dist) |
graph.fun |
Function for calculating a similarity graph using the distance matrix on the pooled sample (default: MST) |
n.perm |
Number of permutations for permutation test (default: 0, asymptotic test is performed). |
dist.args |
Named list of further arguments passed to dist.fun (default: NULL) |
graph.args |
Named list of further arguments passed to graph.fun (default: NULL) |
maxtype.kappa |
Parameter \kappa, the factor applied to the weighted edge-count statistic (default: 1.14) |
seed |
Random seed (default: 42) |
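How ZC forwards dist.args and graph.args internally is not shown on this page. A plausible pattern, assumed here purely for illustration, is to prepend the pooled data to the named list and call the function via do.call. The sketch below uses only base R and the Manhattan metric of stats::dist as an example argument.

```r
# Sketch under assumptions: shows how a named list such as dist.args
# could be forwarded to dist.fun. Not taken from the package source.
set.seed(1)
X1 <- matrix(rnorm(20), ncol = 2)
X2 <- matrix(rnorm(20, mean = 0.5), ncol = 2)

# Pool both samples, as the distance matrix is computed on the pooled data
pooled <- rbind(X1, X2)

dist.fun  <- stats::dist
dist.args <- list(method = "manhattan")

# Equivalent to stats::dist(pooled, method = "manhattan")
d <- do.call(dist.fun, c(list(pooled), dist.args))
attr(d, "method")
```

With the real function, the corresponding call would presumably be ZC(X1, X2, dist.args = list(method = "manhattan")).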
The test is an enhancement of the Friedman-Rafsky test (the original edge-count test) that aims to detect both location and scale alternatives and is more flexible than the generalized edge-count test of Chen and Friedman (2017). The test statistic is the maximum of two statistics. The first is the weighted edge-count statistic multiplied by a factor \kappa. The second is the absolute value of the standardized difference between the edge counts within the first and within the second sample.
Low values of the test statistic indicate similarity of the datasets. Thus, the null hypothesis of equal distributions is rejected for high values.
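The way the two statistics are combined can be sketched in a few lines of R. The helper maxtype_stat and its inputs z_w and z_diff are hypothetical names introduced for illustration: they stand for an already computed weighted edge-count statistic and an already standardized difference of within-sample edge counts. How gTests computes these values is not reproduced here.

```r
# Hypothetical helper, not part of gTests or this package.
# z_w:    weighted edge-count statistic (assumed to be given)
# z_diff: standardized difference of within-sample edge counts (assumed given)
# kappa:  scaling factor, matching the maxtype.kappa default of 1.14
maxtype_stat <- function(z_w, z_diff, kappa = 1.14) {
  max(kappa * z_w, abs(z_diff))
}

# The scaled weighted statistic dominates
m1 <- maxtype_stat(z_w = 2.0, z_diff = 0.5)

# The absolute standardized difference dominates
m2 <- maxtype_stat(z_w = 0.3, z_diff = -2.5)
```

Larger values of kappa give more weight to the weighted edge-count statistic, smaller values to the difference statistic.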
For n.perm = 0, an asymptotic test based on the normal approximation of the null distribution is performed. For n.perm > 0, a permutation test is performed.
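The general idea of a permutation test for a statistic that rejects for high values can be sketched in base R as follows. This is not the gTests implementation: perm_p_value and mean_shift_stat are hypothetical names, the statistic is a simple stand-in rather than the maxtype statistic, and the "+1" p value convention is one common choice that may differ from what gTests uses.

```r
# Generic permutation p value; a sketch under the assumptions stated above.
perm_p_value <- function(X1, X2, stat.fun, n.perm = 200, seed = 42) {
  set.seed(seed)
  pooled <- rbind(as.matrix(X1), as.matrix(X2))
  n1 <- nrow(X1)
  N <- nrow(pooled)
  first <- seq_len(n1)

  # Statistic on the observed sample labels
  observed <- stat.fun(pooled[first, , drop = FALSE],
                       pooled[-first, , drop = FALSE])

  # Statistic after randomly reassigning rows to the two samples
  perm.stats <- replicate(n.perm, {
    idx <- sample(N)
    stat.fun(pooled[idx[first], , drop = FALSE],
             pooled[idx[-first], , drop = FALSE])
  })

  # High values are evidence against the null hypothesis
  (1 + sum(perm.stats >= observed)) / (n.perm + 1)
}

# Stand-in statistic: Euclidean distance between the sample mean vectors
mean_shift_stat <- function(A, B) sqrt(sum((colMeans(A) - colMeans(B))^2))

set.seed(1)
X1 <- matrix(rnorm(200), ncol = 2)
X2 <- matrix(rnorm(200, mean = 3), ncol = 2)
p <- perm_p_value(X1, X2, mean_shift_stat)
```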
This implementation is a wrapper around the function g.tests that modifies the input and output of that function to match the other functions provided in this package. For more details, see the documentation of g.tests.
An object of class htest with the following components:
statistic |
Observed value of the test statistic |
p.value |
Asymptotic or permutation p value |
alternative |
The alternative hypothesis |
method |
Description of the test |
data.name |
The dataset names |
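Components of an htest object are accessed like list elements. The sketch below uses stats::t.test purely as a stand-in that also returns an htest object, so that it runs without gTests installed; an object returned by ZC would presumably be accessed in the same way.

```r
# Stand-in example: t.test also returns an object of class "htest"
set.seed(1)
res <- stats::t.test(rnorm(20), rnorm(20, mean = 0.5))

class(res)        # "htest"
res$statistic     # observed value of the test statistic
res$p.value       # p value
res$alternative   # alternative hypothesis
res$method        # description of the test
res$data.name     # dataset names
```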
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | No |
Zhang, J. and Chen, H. (2022). Graph-Based Two-Sample Tests for Data with Repeated Observations. Statistica Sinica 32, 391-415, doi:10.5705/ss.202019.0116.
Chen, H., and Zhang, J. (2017). gTests: Graph-Based Two-Sample Tests. R package version 0.2, https://CRAN.R-project.org/package=gTests.
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298, doi:10.1214/24-SS149.
FR for the original edge-count test, CF for the generalized edge-count test, CCS for the weighted edge-count test, gTests for performing all these edge-count tests at once, SH for performing the Schilling-Henze nearest neighbor test, and CCS_cat, FR_cat, CF_cat, ZC_cat, and gTests_cat for versions of the tests for categorical data
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform maxtype edge-count test
if (requireNamespace("gTests", quietly = TRUE)) {
  ZC(X1, X2)
}