FR_cat | R Documentation |
Performs the Friedman-Rafsky two-sample test (original edge-count test) for multivariate data (Friedman and Rafsky, 1979). This implementation relies on the g.tests function from the gTests package.
FR_cat(X1, X2, dist.fun, agg.type, graph.type = "mstree", K = 1, n.perm = 0,
seed = 42)
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
dist.fun |
Function for calculating a distance matrix on the pooled dataset. |
agg.type |
Character giving the method for aggregating over possible similarity graphs. Options are |
graph.type |
Character specifying which similarity graph to use. Possible options are |
K |
Parameter for graph (default: 1). If |
n.perm |
Number of permutations for permutation test (default: 0, asymptotic test is performed). |
seed |
Random seed (default: 42) |
The test is a multivariate extension of the univariate Wald-Wolfowitz runs test. The test statistic is the number of edges connecting points from different datasets in a minimum spanning tree calculated on the pooled sample, standardized with its expectation and standard deviation under the null hypothesis. For discrete data, the similarity graph used in the test is not necessarily unique. This can be handled either by taking the union of all optimal similarity graphs or by averaging the test statistics over all optimal similarity graphs. For details, see Zhang and Chen (2022).
High values of the test statistic indicate similarity of the datasets. Thus, the null hypothesis of equal distributions is rejected for small values.
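The raw (unstandardized) edge count described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by g.tests: the helper names `pairwise_dist`, `mst_edges`, and `cross_edge_count` are hypothetical, the minimum spanning tree is built with a plain Prim's algorithm, and ties between equally short edges are broken arbitrarily, so the sketch picks just one of the possibly many optimal trees rather than taking their union or average.

```r
# Hamming distance between two categorical rows
hamming <- function(x, y) sum(x != y)

# Pairwise distance matrix on the pooled sample (hypothetical helper)
pairwise_dist <- function(X, d) {
  N <- nrow(X)
  D <- matrix(0, N, N)
  for (i in seq_len(N - 1)) {
    for (j in (i + 1):N) {
      D[i, j] <- D[j, i] <- d(X[i, ], X[j, ])
    }
  }
  D
}

# Prim's algorithm: returns an (N - 1) x 2 matrix of MST edges.
# Ties are broken by taking the first minimal candidate edge.
mst_edges <- function(D) {
  N <- nrow(D)
  in_tree <- c(TRUE, rep(FALSE, N - 1))
  edges <- matrix(NA_integer_, nrow = N - 1, ncol = 2)
  for (k in seq_len(N - 1)) {
    cand <- D[in_tree, !in_tree, drop = FALSE]
    pos <- which(cand == min(cand), arr.ind = TRUE)[1, ]
    edges[k, ] <- c(which(in_tree)[pos[1]], which(!in_tree)[pos[2]])
    in_tree[edges[k, 2]] <- TRUE
  }
  edges
}

# Number of edges whose endpoints come from different datasets
cross_edge_count <- function(edges, labels) {
  sum(labels[edges[, 1]] != labels[edges[, 2]])
}

set.seed(1)
X1 <- matrix(sample(1:4, 60, replace = TRUE), ncol = 3)
X2 <- matrix(sample(1:4, 60, replace = TRUE, prob = 1:4), ncol = 3)
pooled <- rbind(X1, X2)
labels <- rep(1:2, c(nrow(X1), nrow(X2)))

E <- mst_edges(pairwise_dist(pooled, hamming))
R <- cross_edge_count(E, labels)
R
```

A small count means the tree rarely links points across the two datasets, which is the direction in which the test rejects.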
For n.perm = 0, an asymptotic test using the asymptotic normal approximation of the null distribution is performed. For n.perm > 0, a permutation test is performed.
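The idea behind the permutation variant can be illustrated with a short base R sketch. It assumes a fixed similarity graph given as a two-column edge matrix and uses the hypothetical helpers `cross_edge_count` and `perm_p_value`; the exact p value formula used by g.tests may differ (for example in how ties or the observed value are counted).

```r
# Count edges whose endpoints carry different sample labels
cross_edge_count <- function(edges, labels) {
  sum(labels[edges[, 1]] != labels[edges[, 2]])
}

# Simple permutation p value: share of label permutations whose
# cross-sample edge count is at most the observed count
# (small counts speak against the null hypothesis)
perm_p_value <- function(edges, labels, n.perm = 1000, seed = 42) {
  set.seed(seed)
  observed <- cross_edge_count(edges, labels)
  permuted <- replicate(n.perm, cross_edge_count(edges, sample(labels)))
  mean(permuted <= observed)
}

# Hypothetical path graph on 10 points; the first 5 belong to sample 1
edges <- cbind(1:9, 2:10)
labels <- rep(1:2, each = 5)
p <- perm_p_value(edges, labels)
p
```

Because the graph is kept fixed and only the labels are shuffled, the graph has to be computed once, which is what makes the permutation test affordable for moderate n.perm.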
This implementation is a wrapper around the function g.tests that modifies the input and output of that function to match the other functions provided in this package. For more details, see g.tests.
An object of class htest with the following components:
statistic |
Observed value of the test statistic |
parameter |
Degrees of freedom for |
p.value |
Asymptotic or permutation p value |
alternative |
The alternative hypothesis |
method |
Description of the test |
data.name |
The dataset names |
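The components listed above can be read from the returned object like any list element. The following is a usage sketch that assumes FR_cat is exported by the DataSimilarity package and that gTests is installed; it is skipped if either package is missing.

```r
# Assumption: FR_cat comes from the DataSimilarity package and needs gTests
if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("gTests", quietly = TRUE)) {
  set.seed(1)
  X1cat <- matrix(sample(1:4, 300, replace = TRUE), ncol = 3)
  X2cat <- matrix(sample(1:4, 300, replace = TRUE, prob = 1:4), ncol = 3)
  res <- DataSimilarity::FR_cat(X1cat, X2cat,
                                dist.fun = function(x, y) sum(x != y),
                                agg.type = "a")
  print(res$statistic)  # observed value of the test statistic
  print(res$p.value)    # asymptotic p value, since n.perm = 0 by default
  print(res$method)     # description of the test
}
```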
Target variable? | Numeric? | Categorical? | K-sample? |
No | No | Yes | No |
Friedman, J. H., and Rafsky, L. C. (1979). Multivariate Generalizations of the Wald-Wolfowitz and Smirnov Two-Sample Tests. The Annals of Statistics, 7(4), 697-717.
Zhang, J. and Chen, H. (2022). Graph-Based Two-Sample Tests for Data with Repeated Observations. Statistica Sinica 32, 391-415, doi:10.5705/ss.202019.0116.
Chen, H., and Zhang, J. (2017). gTests: Graph-Based Two-Sample Tests. R package version 0.2, https://CRAN.R-project.org/package=gTests.
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298, doi:10.1214/24-SS149.
CF_cat for the generalized edge-count test, CCS_cat for the weighted edge-count test, ZC_cat for the maxtype edge-count test, gTests_cat for performing all these edge-count tests at once, CCS, FR, CF, ZC, and gTests for versions of the tests for continuous data, and SH for performing the Schilling-Henze nearest neighbor test.
# Draw some data (seed set so that the example is reproducible)
set.seed(1234)
X1cat <- matrix(sample(1:4, 300, replace = TRUE), ncol = 3)
X2cat <- matrix(sample(1:4, 300, replace = TRUE, prob = 1:4), ncol = 3)
# Perform Friedman-Rafsky test, averaging over all optimal similarity graphs
if (requireNamespace("gTests", quietly = TRUE)) {
  FR_cat(X1cat, X2cat, dist.fun = function(x, y) sum(x != y), agg.type = "a")
}