FR: Friedman-Rafsky Test
In DataSimilarity: Quantifying Similarity of Datasets and Multivariate Two- And k-Sample Testing

Description

Performs the Friedman-Rafsky two-sample test (the original edge-count test) for multivariate data (Friedman and Rafsky, 1979). The computation is delegated to the g.tests function from the gTests package.

Usage

FR(X1, X2, dist.fun = stats::dist, graph.fun = MST, n.perm = 0, 
    dist.args = NULL, graph.args = NULL, seed = NULL)

Arguments

`X1`	First dataset as matrix or data.frame
`X2`	Second dataset as matrix or data.frame
`dist.fun`	Function for calculating a distance matrix on the pooled dataset (default: `stats::dist`, Euclidean distance).
`graph.fun`	Function for calculating a similarity graph using the distance matrix on the pooled sample (default: `MST`, Minimum Spanning Tree).
`n.perm`	Number of permutations for the permutation test (default: 0, in which case the asymptotic test is performed).
`dist.args`	Named list of further arguments passed to `dist.fun` (default: `NULL`).
`graph.args`	Named list of further arguments passed to `graph.fun` (default: `NULL`).
`seed`	Random seed (default: `NULL`). A random seed is only set if one is provided.
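
The documented arguments can be combined as in the following sketch. It relies only on the interface described above and assumes that the entries of dist.args are forwarded to dist.fun (here stats::dist, whose method argument selects the metric). The simulated data and the choice of 200 permutations are arbitrary illustrations.

```r
# Sketch: Manhattan distance and a permutation test via the documented arguments.
# Assumes DataSimilarity (and gTests, which FR() relies on) are installed.
set.seed(42)
X1 <- matrix(rnorm(200), ncol = 4)              # 50 observations, 4 variables
X2 <- matrix(rnorm(200, mean = 0.3), ncol = 4)  # shifted second sample

if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("gTests", quietly = TRUE)) {
  res_perm <- DataSimilarity::FR(
    X1, X2,
    dist.args = list(method = "manhattan"),  # assumed to be forwarded to stats::dist
    n.perm = 200,                            # > 0: permutation test
    seed = 1                                 # set only because it is provided
  )
  print(res_perm)
}
```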

Details

The test is a multivariate extension of the univariate Wald-Wolfowitz runs test. The test statistic is the number of edges connecting points from different datasets in a similarity graph calculated on the pooled sample (by default a minimum spanning tree), standardized by its expectation and standard deviation under the null hypothesis.
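
The following base-R sketch makes the raw (unstandardized) statistic concrete for the default graph. It is not the package's code path, which goes through graph.fun and g.tests: it builds a Euclidean minimum spanning tree with Prim's algorithm and counts the edges whose endpoints come from different samples. The helper names mst_edges() and cross_edge_count() are hypothetical.

```r
# Hypothetical helper: minimum spanning tree via Prim's algorithm.
# D is a full symmetric distance matrix; returns a (N - 1) x 2 matrix of edges.
mst_edges <- function(D) {
  N <- nrow(D)
  in_tree <- c(TRUE, rep(FALSE, N - 1))  # start the tree at node 1
  best <- D[1, ]                         # cheapest link of each node to the tree
  parent <- rep(1L, N)                   # tree node realizing that link
  edges <- matrix(0L, nrow = N - 1, ncol = 2)
  for (k in seq_len(N - 1)) {
    cand <- which(!in_tree)
    v <- cand[which.min(best[cand])]     # closest node not yet in the tree
    edges[k, ] <- c(parent[v], v)
    in_tree[v] <- TRUE
    closer <- !in_tree & D[v, ] < best   # nodes now closer via v
    best[closer] <- D[v, closer]
    parent[closer] <- v
  }
  edges
}

# Hypothetical helper: number of edges joining points from different samples.
cross_edge_count <- function(edges, labels) {
  sum(labels[edges[, 1]] != labels[edges[, 2]])
}

# Usage on simulated data (pooled sample, Euclidean distance)
set.seed(1)
X1 <- matrix(rnorm(100), ncol = 5)            # 20 observations
X2 <- matrix(rnorm(100, mean = 1), ncol = 5)  # 20 shifted observations
labels <- rep(1:2, c(nrow(X1), nrow(X2)))
D <- as.matrix(stats::dist(rbind(X1, X2)))
edges <- mst_edges(D)
cross_edge_count(edges, labels)  # few cross edges point towards different distributions
```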

High values of the test statistic indicate similarity of the datasets. Thus, the null hypothesis of equal distributions is rejected for small values.

For n.perm = 0, an asymptotic test based on the normal approximation of the null distribution is performed. For n.perm > 0, a permutation test with n.perm permutations is performed.
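
The standardization and both kinds of p-value can be sketched in base R for an arbitrary graph given as a two-column edge matrix. As an assumption, the expectation and variance are taken to be the exact moments of the cross-edge count under random relabelling of the pooled sample; the formulas below are derived from first principles rather than copied from gTests, and they assume a simple graph with at least four nodes. The helper names are hypothetical, and the permutation p-value uses the common (1 + count) / (1 + n.perm) convention, which may differ from the one used by g.tests.

```r
# Sketch only: exact permutation moments of the cross-edge count R and the
# resulting asymptotic / permutation p-values. Assumed, not taken from gTests.
edge_count_moments <- function(edges, m, n) {
  N <- m + n                                # pooled sample size (needs N >= 4)
  G <- nrow(edges)                          # number of edges |G|
  deg <- tabulate(c(edges), nbins = N)      # degree of every node
  S <- sum(deg * (deg - 1))                 # ordered pairs of edges sharing a node
  p1 <- 2 * m * n / (N * (N - 1))           # P(an edge is a cross edge)
  p2 <- m * n / (N * (N - 1))               # P(two adjacent edges are both cross)
  p3 <- 4 * m * (m - 1) * n * (n - 1) /
    (N * (N - 1) * (N - 2) * (N - 3))       # P(two disjoint edges are both cross)
  mu <- G * p1                              # E(R) = |G| * 2mn / (N(N-1))
  # Var(R) = E(R^2) - E(R)^2, with E(R^2) summed over all ordered edge pairs
  v <- G * p1 + S * p2 + (G * (G - 1) - S) * p3 - mu^2
  c(mean = mu, var = v)
}

fr_pvalues <- function(edges, labels, n.perm = 0) {
  # labels: 1 for the first sample, 2 for the second
  cross <- function(lab) sum(lab[edges[, 1]] != lab[edges[, 2]])
  R <- cross(labels)
  mom <- edge_count_moments(edges, sum(labels == 1), sum(labels == 2))
  z <- unname((R - mom["mean"]) / sqrt(mom["var"]))  # standardized statistic
  if (n.perm == 0) {
    p <- pnorm(z)                           # lower tail: few cross edges -> reject
  } else {
    R_perm <- replicate(n.perm, cross(sample(labels)))
    p <- (1 + sum(R_perm <= R)) / (1 + n.perm)
  }
  list(statistic = z, p.value = p)
}

# Usage on a path graph 1-2-3-4-5-6 whose labels form two blocks
path <- cbind(1:5, 2:6)
fr_pvalues(path, c(1, 1, 1, 2, 2, 2))
set.seed(1)
fr_pvalues(path, c(1, 1, 1, 2, 2, 2), n.perm = 199)
```

For this path graph with three points per sample, the formulas give E(R) = 3 and Var(R) = 1.2, which agree with a direct enumeration of all 20 relabellings.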

This implementation is a wrapper around the function g.tests that adapts its input and output to match the other functions provided in this package. For more details, see the documentation of g.tests.

Value

An object of class htest with the following components:

`statistic`	Observed value of the test statistic
`p.value`	Asymptotic or permutation p value
`alternative`	The alternative hypothesis
`method`	Description of the test
`data.name`	The dataset names
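
The components listed above can be accessed like those of any other htest object. The sketch below simulates two samples that differ only in scale and applies a 5% significance level; both the data and the threshold are arbitrary choices for illustration.

```r
# Sketch: inspecting the htest components returned by FR().
# Assumes DataSimilarity and gTests are installed; data are simulated.
set.seed(7)
X1 <- matrix(rnorm(300), ncol = 3)
X2 <- matrix(rnorm(300, sd = 1.5), ncol = 3)  # same mean, larger spread

if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("gTests", quietly = TRUE)) {
  fr_res <- DataSimilarity::FR(X1, X2)
  cat("method:   ", fr_res$method, "\n")
  cat("statistic:", fr_res$statistic, "\n")
  cat("p-value:  ", fr_res$p.value, "\n")
  # small values of the statistic (and hence small p-values) speak against H0
  cat("reject H0 at the 5% level:", fr_res$p.value < 0.05, "\n")
}
```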

Applicability

Target variable?	Numeric?	Categorical?	K-sample?
No	Yes	No	No

References

Friedman, J. H., and Rafsky, L. C. (1979). Multivariate Generalizations of the Wald-Wolfowitz and Smirnov Two-Sample Tests. The Annals of Statistics, 7(4), 697-717.

Chen, H., and Zhang, J. (2017). gTests: Graph-Based Two-Sample Tests. R package version 0.2, https://CRAN.R-project.org/package=gTests.

Stolte, M., Kappenberg, F., Rahnenführer, J., and Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statistics Surveys, 18, 163-298. doi:10.1214/24-SS149.

Examples

set.seed(1234)
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Friedman-Rafsky test
if (requireNamespace("gTests", quietly = TRUE)) {
  # Using MST (print() is needed: inside a block only the last value is auto-printed)
  print(FR(X1, X2))
  # Using 5-MST
  print(FR(X1, X2, graph.args = list(K = 5)))
}

DataSimilarity documentation built on June 16, 2025, 5:08 p.m.