engineerMetric: Engineer Metric
In DataSimilarity: Quantifying Similarity of Datasets and Multivariate Two- And k-Sample Testing

Description

The function implements the L_q-engineer metric for comparing two multivariate distributions.

Usage

engineerMetric(X1, X2, type = "F", seed = NULL)

Arguments

`X1`	First dataset as matrix or data.frame
`X2`	Second dataset as matrix or data.frame
`type`	Character specifying the type of `L_q`-norm to use. Reasonable options are `"O"`, `"o"`, or `"1"` for the `L_1`-norm; `"I"` or `"i"` for the `L_\infty`-norm; and `"F"` (the default), `"f"`, `"E"`, or `"e"` for the `L_2`-norm (Euclidean norm).
`seed`	Random seed (default: NULL). A seed is set only if one is provided. The method is deterministic; the argument exists only for consistency with other methods.
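
The `type` codes listed above coincide with codes accepted by base R's `norm()`. As a minimal sketch, assuming the vector of mean differences is treated as a one-column matrix (whether the package does exactly this internally is an assumption), the three groups of codes reduce to the usual vector norms. The vector `d` is a made-up illustration, not package output:

```r
# Hypothetical vector of componentwise mean differences (illustration only)
d <- matrix(c(3, -4), ncol = 1)

l1   <- norm(d, type = "O")  # one norm of a single column: sum(abs(d))
linf <- norm(d, type = "I")  # infinity norm of a single column: max(abs(d))
l2   <- norm(d, type = "F")  # Frobenius norm: sqrt(sum(d^2))

c(L1 = l1, Linf = linf, L2 = l2)
```

For this `d`, the three values are 7, 4, and 5, respectively.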

Details

The engineer metric is a primary probability metric that is defined as

\text{EN}(X_1, X_2; q) = \left[ \sum_{i = 1}^{p} \left| \text{E}\left(X_{1i}\right) - \text{E}\left(X_{2i}\right)\right|^q\right]^{\min(q, 1/q)} \text{ with } q> 0,

where X_{1i}, X_{2i} denote the ith component of the p-dimensional random vectors X_1\sim F_1 and X_2\sim F_2.

In the implementation, expectations are estimated by column means of the respective datasets.
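
The definition can be sketched directly in base R. The helper `engineer_sketch()` and its argument `q` are hypothetical names introduced here for illustration and are not part of the package; the case `q = Inf` is handled as the limiting maximum norm, which is an assumption about how the `L_\infty` case is meant.

```r
# Hypothetical helper: plug-in estimate of EN(X1, X2; q) from column means
engineer_sketch <- function(X1, X2, q = 2) {
  stopifnot(ncol(X1) == ncol(X2), q > 0)
  # Absolute differences of the estimated componentwise expectations
  d <- abs(colMeans(X1) - colMeans(X2))
  # L_infinity case: largest absolute mean difference (assumed limit)
  if (is.infinite(q)) return(max(d))
  # Exponent min(q, 1/q) as in the definition
  sum(d^q)^min(q, 1 / q)
}

# Toy data: column means are (0, 0) for A and (3, 4) for B
A <- matrix(c(-1, 1, -2, 2), ncol = 2)
B <- matrix(c(2, 4, 3, 5), ncol = 2)

en1   <- engineer_sketch(A, B, q = 1)    # 3 + 4
en2   <- engineer_sketch(A, B, q = 2)    # sqrt(3^2 + 4^2)
eninf <- engineer_sketch(A, B, q = Inf)  # max(3, 4)
```

Because only the column means enter the calculation, two datasets with equal means in every component have distance zero, whatever their variances or dependence structure.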

Value

An object of class htest with the following components:

`method`	Description of the test
`statistic`	Observed value of the test statistic
`data.name`	The dataset names
`alternative`	The alternative hypothesis

Applicability

Target variable?	Numeric?	Categorical?	K-sample?
No	Yes	No	No

Note

The seed argument is only included for consistency with other methods. The result of the metric calculation is deterministic.

References

Rachev, S. T. (1991). Probability metrics and the stability of stochastic models. John Wiley & Sons, Chichester.

Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149

Examples

set.seed(1234)
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Calculate engineer metric
engineerMetric(X1, X2)
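
As a further sketch, the example can be extended to compare norm types and to extract the numeric value from the returned `htest` object through its `statistic` component. This assumes the DataSimilarity package is installed; the guard below skips the calls otherwise. The variable names `res_l1` and `res_linf` are illustrative.

```r
set.seed(1234)
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)

# Run only if the package is available (assumption: it is installed)
if (requireNamespace("DataSimilarity", quietly = TRUE)) {
  res_l1   <- DataSimilarity::engineerMetric(X1, X2, type = "O")  # L_1-norm
  res_linf <- DataSimilarity::engineerMetric(X1, X2, type = "I")  # L_infinity-norm
  # The observed metric value is stored in the statistic component
  print(res_l1$statistic)
  print(res_linf$statistic)
}
```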

DataSimilarity documentation built on June 16, 2025, 5:08 p.m.