method.table: List of Methods Included in the Package
In DataSimilarity: Quantifying Similarity of Datasets and Multivariate Two- And k-Sample Testing

method.table

R Documentation

List of Methods Included in the Package

Description

The dataset contains the subset of methods from the results table of Stolte et al. (2024) that are implemented in the DataSimilarity package.

Usage

data("method.table")

Format

A data frame with 42 observations on the following 30 variables. The criteria variables record whether each method fulfills the theoretical criteria of Stolte et al. (2024). Some criteria are fulfilled only for certain parameter choices of the method ("Conditionally Fulfilled"), and some do not apply to the method. An NA value means that no information is available on whether the respective criterion is fulfilled.

Method: a character vector giving the reference or method name
Implementation: a character vector giving the function name of the implementation in the DataSimilarity package
Target.Inclusion: a character vector. Can the method handle datasets that include a target variable in a meaningful way?
Numeric: a character vector. Can the method handle numeric data?
Categorical: a character vector. Can the method handle categorical data?
Unequal.Sample.Sizes: a character vector. Can the method handle datasets of different sample sizes?
p.Larger.N: a character vector. Can the method handle datasets with more variables than observations?
Multiple.Samples: a character vector. Can the method handle k > 2 datasets simultaneously?
Without.training: a character vector. Does the method work without holding out training data?
No.assumptions: a character vector. Does the method work without further assumptions?
No.parameters: a character vector. Does the method work without the specification or tuning of additional parameters?
Implemented: a character vector. Is the method implemented elsewhere? (NA if no other implementations are known)
Complexity: a character vector giving the computational complexity of the method.
Interpretable.units: a character vector. Can a one-unit increase of the output value be interpreted?
Lower.bound: a character vector. Are the output values bounded from below? If known, the lower bound is given.
Upper.bound: a character vector. Are the output values bounded from above? If known, the upper bound is given.
Rotation.invariant: a character vector. Is the method invariant to rotation of all datasets?
Location.change.invariant: a character vector. Is the method invariant to shifting all datasets?
Homogeneous.scale.invariant: a character vector. Is the method invariant to scaling all datasets?
Positive.definite: a character vector. Is the method positive definite, i.e. $d(F_1, F_2) \ge 0$ and $d(F_1, F_2) = 0 \Leftrightarrow F_1 = F_2$ for any two distributions $F_1, F_2$?
Symmetric: a character vector. Is the method symmetric, i.e. $d(F_1, F_2) = d(F_2, F_1)$ for any two distributions $F_1, F_2$?
Triangle.inequality: a character vector. Does the method fulfill the triangle inequality, i.e. $d(F_1, F_2) \le d(F_1, F_3) + d(F_3, F_2)$ for any three distributions $F_1, F_2, F_3$?
Consistency.N: a character vector. Is the corresponding test consistent for $N \to \infty$?
Consistency.p: a character vector. Is the corresponding test consistent for $p \to \infty$?
Number.Fulfilled: a numeric vector. Number of fulfilled criteria.
Number.Cond.Fulfilled: a numeric vector. Number of conditionally fulfilled criteria.
Number.Unfulfilled: a numeric vector. Number of unfulfilled criteria.
Number.NA: a numeric vector. Number of criteria for which it is unknown if they are fulfilled.
Class: a character vector. Class of the taxonomy of Stolte et al. (2024) that the method is assigned to based on its underlying idea.
Subclass: a character vector. Subclass of the taxonomy of Stolte et al. (2024) that the method is assigned to based on its underlying idea.
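
Because most criteria are stored as character vectors and may contain NA, it can help to look at which values a column actually takes before filtering on it. The following is a minimal sketch, not part of the package: the helper count_criterion_values() is a hypothetical name introduced here, and the three columns passed to it are chosen only for illustration. The table is loaded only if DataSimilarity is installed.

```r
# Hypothetical helper (not part of DataSimilarity): tabulate the values of
# selected criterion columns, counting NA as its own category.
count_criterion_values <- function(tbl, cols) {
  lapply(tbl[cols], function(x) table(x, useNA = "ifany"))
}

# Apply it to the packaged table when the package is available
if (requireNamespace("DataSimilarity", quietly = TRUE)) {
  data("method.table", package = "DataSimilarity")
  print(dim(method.table))  # number of methods and number of variables
  print(count_criterion_values(
    method.table, c("Numeric", "Categorical", "Multiple.Samples")
  ))
}
```

The helper returns a named list with one frequency table per column, so the distinct values and the number of NA entries can be read off directly.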

Details

The dataset is based on the results of Stolte et al. (2024). For explanations of the criteria, the taxonomy, and its classes, please refer to that publication. A full version of the table can also be found at https://shiny.statistik.tu-dortmund.de/data-similarity/.

Source

Article describing the criteria and taxonomy: Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149

Full interactive results table: https://shiny.statistik.tu-dortmund.de/data-similarity/

Examples

library(DataSimilarity)
data("method.table")

# Workflow for using the DataSimilarity package: 
# Prepare data example: comparing species in iris dataset
data("iris")
iris.split <- split(iris[, -5], iris$Species)
setosa <- iris.split$setosa
versicolor <- iris.split$versicolor
virginica <- iris.split$virginica

# 1. Find appropriate methods that can be used to compare 3 numeric datasets:
findSimilarityMethod(Numeric = TRUE, Multiple.Samples = TRUE)

# get more information 
findSimilarityMethod(Numeric = TRUE, Multiple.Samples = TRUE, only.names = FALSE)

# 2. Choose a method and apply it:
# All suitable methods
possible.methods <- findSimilarityMethod(Numeric = TRUE, Multiple.Samples = TRUE,
                                         only.names = FALSE)
# Select, e.g., the method with the highest number of fulfilled criteria
possible.methods$Implementation[which.max(possible.methods$Number.Fulfilled)]

set.seed(1234)
if(requireNamespace("KMD")) {
  DataSimilarity(setosa, versicolor, virginica, method = "KMD")
}

# or directly 
set.seed(1234)
if(requireNamespace("KMD")) {
  KMD(setosa, versicolor, virginica)
}
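
A note on step 2 above: which.max() returns only the first position of the maximum, so if several methods fulfill the same highest number of criteria, all but one are silently dropped. The following sketch shows one way to list every tied candidate. The helper top_methods() is a hypothetical name introduced here, and the toy data frame contains made-up values that serve only to show the tie handling.

```r
# Hypothetical helper (not part of DataSimilarity): keep all rows that share
# the maximal number of fulfilled criteria.
top_methods <- function(tbl) {
  best <- max(tbl$Number.Fulfilled, na.rm = TRUE)
  keep <- !is.na(tbl$Number.Fulfilled) & tbl$Number.Fulfilled == best
  tbl[keep, c("Implementation", "Number.Fulfilled"), drop = FALSE]
}

# Toy stand-in with made-up values, used only to illustrate a tie
toy <- data.frame(
  Implementation = c("fA", "fB", "fC"),
  Number.Fulfilled = c(10, 12, 12),
  stringsAsFactors = FALSE
)
top_methods(toy)  # returns the rows for fB and fC

# With the package installed, the same call applies to the real candidates
if (requireNamespace("DataSimilarity", quietly = TRUE)) {
  library(DataSimilarity)
  candidates <- findSimilarityMethod(Numeric = TRUE, Multiple.Samples = TRUE,
                                     only.names = FALSE)
  print(top_methods(candidates))
}
```

If more than one method is returned, the remaining columns of the table (for example No.parameters or Complexity) can serve as tie-breakers.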

DataSimilarity documentation built on June 16, 2025, 5:08 p.m.
