View source: R/findSimilarityMethod.R
findSimilarityMethod | R Documentation
Find a dataset similarity method for the dataset comparison at hand and display information on suitable methods.
findSimilarityMethod(Numeric = FALSE, Categorical = FALSE,
Target.Inclusion = FALSE, Multiple.Samples = FALSE,
only.names = TRUE, ...)
| Argument | Description |
|---|---|
| Numeric | Is it required that the method is applicable to numeric data? (default: FALSE) |
| Categorical | Is it required that the method is applicable to categorical data? (default: FALSE) |
| Target.Inclusion | Is it required that the method is applicable to datasets that include a target variable? (default: FALSE) |
| Multiple.Samples | Is it required that the method is applicable to multiple datasets simultaneously? (default: FALSE) |
| only.names | Should only the function names be returned? (default: TRUE) |
| ... | Further criteria that the method should fulfill; see method.table. |
This function is intended to make it easier to find suitable methods. The user specifies the criteria that a method must fulfill for the application at hand, and the function returns either a vector of the function names of all methods meeting these criteria or the full information on these methods.
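The selection logic can be pictured with a small base-R sketch. It does not call the package: the table `toy.table`, its method names and values, and the helper `filter.methods()` are all hypothetical, and the real method.table contains many more criteria columns. The sketch assumes, as the argument descriptions suggest, that a criterion set to TRUE must be fulfilled while a criterion left at FALSE is simply not required (it does not exclude any method).

```r
# Hypothetical toy criteria table: one row per method, one logical column per
# criterion. Method names and values are invented for illustration only.
toy.table <- data.frame(
  Implementation   = c("methodA", "methodB", "methodC"),
  Numeric          = c(TRUE,  TRUE,  FALSE),
  Categorical      = c(FALSE, TRUE,  TRUE),
  Multiple.Samples = c(TRUE,  FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of DataSimilarity): keeps only the rows that
# fulfill every criterion set to TRUE; criteria set to FALSE are ignored.
filter.methods <- function(tab, Numeric = FALSE, Categorical = FALSE,
                           Multiple.Samples = FALSE, only.names = TRUE) {
  required <- c(Numeric = Numeric, Categorical = Categorical,
                Multiple.Samples = Multiple.Samples)
  keep <- rep(TRUE, nrow(tab))
  # Loop only over the criteria that are actually required
  for (crit in names(required)[required]) {
    keep <- keep & tab[[crit]]
  }
  res <- tab[keep, , drop = FALSE]
  # Return the names only, or the full rows of the selected methods
  if (only.names) res$Implementation else res
}

filter.methods(toy.table, Numeric = TRUE)
# returns c("methodA", "methodB")
filter.methods(toy.table, Numeric = TRUE, Multiple.Samples = TRUE)
# returns "methodA"
filter.methods(toy.table, Categorical = TRUE, only.names = FALSE)
# returns the rows for methodB and methodC
```

Adding a required criterion can only shrink the result, which is why the workflow in the examples starts with the few criteria that are essential for the comparison at hand.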
Either a character vector of the function names of the selected methods (for only.names = TRUE) or the subset of method.table containing the selected methods (for only.names = FALSE).
Article describing the criteria and taxonomy: Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149
Full interactive results table: https://shiny.statistik.tu-dortmund.de/data-similarity/
See also: method.table, DataSimilarity
# Workflow for using the DataSimilarity package:
# Prepare example data: compare the three species in the iris dataset
data("iris")
iris.split <- split(iris[, -5], iris$Species)
setosa <- iris.split$setosa
versicolor <- iris.split$versicolor
virginica <- iris.split$virginica
# 1. Find appropriate methods that can be used to compare 3 numeric datasets:
findSimilarityMethod(Numeric = TRUE, Multiple.Samples = TRUE)
# Get the full information on these methods
findSimilarityMethod(Numeric = TRUE, Multiple.Samples = TRUE, only.names = FALSE)
# 2. Choose a method and apply it:
# All suitable methods
possible.methds <- findSimilarityMethod(Numeric = TRUE, Multiple.Samples = TRUE,
only.names = FALSE)
# Select, e.g., the method with the highest number of fulfilled criteria
possible.methds$Implementation[which.max(possible.methds$Number.Fulfilled)]
set.seed(1234)
if(requireNamespace("KMD")) {
DataSimilarity(setosa, versicolor, virginica, method = "KMD")
}
# or directly
set.seed(1234)
if(requireNamespace("KMD")) {
KMD(setosa, versicolor, virginica)
}
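The same search can be run with other criteria. The sketch below assumes the DataSimilarity package is installed and uses only the arguments shown in the usage section; the criteria combination is chosen purely for illustration, and the variable name `mixed.methods` is hypothetical. The returned names depend on the installed package version, so no output is shown.

```r
# Assumes the DataSimilarity package is installed; only documented
# arguments of findSimilarityMethod() are used.
if (requireNamespace("DataSimilarity", quietly = TRUE)) {
  library(DataSimilarity)
  # Methods applicable to both numeric and categorical data that also
  # support datasets including a target variable
  mixed.methods <- findSimilarityMethod(Numeric = TRUE, Categorical = TRUE,
                                        Target.Inclusion = TRUE)
  print(mixed.methods)
  # Number of methods meeting all three criteria
  print(length(mixed.methods))
}
```

If the result is empty, dropping one of the required criteria (setting it back to FALSE) widens the search.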