View source: R/findSimilarityMethod.R
| findSimilarityMethod | R Documentation | 
Find a dataset similarity method for the dataset comparison at hand and display information on suitable methods.
findSimilarityMethod(Numeric = FALSE, Categorical = FALSE, 
                      Target.Inclusion = FALSE, Multiple.Samples = FALSE, 
                      only.names = TRUE, ...)
| Numeric | Is it required that the method is applicable to numeric data? (default:  | 
| Categorical | Is it required that the method is applicable to categorical data? (default:  | 
| Target.Inclusion | Is it required that the method is applicable to datasets that include a target variable? (default:  | 
| Multiple.Samples | Is it required that the method is applicable to multiple datasets simultaneously? (default:  | 
| only.names | Should only the function names be returned? (default:  | 
| ... | Further criteria that the method should fulfill, see  | 
This function is intended to facilitate finding suitable methods. The criteria that a method should fulfill for the application at hand can be specified and a vector of the function names or the full information on the methods is returned.
Either a character vector of function names for only.names = TRUE or a subset of method.table of the selected methods for only.names = FALSE.
Article describing the criteria and taxonomy: Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163 - 298. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1214/24-SS149")}
Full interactive results table: https://shiny.statistik.tu-dortmund.de/data-similarity/
method.table, DataSimilarity
# Workflow for using the DataSimilarity package: 
# Prepare data example: comparing species in iris dataset
data("iris")
iris.split <- split(iris[, -5], iris$Species)
setosa <- iris.split$setosa
versicolor <- iris.split$versicolor
virginica <- iris.split$virginica
# 1. Find appropriate methods that can be used to compare 3 numeric datasets:
findSimilarityMethod(Numeric = TRUE, Multiple.Samples = TRUE)
# get more information 
findSimilarityMethod(Numeric = TRUE, Multiple.Samples = TRUE, only.names = FALSE)
# 2. Choose a method and apply it:
# All suitable methods
possible.methds <- findSimilarityMethod(Numeric = TRUE, Multiple.Samples = TRUE, 
                                          only.names = FALSE)
# Select, e.g., method with highest number of fulfilled criteria
possible.methds$Implementation[which.max(possible.methds$Number.Fulfilled)]
set.seed(1234)
if(requireNamespace("KMD")) {
  DataSimilarity(setosa, versicolor, virginica, method = "KMD")
}
# or directly 
set.seed(1234)
if(requireNamespace("KMD")) {
  KMD(setosa, versicolor, virginica)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.