The sortinghat package is an R framework that streamlines the evaluation of classifiers (classification models and algorithms) and seeks to determine the best classifiers on a variety of simulated and benchmark data sets using a collection of benchmark metrics.
You can install the stable version from CRAN:

install.packages('sortinghat', dependencies = TRUE)

If you prefer the latest development version from GitHub, instead type:

library(devtools)
install_github('ramey/sortinghat')
A primary goal of sortinghat is to enable rapid benchmarking across a variety of classification scenarios. To achieve this, we provide a large selection of both real and simulated data sets collected from the literature and around the Internet. With sortinghat, researchers can quickly replicate findings within the literature as well as rapidly prototype new classifiers.
The list of real and simulated data sets will continue to grow. Contributions are greatly appreciated as pull requests.
Benchmark data sets are useful for evaluating and comparing classifiers...
(Work in Progress: Version 0.2 will include a collection of benchmark data sets)
In addition to benchmark data sets, sortinghat provides a large collection of data-generating models for simulations based on studies in the literature. Thus far, we have added multivariate simulation models based on several well-known families of distributions, and data can also be generated from classic configurations in the literature. These simulated data sets can be generated via the simdata function, as shown in the sketch below.
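As an illustration, here is a minimal sketch of simulating a two-class Gaussian data set. The helper name simdata_normal, its arguments (n, mean, cov, seed), and the returned x/y components are assumptions about the interface; consult ?simdata for the authoritative details.

library(sortinghat)

# Draw 25 observations from each of two bivariate normal populations.
# The argument names below are assumptions; see ?simdata_normal.
means <- list(c(0, 0), c(2, 2))
covs <- list(diag(2), diag(2))
sim <- simdata_normal(n = c(25, 25), mean = means, cov = covs, seed = 42)

# The result is assumed to hold a feature matrix 'x' and labels 'y'.
head(sim$x)
table(sim$y)

Fixing the seed makes the simulated data reproducible across benchmarking runs.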
Classifier superiority is often determined by the classification error rate (1 - accuracy). To assess classification efficacy, sortinghat provides a collection of error-rate estimators, each of which can be accessed via the errorest function, which acts as a wrapper around the individual estimators (see the sketch below).
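For instance, the following sketch estimates the cross-validation error rate of linear discriminant analysis on the iris data. The estimator, train, and classify arguments are assumptions about errorest's interface and should be checked against ?errorest.

library(sortinghat)
library(MASS)

iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]

# Wrapper that extracts class predictions from a fitted lda model
lda_classify <- function(object, newdata) predict(object, newdata)$class

# Cross-validation error-rate estimate for LDA (interface assumed)
errorest(x = iris_x, y = iris_y, estimator = "cv",
         train = MASS::lda, classify = lda_classify)

Under this assumed interface, passing a different estimator string to the same call would select one of the other error-rate estimators.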