
Clustering Datasets

An R repackaging of datasets that are useful for evaluating clustering methods. Most of them come from http://cs.joensuu.fi/sipu/datasets

I would love to include more clustering datasets, so contributions are welcome, whether you send them to me or open a PR.


This vignette provides a simple overview of the datasets included in the package.

Birch

S Sets

The S-sets have increasing degrees of overlap between their clusters, which makes them useful for testing how an algorithm handles overlapping clusters.

A Sets

Shapesets

Chameleon

Neural Gas

Non-Convex

Locations

High Dimensional Datasets

The package contains three sets of high-dimensional data. The visualizations below were made by using my largeVis package to reduce each dataset to two dimensions; the colors show the clusters found by the hdbscan function in largeVis.

UCI Datasets

KDDCUP04Bio

Sklearn Toy Datasets

The Python sklearn.datasets module includes functions for generating toy datasets. I've ported a few of them to R.

Make Blobs

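library(clusteringdatasets)
# Generate a blob dataset with the default settings
blobs <- make_blobs()
# Plot the samples, coloring each point by its cluster label
plot(blobs$samples, col=rainbow(3)[blobs$labels])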
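For readers curious about how such data is produced, here is a minimal base-R sketch of the recipe behind sklearn's make_blobs: cluster centres drawn uniformly inside a bounding box, with isotropic Gaussian noise around each centre. The function name `sketch_make_blobs` is hypothetical, and its defaults (100 samples, 3 centres, a box of (-10, 10), standard deviation 1) are assumptions modelled on sklearn's defaults. This is not the package's make_blobs, whose internals and arguments may differ; it only returns the same kind of `samples`/`labels` list used in the example above.

```r
# Hypothetical sketch of the sklearn make_blobs recipe in base R.
# Not the clusteringdatasets implementation; defaults mirror sklearn's.
sketch_make_blobs <- function(n_samples = 100, centers = 3, n_features = 2,
                              cluster_std = 1, center_box = c(-10, 10)) {
  # Draw each cluster centre uniformly inside the bounding box
  ctr <- matrix(runif(centers * n_features, center_box[1], center_box[2]),
                nrow = centers, ncol = n_features)
  # Split the samples as evenly as possible; leftovers go to the first clusters
  sizes <- rep(n_samples %/% centers, centers)
  extra <- n_samples %% centers
  if (extra > 0) sizes[seq_len(extra)] <- sizes[seq_len(extra)] + 1
  # 1-based labels, so they can index a palette such as rainbow(3)
  labels <- rep(seq_len(centers), times = sizes)
  # Each point is its cluster centre plus isotropic Gaussian noise
  noise <- matrix(rnorm(n_samples * n_features, sd = cluster_std),
                  nrow = n_samples, ncol = n_features)
  samples <- ctr[labels, , drop = FALSE] + noise
  list(samples = samples, labels = labels)
}

set.seed(1)
b <- sketch_make_blobs(n_samples = 150)
plot(b$samples, col = rainbow(3)[b$labels])
```

Unlike sklearn, this sketch does not shuffle the rows, so points from each cluster appear together; that keeps the code short and does not affect the plot.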

Make Moons

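# Two interleaving half-circles with a small amount of noise
moons <- make_moons(noise=0.04)
# Color each point by the moon it belongs to
plot(moons$samples, col=rainbow(2)[moons$labels])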
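The moons shape is simple to construct by hand, which makes it easy to see why it is a hard case for centroid-based methods: the two clusters are non-convex and interleave. The base-R sketch below follows the construction sklearn's make_moons uses (an upper half of the unit circle, plus a flipped half-circle centred at (1, 0.5), plus optional Gaussian noise). The name `sketch_make_moons` is hypothetical and this is not the package's own make_moons; its labels are assumed to be 1 and 2 so they can index `rainbow(2)` as in the example above.

```r
# Hypothetical sketch of the sklearn make_moons construction in base R.
# Not the clusteringdatasets implementation.
sketch_make_moons <- function(n_samples = 100, noise = 0) {
  n_out <- n_samples %/% 2
  n_in <- n_samples - n_out
  t_out <- seq(0, pi, length.out = n_out)
  t_in <- seq(0, pi, length.out = n_in)
  # Upper moon: top half of the unit circle centred at (0, 0)
  outer <- cbind(cos(t_out), sin(t_out))
  # Lower moon: bottom half of a unit circle centred at (1, 0.5)
  inner <- cbind(1 - cos(t_in), 1 - sin(t_in) - 0.5)
  samples <- rbind(outer, inner)
  # Optional isotropic Gaussian noise with standard deviation `noise`
  if (noise > 0) samples <- samples + rnorm(length(samples), sd = noise)
  # 1-based labels: 1 for the upper moon, 2 for the lower moon
  list(samples = samples, labels = rep(c(1L, 2L), c(n_out, n_in)))
}

set.seed(42)
m <- sketch_make_moons(200, noise = 0.04)
plot(m$samples, col = rainbow(2)[m$labels])
```

With `noise = 0` the points lie exactly on the two arcs; raising `noise` makes the moons bleed into each other, which is a quick way to probe how robust a clustering method is.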



elbamos/clusteringdatasets documentation built on May 16, 2019, 2:58 a.m.