aggregation: Synthetic dataset of two-dimensional points.



This is a synthetic dataset containing features that are known to cause difficulties for clustering algorithms, such as narrow bridges between clusters and clusters of uneven size. See the references for details.




A data frame containing 788 observations and two dimensions, forming seven partitions:

  1. x1: synthetically generated real positive values

  2. x2: synthetically generated real positive values

The original dataset contained three dimensions. We intentionally removed the third dimension, which holds the label of the cluster to which each data point belongs. A full description of the dataset can be found in the Clustering Aggregation article listed in the references.
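
The label removal described above can be illustrated with a minimal, language-agnostic sketch, written here in Python. It assumes the original file stores one point per line as whitespace-separated text in the order x1, x2, label; that layout is an assumption, not something stated on this page. The sample rows and the helper name `drop_label_column` are hypothetical and are not taken from the dataset or from the package.

```python
# Hypothetical rows in the assumed original layout: x1, x2, cluster label.
# The values are made up for illustration and are not taken from the dataset.
RAW_TEXT = """\
15.55 28.65 2
14.90 27.55 2
32.15 7.30 5
"""


def drop_label_column(text):
    """Parse whitespace-separated rows and keep only the two coordinates."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        x1, x2 = float(fields[0]), float(fields[1])
        points.append((x1, x2))  # the third field (the label) is discarded
    return points


points = drop_label_column(RAW_TEXT)
print(len(points), "points with", len(points[0]), "dimensions each")
```

Applied to the full original file, the same step would be expected to yield the 788 two-dimensional observations described above, provided the assumed layout holds.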


The dataset was obtained from the Clustering basic benchmark site.


A. Gionis, H. Mannila, and P. Tsaparas, Clustering aggregation. ACM Transactions on Knowledge Discovery from Data (TKDD), 2007. 1(1): p. 1-30.

P. Franti and S. Sieranoja, K-means properties on six clustering benchmark datasets. Applied Intelligence, 2018. 48(12): p. 4743-4759.

jairsonrodrigues/gama documentation built on May 17, 2019, 3:12 a.m.