DSC_BICO_MOA: BICO - Fast computation of k-means coresets in a data stream

Description

This is an interface to the MOA implementation of BICO. The original BICO implementation by Fichtenberger et al. is also available as DSC_BICO.

Usage

DSC_BICO_MOA(Cluster = 5, Dimensions, MaxClusterFeatures = 1000,
  Projections = 10, k = NULL, space = NULL, p = NULL)

Arguments

Cluster, k

Number of desired centers

Dimensions

Number of dimensions of the input points (stream). It must be specified in advance.

MaxClusterFeatures, space

Maximum size of the coreset

Projections, p

Number of random projections used for the nearest neighbour search

Details

BICO maintains a tree which is inspired by the clustering tree of BIRCH, a SIGMOD Test of Time award-winning clustering algorithm. Each node in the tree represents a subset of the input points. Instead of storing all points as individual objects, only the number of points, the sum and the squared sum of the subset's points are stored as key features of each subset. Each point is inserted into exactly one node.
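
To illustrate why these three summaries are enough, the following minimal sketch in plain R keeps a count, a sum and a squared sum for a subset of points and recovers the centroid and the sum of squared distances to it. The helpers cf_new, cf_insert, cf_centroid and cf_cost are hypothetical names used for illustration only; they are not part of streamMOA or MOA. The sketch assumes the squared sum is the scalar sum of squared norms, and it does not reproduce BICO's tree or its insertion rules.

```r
# Hypothetical helpers for illustration; not part of streamMOA or MOA.

# Create an empty cluster feature for d-dimensional points.
cf_new <- function(d) list(n = 0, ls = numeric(d), ss = 0)

# Add one point: update the count, the sum and the squared sum.
cf_insert <- function(cf, x) {
  cf$n  <- cf$n + 1
  cf$ls <- cf$ls + x
  cf$ss <- cf$ss + sum(x^2)  # assumption: scalar sum of squared norms
  cf
}

# Centroid of the summarized points.
cf_centroid <- function(cf) cf$ls / cf$n

# Sum of squared distances of the summarized points to their centroid.
cf_cost <- function(cf) cf$ss - sum(cf$ls^2) / cf$n

# Summarize three 2-dimensional points without storing them.
pts <- rbind(c(1, 2), c(3, 4), c(5, 0))
cf <- cf_new(2)
for (i in seq_len(nrow(pts))) cf <- cf_insert(cf, pts[i, ])

centroid <- cf_centroid(cf)  # 3 2
cost <- cf_cost(cf)          # 16
```

Because each update only touches three quantities, the memory needed per node does not depend on how many points the node has absorbed.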

Author(s)

Matthias Carnein

References

Hendrik Fichtenberger, Marc Gille, Melanie Schmidt, Chris Schwiegelshohn, Christian Sohler: BICO: BIRCH Meets Coresets for k-Means Clustering. ESA 2013: 481-492

Examples

library(streamMOA)

# data with 3 clusters and 2 dimensions
stream <- DSD_Gaussians(k=3, d=2)

# cluster with BICO
bico <- DSC_BICO_MOA(Cluster=3, Dimensions=2)
update(bico, stream, 10000)
bico

# plot micro and macro-clusters
plot(bico, stream, type="both")

streamMOA documentation built on May 16, 2019, 1:07 a.m.