DSC_StreamKM: streamKM++


DSC_StreamKM    R Documentation

streamKM++

Description

This is an interface to the MOA implementation of streamKM++.

Usage

DSC_StreamKM(sizeCoreset = 10000, numClusters = 5, length = 100000L, ...)

Arguments

sizeCoreset

Size of the coreset

numClusters

Number of clusters to compute

length

Length of the data stream

...

Further arguments are passed on to DSC_Kmeans for reclustering.

Details

streamKM++ uses a tree-based sampling strategy to obtain a small weighted sample of the stream, called a coreset. The MOA implementation then applies the k-means++ algorithm to the coreset to find the requested number of centers (numClusters).
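
To illustrate the idea of choosing centers from a weighted coreset, here is a minimal base-R sketch of the k-means++ seeding step. It is not the MOA code: the function name kmeanspp_seed and its arguments are hypothetical, the coreset is simulated, and the refinement iterations that normally follow seeding are omitted.

```r
# Weighted k-means++ seeding on a (hypothetical) coreset.
# points:  numeric matrix, one coreset point per row
# weights: non-negative weight of each coreset point
# k:       number of centers to choose
kmeanspp_seed <- function(points, weights, k) {
  n <- nrow(points)
  stopifnot(length(weights) == n, k >= 1, k <= n, any(weights > 0))

  chosen <- integer(k)
  # First center: sample proportionally to the point weights.
  chosen[1] <- sample.int(n, 1, prob = weights)

  # Squared distance from every point to its nearest chosen center.
  d2 <- rowSums(sweep(points, 2, points[chosen[1], ])^2)

  for (i in seq_len(k - 1) + 1) {
    # Next center: probability proportional to weight * squared distance.
    p <- weights * d2
    # Fall back to the plain weights if all remaining mass is zero.
    if (!any(p > 0)) p <- weights
    chosen[i] <- sample.int(n, 1, prob = p)
    d2 <- pmin(d2, rowSums(sweep(points, 2, points[chosen[i], ])^2))
  }
  points[chosen, , drop = FALSE]
}

# Simulated coreset: two tight groups of 10 points with unit weights.
set.seed(42)
coreset <- rbind(matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
                 matrix(rnorm(20, mean = 5, sd = 0.1), ncol = 2))
w <- rep(1, nrow(coreset))
centers <- kmeanspp_seed(coreset, w, k = 2)
```

Because points far from the already chosen centers are sampled with higher probability, the second center in this toy example will very likely come from the other group.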

Notes:

  • The clusterer can process only the number of points specified in length; feeding it more points produces an ArrayIndexOutOfBoundsException error.

  • The coreset (micro-clusters) is not accessible; only the macro-clusters can be requested.
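
One way to stay within the length limit described in the notes is to track how many points have already been clustered and to cap each request. The sketch below assumes the limit is exactly the value passed as length; the helper points_allowed and the counter variables are hypothetical and not part of streamMOA.

```r
# Hypothetical helper: how many of `requested` points can still be
# clustered without exceeding the configured stream length.
points_allowed <- function(requested, processed, length) {
  stopifnot(requested >= 0, processed >= 0, length >= 0)
  max(0, min(requested, length - processed))
}

stream_length <- 1000   # value passed as `length` to DSC_StreamKM()
processed <- 950        # points already sent to the clusterer
n <- points_allowed(100, processed, stream_length)  # 50
```

The resulting n can then be used as the number of points in update(), and processed increased by n afterwards.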

Author(s)

Matthias Carnein

References

Marcel R. Ackermann, Christiane Lammersen, Marcus Maertens, Christoph Raupach, Christian Sohler, Kamil Swierkot. StreamKM++: A Clustering Algorithm for Data Streams. In: Proceedings of the 12th Workshop on Algorithm Engineering and Experiments (ALENEX '10), 2010.

See Also

Other DSC_MOA: DSC_BICO_MOA(), DSC_CluStream(), DSC_ClusTree(), DSC_DStream_MOA(), DSC_DenStream(), DSC_MCOD(), DSC_MOA()

Examples

set.seed(1000)
stream <- DSD_Gaussians(k = 3, d = 2, noise = 0.05)

# cluster with streamKM++
streamkm <- DSC_StreamKM(sizeCoreset = 100, numClusters = 3, length = 1000)
update(streamkm, stream, 100)
streamkm

# plot macro-clusters (no access to micro-clusters)
plot(streamkm, stream)

streamMOA documentation built on Sept. 4, 2022, 1:05 a.m.