DSC_CluStream: CluStream Data Stream Clusterer


Description

This class implements the CluStream clustering algorithm for data streams.

Usage

DSC_CluStream(m = 100, horizon = 1000, t = 2, k = NULL)
DSC_CluStream_MOA(m = 100, horizon = 1000, t = 2, k = NULL)

Arguments

m

Maximum number of micro-clusters used in CluStream.

horizon

Time window (horizon) used in CluStream.

t

Maximal boundary factor (kernel radius factor). When deciding whether to add a new data point to a micro-cluster, the maximum boundary is t times the RMS deviation of the micro-cluster's data points from its centroid.

k

Number of macro-clusters to produce using weighted k-means. NULL disables automatic reclustering.
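
The boundary test described for t can be sketched in base R. This is only an illustration of the rule as worded above: within_boundary is a hypothetical helper, not a streamMOA function, and the MOA implementation works from summary statistics of each micro-cluster, so its exact computation may differ.

```r
# Sketch of the boundary test for argument `t` (base R only).
# `within_boundary` is a hypothetical helper, not part of streamMOA.
within_boundary <- function(points, new_point, t = 2) {
  centroid <- colMeans(points)
  # RMS deviation of the member points from the centroid
  sq_dist <- rowSums(sweep(points, 2, centroid)^2)
  rms <- sqrt(mean(sq_dist))
  # distance of the candidate point to the centroid
  d <- sqrt(sum((new_point - centroid)^2))
  # the point is absorbed only if it lies within t times the RMS deviation
  d <= t * rms
}

# a toy micro-cluster with four member points
mc <- rbind(c(0, 0), c(2, 0), c(0, 2), c(2, 2))
within_boundary(mc, c(1.5, 1.5), t = 2)  # close to the centroid
within_boundary(mc, c(5, 5), t = 2)      # far from the centroid
within_boundary(mc, c(5, 5), t = 5)      # a larger t widens the boundary
```

A larger t makes micro-clusters absorb more distant points, while a smaller t leaves more points outside every existing micro-cluster.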

Details

This is an interface to the MOA implementation of CluStream.

If k is specified, the micro-clusters are automatically reclustered into k macro-clusters using a weighted k-means algorithm (see the Examples section below).
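
To illustrate what weighted k-means reclustering means, the following base R sketch groups micro-cluster centers into macro-clusters, with each micro-cluster's weight determining how strongly it pulls its macro-cluster center. The helper weighted_kmeans, its deterministic initialization, and the toy data are assumptions made for illustration; the reclusterer actually used by the package is DSC_Kmeans with weighted=TRUE, as in the Examples.

```r
# Minimal weighted k-means (Lloyd-style) sketch in base R.
# `weighted_kmeans` is a hypothetical helper for illustration only.
weighted_kmeans <- function(centers, weights, k, iter = 20) {
  centers <- as.matrix(centers)
  # deterministic start (an assumption): use the first k micro-cluster centers
  macro <- centers[seq_len(k), , drop = FALSE]
  assign <- integer(nrow(centers))
  for (i in seq_len(iter)) {
    # squared distance of every micro-cluster center to every macro center
    d <- matrix(sapply(seq_len(k), function(j)
      rowSums(sweep(centers, 2, macro[j, ])^2)), ncol = k)
    # assign each micro-cluster to its nearest macro center
    assign <- max.col(-d, ties.method = "first")
    for (j in seq_len(k)) {
      idx <- assign == j
      if (any(idx)) {
        # weighted mean: heavier micro-clusters pull the macro center more
        macro[j, ] <- colSums(centers[idx, , drop = FALSE] * weights[idx]) /
          sum(weights[idx])
      }
    }
  }
  list(centers = macro, assignment = assign)
}

# four toy micro-cluster centers with their weights
micro <- rbind(c(0, 0), c(10, 10), c(1, 0), c(10, 11))
w <- c(3, 1, 1, 3)
res <- weighted_kmeans(micro, w, k = 2)
res$centers     # macro-cluster centers
res$assignment  # macro-cluster index for each micro-cluster
```

In this toy data the first macro-cluster center ends up nearer to (0, 0) than to (1, 0) because the former micro-cluster carries three times the weight.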

Value

An object of class DSC_CluStream (a subclass of DSC_Micro, DSC_MOA and DSC). If k is not NULL, an object of class DSC_TwoStage is returned instead.

Author(s)

Michael Hahsler and John Forrest

References

Aggarwal CC, Han J, Wang J, Yu PS (2003). "A Framework for Clustering Evolving Data Streams." In "Proceedings of the International Conference on Very Large Data Bases (VLDB '03)," pp. 81-92.

Bifet A, Holmes G, Pfahringer B, Kranen P, Kremer H, Jansen T, Seidl T (2010). "MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering." In "Journal of Machine Learning Research (JMLR)."

See Also

DSC, DSC_Micro, DSC_MOA

Examples

library(streamMOA)  # also loads the stream package

# data with 3 clusters and 5% noise
stream <- DSD_Gaussians(k=3, d=2, noise=.05)

# cluster with CluStream
clustream <- DSC_CluStream(m=50)
update(clustream, stream, 500)
clustream

# plot micro-clusters
plot(clustream, stream)

# plot assignment area (micro-cluster radius)
plot(clustream, stream, assignment=TRUE, weights=FALSE)

# recluster the micro-clusters manually using weighted k-means
kmeans <- DSC_Kmeans(k=3, weighted=TRUE)
recluster(kmeans, clustream)
plot(kmeans, stream, type="both")

# use k-means reclustering automatically by specifying k
clustream <- DSC_CluStream(m=50, k=3)
update(clustream, stream, 500)
clustream

plot(clustream, stream, type="both")

Example output

Loading required package: stream
Loading required package: proxy

Attaching package: 'proxy'

The following objects are masked from 'package:stats':

    as.dist, dist

The following object is masked from 'package:base':

    as.matrix

Loading required package: rJava
CluStream
Class: DSC_CluStream, DSC_Micro, DSC_MOA, DSC 
Number of micro-clusters: 50 
CluStream + k-Means (weighted)
Class: DSC_TwoStage, DSC_Macro, DSC 
Number of micro-clusters: 50 
Number of macro-clusters: 3 

streamMOA documentation built on May 16, 2019, 1:07 a.m.