DSC_evoStream: evoStream - Evolutionary Stream Clustering

DSC_evoStream    R Documentation

Description

Micro-clusterer with reclustering: a stream clustering algorithm based on evolutionary optimization.

Usage

DSC_evoStream(
  formula = NULL,
  r,
  lambda = 0.001,
  tgap = 100,
  k = 2,
  crossoverRate = 0.8,
  mutationRate = 0.001,
  populationSize = 100,
  initializeAfter = 2 * k,
  incrementalGenerations = 1,
  reclusterGenerations = 1000
)

Arguments

formula

NULL to use all features in the stream or a model formula of the form ~ X1 + X2 to specify the features used for clustering. Only ., + and - are currently supported in the formula.

r

radius threshold for micro-cluster assignment

lambda

decay rate

tgap

time-interval between outlier detection and clean-up

k

number of macro-clusters

crossoverRate

cross-over rate for the evolutionary algorithm

mutationRate

mutation rate for the evolutionary algorithm

populationSize

number of solutions that the evolutionary algorithm maintains

initializeAfter

number of micro-clusters required to initialize the evolutionary algorithm

incrementalGenerations

number of EA generations performed after each observation

reclusterGenerations

number of EA generations performed during reclustering
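
To give a feel for the scale of lambda, the following sketch computes how quickly a weight fades. It assumes the exponential fading 2^(-lambda * dt) commonly used by decay-based stream clusterers such as DBSTREAM. This page only says "decay rate", so the formula is an assumption and fade() is a hypothetical helper, not part of the package.

```r
# Assumption: a weight fades by 2^(-lambda * dt) after dt time steps
# without updates. fade() is a hypothetical helper for illustration.
fade <- function(weight, lambda, dt) weight * 2^(-lambda * dt)

lambda <- 0.001           # default value from the usage above
half_life <- 1 / lambda   # time steps until a weight is halved

# Remaining weight of a unit-weight micro-cluster after dt time steps
faded <- fade(1, lambda, dt = c(0, 100, 1000, 5000))
print(data.frame(dt = c(0, 100, 1000, 5000), weight = faded))
```

Under this assumption, the default lambda = 0.001 halves a weight every 1000 time steps; larger values forget old data faster.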

Details

The online component uses a simplified version of DBSTREAM to generate micro-clusters. The micro-clusters are then incrementally reclustered using an evolutionary algorithm. Evolutionary algorithms create slight variations of existing solutions by combining and randomly modifying them. Iteratively selecting the better solutions creates an evolutionary pressure that improves the clustering over time.

Because the evolutionary algorithm is incremental, it can be applied between observations, e.g., during idle time of the stream. Whenever there is idle time, the recluster() function of the reference class can be called to improve the macro-clusters (see Examples). The evolutionary algorithm can also be applied as a traditional reclustering step, or as a combination of both. In addition, this implementation can evaluate a fixed number of generations after each observation.
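
The idea of combining, mutating and selecting solutions can be illustrated with a small base-R toy. This is a minimal sketch and not the package's implementation: the micro-cluster data, the fitness function (weighted squared distance to the nearest centre), the tournament selection and all helper names are assumptions chosen for illustration, and the mutation rate is larger than the package default so that the effect is visible.

```r
# Toy sketch: evolve k macro-cluster centres over weighted micro-clusters.
# fitness(), crossover(), mutate() and generation() are hypothetical helpers.
set.seed(42)

# Hypothetical micro-clusters: centres (rows) around (0, 0) and (1, 1)
micro <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
  matrix(rnorm(20, mean = 1, sd = 0.1), ncol = 2)
)
w <- rep(1, nrow(micro))  # micro-cluster weights
k <- 2                    # number of macro-clusters

# Fitness: negative weighted squared distance to the nearest centre
fitness <- function(centres) {
  d <- sapply(seq_len(nrow(centres)), function(j) {
    ctr <- matrix(centres[j, ], nrow(micro), ncol(micro), byrow = TRUE)
    rowSums((micro - ctr)^2)
  })
  -sum(w * apply(d, 1, min))
}

# Crossover: with probability `rate`, take each centre from either parent
crossover <- function(a, b, rate = 0.8) {
  if (runif(1) > rate) return(a)
  take_b <- runif(nrow(a)) < 0.5
  a[take_b, ] <- b[take_b, ]
  a
}

# Mutation: add Gaussian noise to each coordinate with probability `rate`
mutate <- function(a, rate = 0.05, sd = 0.1) {
  hit <- matrix(runif(length(a)) < rate, nrow(a), ncol(a))
  a + hit * rnorm(length(a), sd = sd)
}

# One generation: tournament-select two parents, create a child and
# replace the worst solution if the child is better
generation <- function(pop, fit) {
  pick <- function() {
    i <- sample(length(pop), 2)
    i[which.max(fit[i])]
  }
  child <- mutate(crossover(pop[[pick()]], pop[[pick()]]))
  f <- fitness(child)
  worst <- which.min(fit)
  if (f > fit[worst]) {
    pop[[worst]] <- child
    fit[worst] <- f
  }
  list(pop = pop, fit = fit)
}

# Initial population: each solution is k randomly chosen micro-cluster centres
pop <- replicate(20, micro[sample(nrow(micro), k), , drop = FALSE],
                 simplify = FALSE)
fit <- vapply(pop, fitness, numeric(1))
best_before <- max(fit)

state <- list(pop = pop, fit = fit)
for (g in 1:500) state <- generation(state$pop, state$fit)

best_after <- max(state$fit)
best_centres <- state$pop[[which.max(state$fit)]]
print(best_centres)
```

Because only the worst solution is ever replaced, the best fitness in the population cannot decrease. This is what makes it safe to stop after any number of generations, and so to run a few generations whenever the stream is idle.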

Author(s)

Matthias Carnein Matthias.Carnein@uni-muenster.de

References

Carnein M. and Trautmann H. (2018), "evoStream - Evolutionary Stream Clustering Utilizing Idle Times", Big Data Research.

See Also

Other DSC_Micro: DSC_BICO(), DSC_BIRCH(), DSC_DBSTREAM(), DSC_DStream(), DSC_Micro(), DSC_Sample(), DSC_Window()

Other DSC_TwoStage: DSC_DBSTREAM(), DSC_DStream(), DSC_TwoStage()

Examples

library(stream)

stream <- DSD_Gaussians(k = 3, d = 2) %>% DSD_Memory(n = 500)

## init evoStream
evoStream <- DSC_evoStream(r = 0.05, k = 3,
  incrementalGenerations = 1, reclusterGenerations = 500)

## insert observations
update(evoStream, stream, n = 500)

## micro clusters
get_centers(evoStream, type = "micro")

## micro weights
get_weights(evoStream, type = "micro")

## macro clusters
get_centers(evoStream, type = "macro")

## macro weights
get_weights(evoStream, type = "macro")

## plot result
reset_stream(stream)
plot(evoStream, stream)

## If there is idle time, we can evaluate additional generations.
## This can be called at any time, including between observations.
## By default, 1 generation is evaluated after each observation and
## 1000 generations during reclustering; here we run 500.
evoStream$RObj$recluster(500)

## plot improved result
reset_stream(stream)
plot(evoStream, stream)

## get assignment of micro to macro clusters
microToMacro(evoStream)
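
The calls above can also be interleaved to mimic a stream with idle time between observations. The following is a sketch only, using the functions already shown on this page. The warm-up batch of 100 points and the budget of 10 generations per idle period are arbitrary assumptions, not recommended settings.

```r
library(stream)

stream <- DSD_Gaussians(k = 3, d = 2) %>% DSD_Memory(n = 200)
evoStream <- DSC_evoStream(r = 0.05, k = 3, incrementalGenerations = 1)

## warm-up batch so that micro-clusters exist before the loop starts
update(evoStream, stream, n = 100)

## process the remaining points one at a time and use the (simulated)
## idle time after each observation for additional generations
for (i in seq_len(100)) {
  update(evoStream, stream, n = 1)
  evoStream$RObj$recluster(10)  # assumed idle-time budget of 10 generations
}

## macro clusters after interleaved reclustering
macro <- get_centers(evoStream, type = "macro")
macro
```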

mhahsler/stream documentation built on April 24, 2024, 10:10 p.m.