DSC_HDDStream: Density-based Projected Clustering over High-Dimensional Data

View source: R/DSC_HDDStream.R

Description

This function creates a DSC object that represents an instance of the HDDStream algorithm and can be used for stream clustering.

Usage

DSC_HDDStream(epsilonN = 0.1, beta = 0.5, mu = 10, lambda = 0.5,
  initPoints = 2000, pi = 30, kappa = 10, delta = 0.001, offline = 2,
  speed = 100)

Arguments

epsilonN

radius of each neighborhood

beta

controls the effect of mu

mu

minimum number of points desired to be in a microcluster

lambda

decaying parameter

initPoints

number of points to use for initialization

pi

maximum subspace dimensionality

kappa

parameter used to define the preference weight vector

delta

defines the threshold for the variance

offline

offline multiplier for epsilon

speed

number of incoming points per time unit
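
The interplay of delta, kappa and pi can be illustrated with a minimal base R sketch. It assumes the PreDeCon-style definition of a preference weight vector (weight kappa for dimensions whose variance is at most delta, weight 1 otherwise); the helper names preference_vector and preferred_dims and the variance values are hypothetical and are not part of subspaceMOA.

```r
# Preference weight vector in the style of PreDeCon (assumed definition):
# dimensions whose variance is at most delta get weight kappa, others get 1.
preference_vector <- function(variances, delta, kappa) {
  ifelse(variances <= delta, kappa, 1)
}

# Number of preferred (low-variance) dimensions; this count is what the
# maximum subspace dimensionality (the pi argument) would be compared with.
preferred_dims <- function(variances, delta) {
  sum(variances <= delta)
}

# Defaults taken from the Usage section; max_dims stands in for pi.
delta <- 0.001
kappa <- 10
max_dims <- 30

# Hypothetical per-dimension variances of one neighborhood.
variances <- c(0.0001, 0.8, 0.0004)

weights <- preference_vector(variances, delta, kappa)
within_limit <- preferred_dims(variances, delta) <= max_dims
print(weights)
print(within_limit)
```

A larger kappa makes deviations along low-variance dimensions count more strongly, while a smaller delta makes fewer dimensions qualify as preferred.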

Details

HDDStream is an algorithm for the density-based projected clustering of high-dimensional data streams.

The algorithm is initialized by buffering the first initPoints points that arrive and then applying the PreDeCon algorithm over these points.

Microclusters are then maintained online. Each new point is added to its closest core microcluster if and only if doing so does not increase the projected radius of that microcluster beyond epsilonN. If the point cannot be added to a core microcluster, an attempt is made to add it to the closest outlier microcluster, using the same criterion. If both attempts fail, the point starts its own microcluster. Microclusters are aged according to the decaying parameter lambda.
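
The maintenance step described above can be sketched in base R. This is a simplified illustration, not the package's implementation: it substitutes the plain full-space radius for the projected radius, ignores point weights when computing the radius, and assumes an exponential fading function 2^(-lambda * dt) with a core threshold of beta * mu, as in DenStream-style algorithms. All function names are hypothetical.

```r
# Radius a microcluster would have after absorbing point x.
# mc is a list with n (point count), ls (linear sum), ss (squared sum).
# Simplification: full-space radius instead of the projected radius.
radius_if_added <- function(mc, x) {
  n <- mc$n + 1
  ls <- mc$ls + x
  ss <- mc$ss + x^2
  sqrt(sum(pmax(ss / n - (ls / n)^2, 0)))
}

# Try the closest core microcluster, then the closest outlier microcluster;
# if neither accepts the point, report that a new microcluster is needed.
assign_point <- function(x, core, outlier, epsilonN) {
  for (kind in c("core", "outlier")) {
    mcs <- if (kind == "core") core else outlier
    if (length(mcs) == 0) next
    d <- vapply(mcs, function(mc) sqrt(sum((mc$ls / mc$n - x)^2)), numeric(1))
    i <- which.min(d)
    if (radius_if_added(mcs[[i]], x) <= epsilonN) {
      return(list(kind = kind, index = i))
    }
  }
  list(kind = "new", index = NA_integer_)
}

# Assumed fading function: weight halves every 1 / lambda time units.
decay_weight <- function(weight, lambda, dt) weight * 2^(-lambda * dt)

# Assumed core criterion: weight at or above beta * mu.
is_core <- function(weight, beta, mu) weight >= beta * mu

# One core microcluster around the origin, one outlier microcluster at (5, 5).
core <- list(list(n = 2, ls = c(0, 0), ss = c(0.02, 0.02)))
outlier <- list(list(n = 1, ls = c(5, 5), ss = c(25, 25)))

r1 <- assign_point(c(0, 0), core, outlier, epsilonN = 0.2)
r2 <- assign_point(c(5, 5), core, outlier, epsilonN = 0.2)
r3 <- assign_point(c(-9, 9), core, outlier, epsilonN = 0.2)
w <- decay_weight(8, lambda = 0.5, dt = 0:4)
core_flags <- is_core(w, beta = 0.5, mu = 10)
```

In this toy run the first point joins the core microcluster, the second joins the outlier microcluster, and the third would start a new microcluster. A microcluster of weight 8 that receives no further points drops below the assumed core threshold of 5 after two time units.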

Macroclustering is performed on-demand, using the PreDeCon algorithm.

Examples
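
The following is a hedged usage sketch, not the package's own example. It assumes that the returned DSC object can be driven with update() from the stream package like other DSC objects, and it uses stream's DSD_Gaussians as a stand-in data source. The parameter values are illustrative and untuned. The calls are guarded so that nothing runs unless both packages (and the Java runtime that subspaceMOA needs) are available.

```r
# Illustrative, untuned parameter choices; the names follow the Usage section.
params <- list(epsilonN = 0.2, beta = 0.5, mu = 10, lambda = 0.25,
               initPoints = 500, pi = 5, kappa = 10, delta = 0.001)

if (requireNamespace("stream", quietly = TRUE) &&
    requireNamespace("subspaceMOA", quietly = TRUE)) {
  library(stream)
  library(subspaceMOA)

  # Create the clusterer with the parameters above; the rest keep defaults.
  dsc <- do.call(DSC_HDDStream, params)

  # Stand-in data stream: 3 Gaussian clusters in 10 dimensions (assumption).
  dsd <- DSD_Gaussians(k = 3, d = 10)

  # Feed 1000 points; the first initPoints of them are used for initialization.
  update(dsc, dsd, n = 1000)
}
```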

subspaceMOA documentation built on May 29, 2017, 10:50 p.m.