DSC_DBSTREAM: DBSTREAM clustering algorithm

View source: R/DSC_DBSTREAM.R

Description

Implements a simple density-based stream clustering algorithm that assigns data points to micro-clusters with a given radius and implements shared-density-based reclustering.

Usage

DSC_DBSTREAM(r, lambda = 0.001, gaptime = 1000L,
  Cm = 3, metric = "Euclidean", shared_density = FALSE,
  alpha = 0.1, k = 0, minweight = 0)
get_shared_density(x, use_alpha = TRUE)
change_alpha(x, alpha)
get_cluster_assignments(x)

Arguments

r

The radius of micro-clusters.

lambda

The lambda used in the fading function.

gaptime

Weak micro-clusters (and weak shared density entries) are removed every gaptime points.

Cm

The minimum weight for a micro-cluster.

metric

The metric used to calculate distances.

shared_density

Record shared density information. If set to TRUE, shared density is used for reclustering; otherwise reachability is used (overlapping clusters at a distance of less than r*(1-alpha) are clustered together).

k

The number of macro clusters to be returned if macro is true.

alpha

For shared density: The minimum proportion of shared points between two clusters to warrant combining them (a suitable value for 2D data is 0.3). For reachability clustering it is a distance factor.

minweight

The proportion of the total weight a macro-cluster needs to have not to be noise (between 0 and 1).

x

A DSC_DBSTREAM object to get the shared density information from.

use_alpha

Only return shared density if it exceeds alpha.
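
The interplay of lambda and gaptime can be illustrated with a minimal base R sketch. It assumes the exponential fading form 2^(-lambda * t) described in the referenced paper; the helper fade() is hypothetical and not part of the package.

```r
# Assumed fading function: a weight decays by a factor of 2^(-lambda)
# per time step (one time step = one data point).
fade <- function(weight, lambda, dt) {
  weight * 2^(-lambda * dt)
}

lambda  <- 0.001   # default from the Usage section
gaptime <- 1000L   # default from the Usage section

# Weight of a micro-cluster that started at weight 1 and then received
# no points for one full gaptime interval.
w_after_gap <- fade(1, lambda, gaptime)
w_after_gap
```

Under this assumption, the defaults halve the weight of an untouched micro-cluster between two consecutive cleanup passes; a larger lambda makes the clusterer forget old data faster.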

Details

For each new data point in the incoming stream, the DBSTREAM algorithm checks whether the point's dissimilarity to any existing micro-cluster is below a threshold (the radius r). If so, the point is merged into that micro-cluster. Otherwise, a new micro-cluster is created to accommodate the new data point.
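
The assignment step described above can be sketched in a few lines of base R. This is a deliberately simplified toy, not the package's implementation: it uses Euclidean distance, omits fading and shared-density bookkeeping, and never moves centers. The function assign_point() and the state list are hypothetical names used only for illustration.

```r
# Toy micro-cluster assignment step.
# state$centers: numeric matrix, one row per micro-cluster
# state$weights: numeric vector, one weight per micro-cluster
assign_point <- function(state, p, r) {
  if (nrow(state$centers) > 0) {
    # Euclidean distance from p to every existing center
    d <- sqrt(rowSums(sweep(state$centers, 2, p)^2))
    hit <- which(d < r)
    if (length(hit) > 0) {
      # The point falls inside at least one micro-cluster: add its weight
      state$weights[hit] <- state$weights[hit] + 1
      return(state)
    }
  }
  # No micro-cluster is close enough: create a new one centered at p
  state$centers <- rbind(state$centers, p, deparse.level = 0)
  state$weights <- c(state$weights, 1)
  state
}

state <- list(centers = matrix(numeric(0), ncol = 2), weights = numeric(0))
pts <- rbind(c(0, 0), c(0.01, 0.02), c(1, 1))
for (i in seq_len(nrow(pts))) {
  state <- assign_point(state, pts[i, ], r = 0.05)
}

nrow(state$centers)  # number of micro-clusters created
state$weights        # points absorbed by each micro-cluster
```

With r = 0.05 the second point lies within the radius of the first and is absorbed, while the third point is far away and starts its own micro-cluster.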

Although DSC_DBSTREAM is a micro-clustering algorithm, macro-clusters and their weights are also available.
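
How macro-clusters could arise from micro-clusters under reachability (shared_density = FALSE) can be sketched with base R. The threshold r*(1-alpha) is taken from the description of the shared_density argument; using a single-linkage cut to group the centers is an assumption made for this illustration, and the package's internals may differ.

```r
r     <- 0.05
alpha <- 0.1

# Three hypothetical micro-cluster centers: two close together, one far away
centers <- rbind(c(0.00, 0.00),
                 c(0.03, 0.00),
                 c(1.00, 1.00))

# Single-linkage clustering cut at the reachability threshold groups all
# centers that are connected by a chain of distances within the threshold.
hc    <- hclust(dist(centers), method = "single")
macro <- cutree(hc, h = r * (1 - alpha))
macro
```

Here the first two centers are 0.03 apart, which is below the threshold of 0.045, so they end up in the same macro-cluster, while the third center forms a macro-cluster of its own.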

get_cluster_assignments() extracts the micro-cluster (MC) assignment for each data point clustered during the last update operation. Note that update needs to be called with assignments = TRUE and the block size needs to be large enough. The function returns the MC index (in the current set of MCs obtained with, e.g., get_centers()) and, as an attribute, the permanent MC ids.
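
The relationship between the returned index and the permanent ids can be shown without running the clusterer. The vector below is a hand-made stand-in shaped like the example output on this page; it assumes the attribute is named "ids" and that the i-th entry of that attribute is the permanent id of the i-th current MC.

```r
# Hypothetical return value: indices into the current set of MCs,
# NA for points without an assignment, permanent ids as an attribute.
cl <- c(1L, 2L, NA, 8L, 3L)
attr(cl, "ids") <- c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 9L)

# Translate each index into the permanent id of its micro-cluster
permanent <- attr(cl, "ids")[cl]
permanent
```

Index and permanent id differ once micro-clusters have been removed: in this stand-in, the eighth current MC carries the permanent id 9.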

plot() for DSC_DBSTREAM has two extra logical parameters called assignment and shared_density which show the assignment area and the shared density graph, respectively.

Value

An object of class DSC_DBSTREAM (subclass of DSC, DSC_R, DSC_Micro).

Author(s)

Michael Hahsler and Matthew Bolanos

References

Michael Hahsler and Matthew Bolanos. Clustering data streams based on shared density between micro-clusters. IEEE Transactions on Knowledge and Data Engineering, 28(6):1449–1461, June 2016.

See Also

DSC, DSC_Micro

Examples

library(stream)

set.seed(0)
stream <- DSD_Gaussians(k = 3, noise = 0.05)

# create clusterer with r = 0.05
dbstream <- DSC_DBSTREAM(r = .05)
update(dbstream, stream, 1000)
dbstream

# check micro-clusters
nclusters(dbstream)
head(get_centers(dbstream))
plot(dbstream, stream)

# plot macro-clusters
plot(dbstream, stream, type = "both")

# plot micro-clusters with assignment area
plot(dbstream, stream, type = "both", assignment = TRUE)


# DBSTREAM with shared density
dbstream <- DSC_DBSTREAM(r = .05, shared_density = TRUE, Cm=5)
update(dbstream, stream, 1000)
dbstream
plot(dbstream, stream, type = "both")
# plot the shared density graph (several options)
plot(dbstream, stream, type = "both", shared_density = TRUE)
plot(dbstream, stream, type = "micro", shared_density = TRUE)
plot(dbstream, stream, type = "micro", shared_density = TRUE, assignment = TRUE)
plot(dbstream, stream, type = "none", shared_density = TRUE, assignment = TRUE)

# see how micro and macro-clusters relate
# each microcluster has an entry with the macro-cluster id
# Note: unassigned micro-clusters (noise) have an NA
microToMacro(dbstream)

# do some evaluation
evaluate(dbstream, stream, measure="purity")
evaluate(dbstream, stream, measure="cRand", type="macro")

# use DBSTREAM for conventional clustering (with assignments = TRUE so we can
# later retrieve the cluster assignments for each point)
data("iris")
dbstream <- DSC_DBSTREAM(r = 1)
update(dbstream, iris[,-5], assignments = TRUE)
dbstream

cl <- get_cluster_assignments(dbstream)
cl

# micro-clusters
plot(iris[,-5], col = cl, pch = cl)

# macro-clusters
plot(iris[,-5], col = microToMacro(dbstream, cl))

Example output

Loading required package: proxy

Attaching package: 'proxy'

The following objects are masked from 'package:stats':

    as.dist, dist

The following object is masked from 'package:base':

    as.matrix

DBSTREAM
Class: DSC_DBSTREAM, DSC_Micro, DSC_R, DSC 
Number of micro-clusters: 68 
Number of macro-clusters: 2 
[1] 68
         X1        X2
1 0.7655539 0.5366765
2 0.5639767 0.3198824
3 0.8039369 0.6348941
4 0.7174734 0.5560006
5 0.3930578 0.3481002
6 0.6017358 0.4401391
DBSTREAM
Class: DSC_DBSTREAM, DSC_Micro, DSC_R, DSC 
Number of micro-clusters: 50 
Number of macro-clusters: 4 
 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 25 26 28 29 30 32 
 1  2  3  2  1  3  1  2  3  3  1  3  2  1  1  1  3  3  3  1  3  3  2  1  3  2 
33 34 36 37 38 39 40 41 44 46 48 49 51 55 56 58 66 67 68 69 71 74 75 82 
 3  1  2  3  1  1  3  3  2  3  3  3  1  1  3  1  3  3  2  2  4  3  1  1 
Evaluation results for micro-clusters.
Points were assigned to micro-clusters.

   purity 
0.9883721 
Evaluation results for macro-clusters.
Points were assigned to micro-clusters.

    cRand 
0.9171827 
DBSTREAM
Class: DSC_DBSTREAM, DSC_Micro, DSC_R, DSC 
Number of micro-clusters: 8 
Number of macro-clusters: 2 
  [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2  2  2  1  2  2  2  1  1  1  1
 [26]  1  1  1  1  1  1  2  2  2  1  1  2  1  1  1  1  1  1  1  2  1  2  1  2  1
 [51]  3  3  3  4  3  4  3  5  3  4  5  4  4  3  4  3  4  4  4  4  3  4  3  3  3
 [76]  3  3  3  4  4  4  4  4  3  4  3  3  4  4  4  4  3  4  5  4  4  4  4  5  4
[101]  6  3  6  6  6  7 NA  7  6  6  3  3  6 NA  8  6  6  7  7  3  6  8  7  3  6
[126]  6  3  3  6  6  7 NA  6  3  3  7  6  6  3  6  6  6  8  6  6  6  3  3  6  8
attr(,"ids")
[1] 1 2 3 4 5 6 7 9

stream documentation built on June 2, 2018, 9:08 a.m.