Implements the DStream data stream clustering algorithm.
DSC_DStream(gridsize, lambda = 1e-3, gaptime=1000L,
  Cm=3, Cl=.8, attraction=FALSE, epsilon=.3, Cm2=Cm, k=NULL, N = 0)
get_attraction(x, relative=FALSE, grid_type = "dense", dist=FALSE)

gridsize 
Size of grid cells. 
lambda 
Fading constant used to calculate the decay factor 2^-lambda. (Note: in the paper the authors use lambda to denote the decay factor and not the fading constant!) 
gaptime 
sporadic grids are removed every gaptime number of points. 
Cm 
density threshold used to detect dense grids as a proportion of the average expected density (Cm > 1). The average density is given by the total weight of the clustering over N, the number of grid cells. 
Cl 
density threshold to detect sporadic grids (0 < Cl < Cm). Transitional grids have a density between Cl and Cm. 
attraction 
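compute and store information about the attraction between adjacent grids. If TRUE, macroclusters are formed based on the attraction between adjacent dense grids (see the details on the 2009 version of the algorithm). 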
epsilon 
overlap parameter for attraction as a proportion of

Cm2 
threshold on attraction to join two dense grid cells (as a proportion of the average expected attraction). In the original algorithm Cm2 is equal to Cm. 
k 
alternative to Cm2 (not in the original algorithm). Create k clusters based on attraction. In case of more than k unconnected components, closer groups of microclusters (MCs) are joined. 
N 
Fix the number of grid cells used for the calculation
of the density thresholds with Cl and Cm. If 
x 
DSC_DStream object to get attraction values from. 
relative 
calculates relative attraction (normalized by the cluster weight). 
grid_type 
the attraction between what grid types should be returned? 
dist 
make attraction symmetric and transform into a distance. 
DStream creates an equally spaced grid and estimates the density in each grid cell using the count of points falling in the cell. Grid cells are classified based on density into dense, transitional and sporadic cells. The density is faded after every new point by a factor of 2^{-lambda}. Sporadic grid cells are removed every gaptime points.
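The bookkeeping described above can be sketched in base R. This is a simplified illustration, not the implementation used by DSC_DStream: the simulated data, the fixed value N = 36, and the handling of values exactly on a threshold are assumptions made for the example, and the periodic removal of sporadic cells is left out.

```r
# Simplified sketch of grid binning, fading and cell classification (base R).
# NOT the stream package's implementation; data and N are assumptions.
gridsize <- 1
lambda   <- 1e-3   # fading constant; decay factor per point is 2^-lambda
Cm       <- 3      # dense threshold (proportion of average density)
Cl       <- 0.8    # sporadic threshold (proportion of average density)
N        <- 36     # assumed number of grid cells (6 x 6 cells over [-3, 3]^2)

set.seed(42)
points <- rbind(
  matrix(rnorm(400, mean = 0.5, sd = 0.2), ncol = 2),  # 200 clustered points
  matrix(runif(100, min = -3, max = 3), ncol = 2)      # 50 noise points
)
points <- points[sample(nrow(points)), ]               # shuffle arrival order

weights <- numeric(0)  # named vector: faded weight per occupied grid cell

for (i in seq_len(nrow(points))) {
  # fade all existing cell weights by the decay factor
  weights <- weights * 2^(-lambda)
  # map the point to its grid cell and add a weight of 1
  cell <- paste(floor(points[i, ] / gridsize), collapse = ",")
  if (!cell %in% names(weights)) weights[cell] <- 0
  weights[cell] <- weights[cell] + 1
}

# average density: total weight of the clustering over N
avg <- sum(weights) / N

# classify cells: dense >= Cm * avg, sporadic < Cl * avg, else transitional
type <- rep("sporadic", length(weights))
names(type) <- names(weights)
type[weights >= Cl * avg] <- "transitional"
type[weights >= Cm * avg] <- "dense"

table(type)
head(sort(weights, decreasing = TRUE))
```

With lambda = 1e-3 a point's weight is halved after 1000 further points, since (2^-0.001)^1000 = 0.5, so the fading constant controls how quickly old data is forgotten.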
For reclustering, DStream (2007 version) merges adjacent dense grids to form macroclusters and then assigns adjacent transitional grids to macroclusters. This behavior is implemented as attraction=FALSE.
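The merging step can be illustrated with a small base R sketch. It is not the package's code: the cell coordinates are made up, two cells are treated as adjacent when their integer grid coordinates differ by exactly 1 in a single dimension (an assumption about the neighborhood definition), and a transitional cell with several dense neighbors simply takes the label of the first one found.

```r
# Sketch of reachability-based reclustering on grid cells (base R only).
# Cell coordinates are hypothetical; adjacency = Manhattan distance of 1.
dense <- rbind(c(0, 0), c(0, 1), c(1, 1), c(4, 4), c(5, 4), c(8, 0))
transitional <- rbind(c(1, 0), c(6, 4), c(2, 7))

# adjacency matrix between dense cells
adjacent <- as.matrix(stats::dist(dense, method = "manhattan")) == 1

# label connected components of adjacent dense cells (breadth-first search)
macro <- integer(nrow(dense))  # 0 = not yet labeled
label <- 0L
for (start in seq_len(nrow(dense))) {
  if (macro[start] != 0L) next
  label <- label + 1L
  queue <- start
  while (length(queue) > 0) {
    cur <- queue[1]
    queue <- queue[-1]
    if (macro[cur] != 0L) next
    macro[cur] <- label
    queue <- c(queue, which(adjacent[cur, ] & macro == 0L))
  }
}

# assign each transitional cell to the macrocluster of an adjacent dense cell
trans_macro <- vapply(seq_len(nrow(transitional)), function(i) {
  d <- rowSums(abs(sweep(dense, 2, transitional[i, ])))
  nb <- which(d == 1)
  if (length(nb) > 0) macro[nb[1]] else NA_integer_
}, integer(1))

macro        # macrocluster label per dense cell
trans_macro  # macrocluster label per transitional cell (NA = unassigned)
```

In this toy data the six dense cells form three macroclusters, and the transitional cell at (2, 7) stays unassigned because it touches no dense cell.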
The 2009 version of the algorithm adds the concept of attraction between grid cells. If attraction=TRUE is used then the algorithm produces macroclusters based on attraction between dense adjacent grids (uses Cm2 which in the original algorithm is equal to Cm).
For many functions (e.g., get_centers(), plot()), DStream adds a parameter grid_type with possible values of "dense", "transitional", "sparse", "all" and "used". Only the selected type of grid cells is returned. "used" includes dense and adjacent transitional cells, which are used in DStream for reclustering.
For plot(), DStream also provides the extra parameters grid and grid_type to show microclusters as grid cells (density represented by gray values).
Note that DSC_DStream currently cannot be saved to disk using save() or saveRDS(). This functionality will be added later!
An object of class DSC_DStream (subclass of DSC, DSC_R, DSC_Micro).
Michael Hahsler
Yixin Chen and Li Tu. 2007. Density-based clustering for real-time stream data. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07). ACM, New York, NY, USA, 133-142.
Li Tu and Yixin Chen. 2009. Stream data clustering based on grid density and attraction. ACM Transactions on Knowledge Discovery from Data, 3(3), Article 12 (July 2009), 27 pages.
library(stream)

stream <- DSD_BarsAndGaussians(noise=.05)
plot(stream)
# we set Cm=1.5 (below the default of 3) to pick up the lower density clusters
dstream1 <- DSC_DStream(gridsize=1, Cm=1.5)
update(dstream1, stream, 1000)
dstream1
# microclusters (these are "used" grid cells)
nclusters(dstream1)
head(get_centers(dstream1))
# plot (DStream provides additional grid visualization)
plot(dstream1, stream)
plot(dstream1, stream, grid=TRUE)
# look only at dense grids
nclusters(dstream1, grid_type="dense")
plot(dstream1, stream, grid=TRUE, grid_type="dense")
# look at transitional and sparse cells
plot(dstream1, stream, grid=TRUE, grid_type="transitional")
plot(dstream1, stream, grid=TRUE, grid_type="sparse")
### Macroclusters
# standard DStream uses reachability
nclusters(dstream1, type="macro")
get_centers(dstream1, type="macro")
plot(dstream1, stream, type="both", grid=TRUE)
evaluate(dstream1, stream, measure="crand", type="macro")
# use attraction for reclustering
dstream2 <- DSC_DStream(gridsize=1, attraction=TRUE, Cm=1.5)
update(dstream2, stream, 1000)
dstream2
plot(dstream2, stream, type="both", grid=TRUE)
evaluate(dstream2, stream, measure="crand", type="macro")

Loading required package: proxy
Attaching package: 'proxy'
The following objects are masked from 'package:stats':
as.dist, dist
The following object is masked from 'package:base':
as.matrix
DStream
Class: DSC_DStream, DSC_Micro, DSC_R, DSC
Number of microclusters: 38
Number of macroclusters: 3
[1] 38
[,1] [,2]
[1,] 3.5 4.5
[2,] 3.5 3.5
[3,] 3.5 2.5
[4,] 2.5 4.5
[5,] 2.5 3.5
[6,] 2.5 2.5
[1] 24
[1] 3
X1 X2
1 2.1152379 2.649665
2 0.7145228 3.442311
3 1.0555051 1.500000
Evaluation results for macroclusters.
Points were assigned to microclusters.
cRand
0.616245
DStream
Class: DSC_DStream, DSC_Micro, DSC_R, DSC
Number of microclusters: 44
Number of macroclusters: 5
Evaluation results for macroclusters.
Points were assigned to microclusters.
cRand
0.6812977