DSC_DBSTREAM R Documentation
Micro Clusterer with reclustering. Implements a simple density-based stream clustering algorithm that assigns data points to micro-clusters with a given radius and implements shared-density-based reclustering.
DSC_DBSTREAM(
formula = NULL,
r,
lambda = 0.001,
gaptime = 1000L,
Cm = 3,
metric = "Euclidean",
noise_multiplier = 1,
shared_density = FALSE,
alpha = 0.1,
k = 0,
minweight = 0
)
get_shared_density(x, use_alpha = TRUE)
change_alpha(x, alpha)
## S3 method for class 'DSC_DBSTREAM'
plot(
x,
dsd = NULL,
n = 500,
col_points = NULL,
dim = NULL,
method = "pairs",
type = c("auto", "micro", "macro", "both", "none"),
shared_density = FALSE,
use_alpha = TRUE,
assignment = FALSE,
...
)
DSOutlier_DBSTREAM(
formula = NULL,
r,
lambda = 0.001,
gaptime = 1000L,
Cm = 3,
metric = "Euclidean",
outlier_multiplier = 2
)
formula: NULL to use all features in the stream, or a model formula of the form ~ X1 + X2 to specify the features used for clustering.
r: The radius of micro-clusters.
lambda: The lambda used in the fading function.
gaptime: Weak micro-clusters (and weak shared density entries) are removed every gaptime data points.
Cm: Minimum weight for a micro-cluster.
metric: Metric used to calculate distances.
noise_multiplier, outlier_multiplier: Multiplier for the radius r used to declare noise or outliers.
shared_density: Record shared density information. If set to TRUE, shared density is used for reclustering; otherwise reachability is used.
alpha: For shared density: the minimum proportion of shared points between two clusters to warrant combining them (a suitable value for 2D data is .3). For reachability clustering it is a distance factor.
k: The number of macro-clusters to be returned if macro is true.
minweight: The proportion of the total weight a macro-cluster needs to have not to be noise (between 0 and 1).
x: A DSC_DBSTREAM object to get the shared density information from.
use_alpha: Only return shared density if it exceeds alpha.
dsd: A data stream object.
n: Number of points taken from the dsd to plot.
col_points: Color used for plotting.
dim: An integer vector with the dimensions to plot. If NULL, then for methods "pairs" and "pc" all dimensions are used, and for "scatter" the first two dimensions are plotted.
method: Plot method.
type: Plot micro-clusters (type = "micro"), macro-clusters (type = "macro"), both (type = "both"), or no clusters (type = "none"); "auto" leaves the choice to the clusterer.
assignment: Logical; show the assignment area of micro-clusters.
...: Further arguments are passed on to plot or pairs in graphics.
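The lambda and gaptime arguments interact through the fading function. The following base-R sketch assumes exponential fading with base 2 (a weight is multiplied by 2^(-lambda * dt) after dt data points), as is common in damped-window stream clustering, and assumes that "weak" entries are those whose weight falls below 2^(-lambda * gaptime). The helpers fade_weight() and weak_threshold() are hypothetical and are not part of the stream package.

```r
# Hypothetical helper (not part of stream): exponential fading, assuming a
# weight decays by a factor of 2^(-lambda) per time step (one data point).
fade_weight <- function(w, lambda, dt) {
  w * 2^(-lambda * dt)
}

# Assumed weak-entry threshold: the weight a single point retains after
# gaptime time steps without being reinforced.
weak_threshold <- function(lambda, gaptime) {
  2^(-lambda * gaptime)
}

lambda <- 0.001
gaptime <- 1000L

# A micro-cluster with weight 5 that receives no points for 1000 steps;
# its weight halves every 1 / lambda steps.
w_faded <- fade_weight(5, lambda, gaptime)
print(w_faded)                          # 2.5
print(weak_threshold(lambda, gaptime))  # 0.5

# Cleanup step (run every gaptime points): keep only entries at or above
# the weak threshold.
weights <- c(mc1 = 3.2, mc2 = 0.4, mc3 = 0.9)
kept <- weights[weights >= weak_threshold(lambda, gaptime)]
print(kept)
```

With lambda = 0.001 a micro-cluster loses half its weight every 1000 points, so a smaller lambda gives the clusterer a longer memory.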
For each new data point in the incoming stream, the DBSTREAM algorithm checks whether the point's dissimilarity to any existing micro-cluster is below the threshold given by the radius r, and if so, merges the point with the matching micro-cluster(s). Otherwise, a new micro-cluster is created to accommodate the new data point.
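The assignment step can be illustrated with a simplified base-R sketch. It omits fading, shared density, and weak-cluster cleanup, and it moves a matched center with a plain incremental-mean step, which is a simplification of the algorithm's actual center update. The function dbstream_update_sketch() is hypothetical and not part of the stream package.

```r
# Hypothetical, simplified DBSTREAM-style update (no fading, no shared
# density): a point is merged into every micro-cluster whose center lies
# within r; otherwise it starts a new micro-cluster.
dbstream_update_sketch <- function(data, r) {
  centers <- matrix(numeric(0), ncol = ncol(data))
  weights <- numeric(0)
  for (i in seq_len(nrow(data))) {
    x <- data[i, ]
    if (length(weights) > 0) {
      diffs <- centers - matrix(x, nrow(centers), ncol(centers), byrow = TRUE)
      d <- sqrt(rowSums(diffs^2))      # Euclidean distance to each center
      hit <- which(d < r)
    } else {
      hit <- integer(0)
    }
    if (length(hit) == 0) {
      centers <- rbind(centers, x)     # create a new micro-cluster
      weights <- c(weights, 1)
    } else {
      for (j in hit) {
        weights[j] <- weights[j] + 1
        # simplification: move the center toward the point (incremental mean)
        centers[j, ] <- centers[j, ] + (x - centers[j, ]) / weights[j]
      }
    }
  }
  list(centers = unname(centers), weights = weights)
}

pts <- rbind(c(0, 0), c(0.01, 0), c(1, 1))
res <- dbstream_update_sketch(pts, r = 0.05)
print(nrow(res$centers))  # 2
print(res$weights)        # 2 1
```

The second point falls within r of the first micro-cluster and is merged, while the third point is far away and creates a new micro-cluster.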
Although DSC_DBSTREAM is a micro-clustering algorithm, macro-clusters and their weights are also available.
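Shared-density reclustering can be pictured as a graph problem. The sketch below assumes that two strong micro-clusters (weight at least Cm) are connected when their shared weight, divided by the average of their two weights, reaches alpha, and that macro-clusters are the connected components of the resulting graph; weak micro-clusters get NA, matching the note in the examples that unassigned micro-clusters are noise. The function recluster_shared_density() and this exact connectivity formula are assumptions for illustration, not the stream package's code.

```r
# Hypothetical sketch of shared-density reclustering.
# w: micro-cluster weights; s: symmetric matrix of shared weights.
recluster_shared_density <- function(w, s, alpha, Cm) {
  n <- length(w)
  strong <- w >= Cm
  # connectivity: shared weight relative to the average weight of the pair
  conn <- s / outer(w, w, function(a, b) (a + b) / 2)
  adj <- conn >= alpha & outer(strong, strong, "&")
  diag(adj) <- FALSE
  macro <- rep(NA_integer_, n)  # weak micro-clusters stay NA (noise)
  id <- 0L
  for (start in which(strong)) {
    if (!is.na(macro[start])) next
    id <- id + 1L
    macro[start] <- id
    queue <- start
    # breadth-first search over the connectivity graph
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(adj[v, ] & is.na(macro))
      macro[nb] <- id
      queue <- c(queue, nb)
    }
  }
  macro
}

w <- c(10, 8, 12, 2)
s <- matrix(0, 4, 4)
s[1, 2] <- s[2, 1] <- 4  # 4 / 9 is about 0.44 >= alpha, so 1 and 2 link
s[2, 3] <- s[3, 2] <- 1  # 1 / 10 = 0.1 < alpha, so 2 and 3 stay apart
macro <- recluster_shared_density(w, s, alpha = 0.3, Cm = 3)
print(macro)  # 1 1 2 NA
```

Micro-clusters 1 and 2 share enough density to form one macro-cluster, micro-cluster 3 forms its own, and micro-cluster 4 is below Cm and therefore treated as noise.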
update() invisibly returns the assignment of the data points to clusters. The columns are .class, with the index of the strong micro-cluster, and .mc_id, with the permanent id of the strong micro-cluster.
plot() for DSC_DBSTREAM has two extra logical parameters, assignment and shared_density, which show the assignment area and the shared density graph, respectively.
predict() can be used to assign new points to clusters. Points are assigned to a micro-cluster if they are within its assignment area (distance is less than r times noise_multiplier).
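The assignment-area rule can be sketched in base R. This is a hypothetical helper, not the package's predict() method; it assumes that when a point lies within the assignment area of several micro-clusters, it is given to the nearest one, and that points outside every assignment area get NA.

```r
# Hypothetical helper (not stream's predict()): assign each new point to its
# nearest micro-cluster if the distance is below r * noise_multiplier,
# otherwise return NA.
assign_to_mc <- function(points, centers, r, noise_multiplier = 1) {
  apply(points, 1, function(x) {
    d <- sqrt(colSums((t(centers) - x)^2))  # distance to every center
    j <- which.min(d)
    if (d[j] < r * noise_multiplier) j else NA_integer_
  })
}

centers <- rbind(c(0, 0), c(1, 1))
new_pts <- rbind(c(0.03, 0), c(1, 1.04), c(0.5, 0.5))
assigned <- assign_to_mc(new_pts, centers, r = 0.05)
print(assigned)  # 1 2 NA
```

Raising noise_multiplier enlarges every assignment area, so fewer new points end up unassigned.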
DSOutlier_DBSTREAM classifies points as outliers/noise if they cannot be assigned to a micro-cluster representing a dense region. Parameter outlier_multiplier specifies how far a point has to be away from a micro-cluster, as a multiplier for the radius r. A larger value means that outliers have to be farther away from dense regions, which reduces the chance of misclassifying a regular point as an outlier.
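The effect of outlier_multiplier can be shown with a small base-R sketch. The helper is_outlier() is hypothetical, not part of the stream package; it assumes a point is an outlier when its distance to every micro-cluster center exceeds r * outlier_multiplier.

```r
# Hypothetical helper: flag points farther than r * outlier_multiplier
# from every micro-cluster center.
is_outlier <- function(points, centers, r, outlier_multiplier = 2) {
  apply(points, 1, function(x) {
    min(sqrt(colSums((t(centers) - x)^2))) > r * outlier_multiplier
  })
}

centers <- rbind(c(0, 0), c(1, 1))
pts <- rbind(c(0.08, 0), c(0.15, 0), c(0.5, 0.5))
flags2 <- is_outlier(pts, centers, r = 0.05, outlier_multiplier = 2)
flags4 <- is_outlier(pts, centers, r = 0.05, outlier_multiplier = 4)
print(flags2)  # FALSE TRUE TRUE
print(flags4)  # FALSE FALSE TRUE
```

Doubling the multiplier moves the point at distance 0.15 from outlier to regular, while the point midway between the two dense regions remains an outlier.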
An object of class DSC_DBSTREAM (subclass of DSC, DSC_R, DSC_Micro).
Michael Hahsler and Matthew Bolanos
Michael Hahsler and Matthew Bolanos. Clustering data streams based on shared density between micro-clusters. IEEE Transactions on Knowledge and Data Engineering, 28(6):1449–1461, June 2016
Other DSC_Micro: DSC_BICO(), DSC_BIRCH(), DSC_DStream(), DSC_Micro(), DSC_Sample(), DSC_Window(), DSC_evoStream()
Other DSC_TwoStage: DSC_DStream(), DSC_TwoStage(), DSC_evoStream()
Other DSOutlier: DSC_DStream(), DSOutlier()
set.seed(1000)
stream <- DSD_Gaussians(k = 3, d = 2, noise = 0.05)
# create clusterer with r = .05
dbstream <- DSC_DBSTREAM(r = .05)
update(dbstream, stream, 500)
dbstream
# check micro-clusters
nclusters(dbstream)
head(get_centers(dbstream))
plot(dbstream, stream)
# plot micro-clusters with assignment area
plot(dbstream, stream, type = "none", assignment = TRUE)
# DBSTREAM with shared density
dbstream <- DSC_DBSTREAM(r = .05, shared_density = TRUE, Cm = 5)
update(dbstream, stream, 500)
dbstream
plot(dbstream, stream)
# plot the shared density graph (several options)
plot(dbstream, stream, type = "micro", shared_density = TRUE)
plot(dbstream, stream, type = "none", shared_density = TRUE, assignment = TRUE)
# see how micro and macro-clusters relate
# each micro-cluster has an entry with the macro-cluster id
# Note: unassigned micro-clusters (noise) have an NA
microToMacro(dbstream)
# do some evaluation
evaluate_static(dbstream, stream, measure = "purity")
evaluate_static(dbstream, stream, measure = "cRand", type = "macro")
# update() can also return the cluster assignment for each point
data("iris")
dbstream <- DSC_DBSTREAM(r = 1)
cl <- update(dbstream, iris[,-5], return = "assignment")
dbstream
head(cl)
# micro-clusters
plot(iris[,-5], col = cl$.class, pch = cl$.class)
# macro-clusters (2 clusters since reachability cannot separate two of the three species)
plot(iris[,-5], col = microToMacro(dbstream, cl$.class))
# use DBSTREAM with a formula (cluster all variables but X2)
stream <- DSD_Gaussians(k = 3, d = 4, noise = 0.05)
dbstream <- DSC_DBSTREAM(formula = ~ . - X2, r = .2)
update(dbstream, stream, 500)
get_centers(dbstream)
# use DBSTREAM for outlier detection
stream <- DSD_Gaussians(k = 3, d = 4, noise = 0.05)
outlier_detector <- DSOutlier_DBSTREAM(r = .2)
update(outlier_detector, stream, 500)
outlier_detector
plot(outlier_detector, stream)
points <- get_points(stream, 20)
points
which(is.na(predict(outlier_detector, points)))