Generates a density-based clustering of arbitrary shape as introduced in Ester et al. (1996).
dbscan(data, eps, MinPts = 5, scale = FALSE, method = c("hybrid", "raw",
    "dist"), seeds = TRUE, showplot = FALSE, countmode = NULL)
## S3 method for class 'dbscan'
print(x, ...)
## S3 method for class 'dbscan'
plot(x, data, ...)
## S3 method for class 'dbscan'
predict(object, data, newdata = NULL,
predict.max=1000, ...)

data 
data matrix, data.frame, dissimilarity matrix or

eps 
Reachability distance, see Ester et al. (1996). 
MinPts 
Reachability minimum no. of points, see Ester et al. (1996). 
scale 
scale the data if TRUE. 
method 
"dist" treats data as distance matrix (relatively fast but memory expensive), "raw" treats data as raw data and avoids calculating a distance matrix (saves memory but may be slow), "hybrid" expects also raw data, but calculates partial distance matrices (very fast with moderate memory requirements). 
seeds 
FALSE to not include the 
showplot 
0 = no plot, 1 = plot per iteration, 2 = plot per subiteration. 
countmode 
NULL or vector of point numbers at which to report progress. 
x 
object of class 'dbscan'. 
object 
object of class 'dbscan'. 
newdata 
matrix or data.frame with raw data to predict. 
predict.max 
max. batch size for predictions. 
... 
Further arguments transferred to plot methods. 
A cluster requires a minimum number of points (MinPts) within a maximum distance (eps) around one of its members (the seed). Any point within eps of a point that satisfies the seed condition is a cluster member, and this rule is applied recursively. Some points may not belong to any cluster (noise).
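The rule above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: `dbscan_sketch` is a hypothetical name, it builds a full distance matrix, and it assumes that a point counts towards its own eps-neighbourhood.

```r
# Minimal DBSCAN sketch in base R (illustrative only; O(n^2) time and memory).
# dbscan_sketch is a hypothetical name, not a function of any package.
dbscan_sketch <- function(x, eps, MinPts = 5) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- as.matrix(dist(x))              # full Euclidean distance matrix
  # Assumption: the eps-neighbourhood of a point includes the point itself.
  neighbours <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  isseed <- lengths(neighbours) >= MinPts
  cluster <- integer(n)                # 0 = noise or not yet assigned
  k <- 0L
  for (i in seq_len(n)) {
    if (!isseed[i] || cluster[i] != 0L) next
    k <- k + 1L                        # start a new cluster at an unassigned seed
    cluster[i] <- k
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      nb <- neighbours[[p]]
      new <- nb[cluster[nb] == 0L]     # unassigned points within eps of seed p
      cluster[new] <- k
      queue <- c(queue, new[isseed[new]])  # only seeds are expanded further
    }
  }
  list(cluster = cluster, isseed = isseed, eps = eps, MinPts = MinPts)
}

# Two runs of 11 equally spaced points plus one isolated point.
pts <- rbind(cbind(seq(0, 1, by = 0.1), 0),
             cbind(seq(5, 6, by = 0.1), 0),
             c(3, 3))
res <- dbscan_sketch(pts, eps = 0.25, MinPts = 4)
table(res$cluster, res$isseed)
```

In this toy data the end points of each run have only three points within eps, so they are border points, and the isolated point is noise.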
We have clustered a 100,000 x 2 dataset in 40 minutes on a Pentium M 1600 MHz.
print.dbscan shows a statistic of the number of points belonging to the clusters that are seeds and border points.
plot.dbscan distinguishes between seed and border points by plot symbol.
predict.dbscan gives out a vector of predicted clusters for the points in newdata.
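How predictions are derived is not spelled out here. One plausible rule, shown purely as an assumption, is to give each new point the cluster of its nearest seed when that seed lies within eps, and 0 otherwise. The helper `predict_sketch` and the toy values below are hypothetical and do not describe the package's actual code.

```r
# Hypothetical helper (an assumption, not the package's method): assign each
# new point the cluster of its nearest seed if that seed is within eps,
# otherwise 0 (noise).
predict_sketch <- function(cluster, isseed, eps, data, newdata) {
  seeds <- as.matrix(data)[isseed, , drop = FALSE]
  seed_cluster <- cluster[isseed]
  newdata <- as.matrix(newdata)
  vapply(seq_len(nrow(newdata)), function(i) {
    # Euclidean distance from new point i to every seed
    d <- sqrt(colSums((t(seeds) - newdata[i, ])^2))
    j <- which.min(d)
    if (length(j) == 1 && d[j] <= eps) as.integer(seed_cluster[j]) else 0L
  }, integer(1))
}

# Toy values for illustration only.
toy_data    <- rbind(c(0, 0), c(0.1, 0), c(5, 5), c(5.1, 5), c(9, 9))
toy_cluster <- c(1L, 1L, 2L, 2L, 0L)
toy_isseed  <- c(TRUE, TRUE, TRUE, TRUE, FALSE)
toy_new     <- rbind(c(0.2, 0.1), c(5, 4.8), c(9, 9))
pred <- predict_sketch(toy_cluster, toy_isseed, eps = 0.5, toy_data, toy_new)
pred
```

The third new point coincides with a noise observation, which is not a seed, so under this rule it is predicted as 0.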
dbscan gives out an object of class 'dbscan' which is a LIST with components
cluster 
integer vector coding cluster membership with noise observations (singletons) coded as 0 
isseed 
logical vector indicating whether a point is a seed (not border, not noise) 
eps 
parameter eps 
MinPts 
parameter MinPts 
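The components cluster and isseed are enough to rebuild a seed/border summary by hand. The sketch below uses a plain list as a hypothetical stand-in for a fitted object; its values are invented for illustration, and it assumes that noise points are counted in the border row of column 0.

```r
# Hypothetical stand-in for a fitted object: a plain list with the documented
# components. The values are invented for illustration.
fit <- list(
  cluster = c(1L, 1L, 1L, 2L, 2L, 0L, 1L, 2L),
  isseed  = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE),
  eps     = 0.2,
  MinPts  = 5
)

# Cross-tabulate seed/border status against cluster number (0 = noise).
status <- factor(ifelse(fit$isseed, "seed", "border"),
                 levels = c("border", "seed"))
counts <- table(status, cluster = fit$cluster)
counts <- rbind(counts, total = colSums(counts))
counts

# Indices of the noise observations.
noise <- which(fit$cluster == 0)
```

The same indexing works for subsetting the original data, for example to drop noise before further analysis.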
This is a simplified version of the original algorithm (no k-d trees used), thus we have O(n^2) instead of O(n*log(n)).
Jens Oehlschlaegel, based on a draft by Christian Hennig.
Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).
set.seed(665544)
n <- 600
x <- cbind(runif(10, 0, 10)+rnorm(n, sd=0.2), runif(10, 0, 10)+rnorm(n,
    sd=0.2))
par(bg="grey40")
ds <- dbscan(x, 0.2)
# run with showplot=1 to see how dbscan works.
ds
plot(ds, x)
x2 <- matrix(0,nrow=4,ncol=2)
x2[1,] <- c(5,2)
x2[2,] <- c(8,3)
x2[3,] <- c(4,4)
x2[4,] <- c(9,9)
predict(ds, x, x2)
n <- 600
x <- cbind((1:3)+rnorm(n, sd=0.2), (1:3)+rnorm(n, sd=0.2))
# Not run, but results from my machine are 0.105  0.068  0.255:
# system.time(ds <- dbscan(x, 0.3, countmode=NULL, method="raw"))[3]
# system.time(dsb <- dbscan(x, 0.3, countmode=NULL, method="hybrid"))[3]
# system.time(dsc <- dbscan(dist(x), 0.3, countmode=NULL,
#   method="dist"))[3]

dbscan Pts=600 MinPts=5 eps=0.2
        0  1  2  3  4  5  6  7  8  9 10 11
border 28  4  4  8  5  3  3  4  3  4  6  4
seed    0 50 53 51 52 51 54 54 54 53 51  1
total  28 54 57 59 57 54 57 58 57 57 57  5
[1] 4 9 0 0