Computes the density-based silhouette information of clustered data. Two methods are associated with this function. The first method applies to two arguments: the matrix of data and the vector of cluster labels; the second method applies to objects of pdfCluster-class.
x 
A matrix of data points partitioned by any density-based clustering method or an object of 
clusters 
Cluster labels of grouped data. This argument does not need to be set when 
h.funct 
Function to estimate the smoothing parameters. Default is 
hmult 
Shrink factor to be multiplied by the smoothing parameters. Default value is 1. 
prior 
Vector of prior probabilities of belonging to the groups. When 
stage 
When 
... 
Further arguments to be passed to methods (see 
This function provides diagnostics for a clustering produced by any density-based clustering method. The dbs information is a suitable modification of the silhouette information aimed at evaluating the cluster quality in a density-based framework. It is based on the estimation of data posterior probabilities of belonging to the clusters. It may be used to measure the quality of data allocation to the clusters. High values of the \hat{dbs} are evidence of a good quality clustering.
Define
\hat{\tau}_m(x_i)=\frac{\pi_m \hat{f}(x_i \mid x_i \in m)}{\sum_{m=1}^M \pi_m \hat{f}(x_i \mid x_i \in m)}, \quad m=1,\ldots,M,
where \pi_m is a prior probability of m and \hat{f}(x_i \mid x_i \in m) is a density estimate at x_i evaluated with function kepdf using only the data points in m. Density estimation is performed with fixed bandwidths h, as evaluated by function h.funct, possibly multiplied by the shrink factor hmult.
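The posterior probabilities defined above can be sketched in base R. This is only an illustration under stated assumptions: a Gaussian kernel stands in for kepdf, bw.nrd0 stands in for h.funct, and the priors are taken to be the group proportions. None of these choices is claimed to match what the package does internally.

```r
# Sketch only: Gaussian kernel and bw.nrd0 are stand-ins (assumptions)
# for pdfCluster's kepdf and h.funct.
set.seed(1)
x <- c(rnorm(30, mean = 0), rnorm(30, mean = 4))  # one-dimensional data
groups <- rep(1:2, each = 30)                     # cluster labels
hmult <- 1                                        # shrink factor

# Column m holds the density estimate at every x_i, built only from
# the data points allocated to group m.
f_hat <- sapply(sort(unique(groups)), function(m) {
  xm <- x[groups == m]
  h <- hmult * bw.nrd0(xm)                        # fixed bandwidth for group m
  sapply(x, function(xi) mean(dnorm(xi, mean = xm, sd = h)))
})

# Prior probabilities: group proportions (an assumed choice).
prior <- as.numeric(table(groups)) / length(groups)

# tau[i, m] = prior[m] * f_hat[i, m] / sum over all groups
num <- sweep(f_hat, 2, prior, `*`)
tau <- num / rowSums(num)
```

Each row of tau sums to one, so tau[i, m] can be read as the estimated probability that observation i belongs to group m.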
Density-based silhouette information of x_i, the i^{th} row of the data matrix x, is defined as follows:
\hat{dbs}_i=\frac{\log\left(\frac{\hat{\tau}_{m_0}(x_i)}{\hat{\tau}_{m_1}(x_i)}\right)}{\max_{x_i}\left|\log\left(\frac{\hat{\tau}_{m_0}(x_i)}{\hat{\tau}_{m_1}(x_i)}\right)\right|},
where m_0 is the group to which x_i has been allocated and m_1 is the group for which \tau_m is maximum, m \neq m_0.
Note: when there exists x_j such that \hat{\tau}_{m_1}(x_j) is zero, \hat{dbs}_j is forced to 1 and \max_{x_i}\left|\log\left(\frac{\hat{\tau}_{m_0}(x_i)}{\hat{\tau}_{m_1}(x_i)}\right)\right| is computed by excluding x_j from the data matrix x.
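The definition and the note on zero posteriors can be illustrated with a small base R sketch. The posterior matrix below is made up for the example, and dbs_from_tau is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical posterior probabilities for 4 points and 3 groups
# (each row sums to 1); values are invented for illustration.
tau <- rbind(c(0.90, 0.08, 0.02),
             c(0.60, 0.30, 0.10),
             c(0.20, 0.70, 0.10),
             c(1.00, 0.00, 0.00))
clusters <- c(1, 1, 1, 1)  # allocated group m_0 of each point

# Hypothetical helper: density-based silhouette from a posterior matrix.
dbs_from_tau <- function(tau, clusters) {
  n <- nrow(tau)
  idx <- cbind(seq_len(n), clusters)
  tau0 <- tau[idx]                  # posterior of the allocated group m_0
  others <- tau
  others[idx] <- -Inf               # mask m_0 so the max runs over m != m_0
  tau1 <- apply(others, 1, max)     # posterior of the best competitor m_1
  lr <- log(tau0 / tau1)            # log ratio; Inf where tau1 is zero
  finite <- is.finite(lr)
  out <- lr / max(abs(lr[finite]))  # normalise, excluding non-finite ratios
  out[tau1 == 0] <- 1               # forced to 1, as stated in the note
  out
}

dbs_hat <- dbs_from_tau(tau, clusters)
```

In this toy input the third point has a larger posterior for group 2 than for its allocated group 1, so its value is negative, while the fourth point has a zero competing posterior and is therefore set to 1.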
See Menardi (2011) for a detailed treatment.
An object of class "dbs", with slots:
call 
The matched call. 
x 
The matrix of clustered data points. 
prior 
The vector of prior probabilities of belonging to the groups. 
dbs 
A vector reporting the density-based silhouette information of the clustered data. 
clusters 
Cluster labels of grouped data. 
noc 
Number of clusters. 
stage 
If argument 
See dbs-class for more details.
signature(x = "matrix", clusters = "numeric")
Computes the density-based silhouette information for objects partitioned according to any density-based clustering method.
signature(x = "pdfCluster", clusters = "missing")
Computes the density-based silhouette information for objects of class "pdfCluster".
Menardi, G. (2011) Density-based Silhouette diagnostics for clustering methods. Statistics and Computing, 21, 295-308.
dbs-class, plot,dbs-method, silhouette.
# load the package providing dbs(), pdfCluster() and the wine data
library(pdfCluster)
#example 1: no groups in data
#random generation of group labels
set.seed(54321)
x <- rnorm(50)
groups <- sample(1:2, 50, replace = TRUE)
groups
# matrix method: data matrix plus vector of cluster labels
dsil <- dbs(x = as.matrix(x), clusters = groups)
dsil
summary(dsil)
plot(dsil, labels = TRUE, lwd = 6)
#example 2: wines data
# load data
data(wine)
# select a subset of variables
x <- wine[, c(2, 5, 8)]
#clustering
cl <- pdfCluster(x)
# pdfCluster method: labels are taken from the clustering object
dsil <- dbs(cl)
plot(dsil)
