plot.valstat | R Documentation |
Visualisation and print function for cluster validation output compared to results on simulated random clusterings. The print method can also be used to compute and print an aggregated cluster validation index.
Unlike for many other plot methods, the additional arguments of plot.valstat are essential. print.valstat should make good sense with the defaults, but to compute the aggregate index, aggregate and weights need to be set.
## S3 method for class 'valstat'
plot(x,simobject=NULL,statistic="sindex",
xlim=NULL,ylim=c(0,1),
nmethods=length(x)-5,
col=1:nmethods,cex=1,pch=c("c","f","a","n"),
simcol=rep(grey(0.7),4),
shift=c(-0.1,-1/3,1/3,0.1),include.othernc=NULL,...)
## S3 method for class 'valstat'
print(x,statistics=x$statistics,
nmethods=length(x)-5,aggregate=FALSE,
weights=NULL,digits=2,
include.othernc=NULL,...)
x |
object of class "valstat", such as the stat, qstat or sstat component of the clusterbenchstats output. |
simobject |
list of simulation results, typically the sim component of the clusterbenchstats output. |
statistic |
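character string naming the validation statistic to be plotted, for example "sindex" (the default), "dindex" or "avewithin". |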
xlim |
passed on to plot. |
ylim |
passed on to plot. |
nmethods |
integer. Number of clustering methods to involve
(these are the methods numbered 1 to nmethods). |
col |
colours used for the different clustering methods. |
cex |
passed on to plot. |
pch |
vector of symbols for random clustering results from
|
simcol |
vector of colours used for random clustering results in
order |
shift |
numeric vector. Indicates the amount to which the results
from |
include.othernc |
this indicates whether methods should be
included that estimated their number of clusters themselves and gave
a result outside the standard range as given by |
statistics |
vector of character strings specifying the validation statistics that will be included in the output (unless you want to restrict the output for some reason, the default should be fine). |
aggregate |
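logical. If TRUE, the aggregated validation index is computed and printed last; this requires weights to be set. |
weights |
numeric vector. Weights for computation of the
aggregate statistic in case that aggregate=TRUE. |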
digits |
minimal number of significant digits, passed on to
|
... |
no effect. |
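The weights argument is easy to get wrong when it is typed as a long positional vector. The following is a minimal sketch of building it by name instead. It assumes that the weights are matched to the validation statistics by position, in the order of x$statistics; make_weights is a hypothetical helper, not part of the package, and toy_statistics is a shortened stand-in for a real x$statistics.

```r
# make_weights() is a hypothetical helper, not part of fpc.
# statistics: character vector of statistic names (assumed to be x$statistics).
# selected:   named numeric vector, e.g. c(sindex = 2, avewithin = 1).
make_weights <- function(statistics, selected) {
  unknown <- setdiff(names(selected), statistics)
  if (length(unknown) > 0) {
    stop("unknown statistics: ", paste(unknown, collapse = ", "))
  }
  w <- numeric(length(statistics))            # every weight starts at zero
  w[match(names(selected), statistics)] <- selected
  w
}

# Toy stand-in for x$statistics; a real valstat object lists more statistics.
toy_statistics <- c("avewithin", "sindex", "dindex")
w <- make_weights(toy_statistics, c(sindex = 2, avewithin = 1))
print(w)
```

With a real object, the same helper could be called as make_weights(x$statistics, ...) and its result passed as weights to print together with aggregate=TRUE, provided the positional-order assumption holds.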
Whereas print.valstat, at least with aggregate=TRUE, makes more sense for the qstat or sstat-component of the clusterbenchstats-output rather than the stat-component, plot.valstat should be run with the stat-component if simobject is specified, because the simulated cluster validity statistics are unstandardised and need to be compared with unstandardised values on the dataset of interest.
print.valstat will print all values for all validation indexes; the aggregated index (in case of aggregate=TRUE and set weights) will be printed last.
print.valstat returns the results table as an invisible object.
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/
Hennig, C. (2019) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Data Analysis and Applications 1: Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, Volume 2, Wiley, New York 1-24, https://arxiv.org/abs/1703.09282
Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Statistics and Computing, 30, 1523-1544, https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822
clusterbenchstats, valstat.object, cluster.magazine
set.seed(20000)
options(digits=3)
face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
clustermethod <- c("kmeansCBI","hclustCBI","hclustCBI")
clustermethodpars <- list()
clustermethodpars[[2]] <- clustermethodpars[[3]] <- list()
clustermethodpars[[2]]$method <- "ward.D2"
clustermethodpars[[3]]$method <- "single"
methodname <- c("kmeans","ward","single")
cbs <- clusterbenchstats(face,G=2:3,clustermethod=clustermethod,
methodname=methodname,distmethod=rep(FALSE,3),
clustermethodpars=clustermethodpars,nnruns=2,kmruns=2,fnruns=2,avenruns=2)
plot(cbs$stat,cbs$sim)
plot(cbs$stat,cbs$sim,statistic="dindex")
plot(cbs$stat,cbs$sim,statistic="avewithin")
pcbs <- print(cbs$sstat,aggregate=TRUE,weights=c(1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0))
# Some of the values are "NaN" because, with so few runs of the
# stupid clustering methods, there is no variation. If this happens
# in a real application, nnruns etc. should be set higher than 2.
# Setting useallg=TRUE in clusterbenchstats may also help.
#
# Finding the best aggregated value:
mpcbs <- as.matrix(pcbs[[17]][,-1])
which(mpcbs==max(mpcbs),arr.ind=TRUE)
# row=1 refers to the first clustering method kmeansCBI,
# col=2 refers to the second number of clusters, which is 3 in g=2:3.
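The last lines of the example pick the method and number of clusters with the best aggregated value. The sketch below replays that step on made-up numbers in base R, so it runs without the package. It assumes that the aggregated index is a weighted mean of the standardised statistics; this is an illustration of the idea, not the package's own code, and all values and the reduced set of three statistics are invented.

```r
# Toy standardised statistics: rows are clustering methods,
# columns are numbers of clusters. All numbers are made up.
methods <- c("kmeans", "ward", "single")
G <- as.character(2:3)
toy <- list(
  avewithin = matrix(c(0.5, 0.2, -1.0, 0.9, 0.4, -0.3), nrow = 3,
                     dimnames = list(methods, G)),
  sindex    = matrix(c(0.1, 0.6, -0.8, 0.7, 0.3, -0.5), nrow = 3,
                     dimnames = list(methods, G)),
  dindex    = matrix(c(1.0, -0.2, 0.3, -0.4, 0.8, 0.1), nrow = 3,
                     dimnames = list(methods, G))
)

# One weight per statistic; a zero weight excludes that statistic.
weights <- c(avewithin = 1, sindex = 1, dindex = 0)

# Assumed aggregation: weighted mean over statistics, cell by cell.
weighted <- Map(function(m, w) w * m, toy, weights)
aggregate_index <- Reduce(`+`, weighted) / sum(weights)

# Same search as in the example: position of the largest aggregated value.
best <- which(aggregate_index == max(aggregate_index, na.rm = TRUE),
              arr.ind = TRUE)
print(aggregate_index)
print(best)
```

Here the largest value sits in row 1 and column 2, that is, the first method with three clusters, which mirrors how the row and column indices are read in the example.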