revisedsil: The revised silhouette
In RSKC: Robust Sparse K-Means


View source: R/Revised-silhouette.R

This function returns a revised silhouette plot, the cluster centers in the reduced weighted dimension, and a matrix containing the weighted squared Euclidean distance between each case and each cluster center. Missing values are adjusted for.
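As a rough illustration of what "weighted squared Euclidean distance" means here, the base-R sketch below computes it two ways on toy data: directly as a weighted sum of squared differences, and as a plain squared Euclidean distance after scaling each feature by the square root of its weight. The data, the weights `w` and the center `mu` are made up for the illustration and do not come from `RSKC`.

```r
set.seed(1)
x  <- matrix(rnorm(12), nrow = 4, ncol = 3)   # 4 cases, 3 features (toy data)
w  <- c(0.5, 0, 2)                            # hypothetical feature weights
mu <- colMeans(x[1:2, ])                      # a toy "cluster center"

# Direct definition: sum over features j of w_j * (x_ij - mu_j)^2
d_direct <- as.vector(sweep(x, 2, mu)^2 %*% w)

# Equivalent form: scale every feature by sqrt(w_j), then take the
# ordinary squared Euclidean distance in the transformed space
x_t     <- sweep(x, 2, sqrt(w), FUN = "*")
mu_t    <- mu * sqrt(w)
d_trans <- rowSums(sweep(x_t, 2, mu_t)^2)

all.equal(d_direct, d_trans)
```

Features with zero weight contribute nothing to the distance, which is why they can be dropped before the transformation.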


revisedsil(d,reRSKC=NULL,CASEofINT=NULL,col1="black",
	CASEofINT2 = NULL, col2="red", print.plot=TRUE, 
	W=NULL,C=NULL,out=NULL)

`d`	A numerical data matrix, `N` by `p`, where `N` is the number of cases and `p` is the number of features.
`reRSKC`	A list output from RSKC function.
`CASEofINT`	Necessary if `print.plot=TRUE`. A vector of the case indices that appear in the revised silhouette plot. The revised silhouette widths of these cases are colored in `col1` if `CASEofINT` is not `NULL`. The average silhouette of each cluster printed in the plot is computed excluding these cases.
`col1`	See `CASEofINT`.
`CASEofINT2`	A vector of the case indices that appear in the revised silhouette plot. The indices are colored in `col2`.
`col2`	See `CASEofINT2`.
`print.plot`	If `TRUE`, the revised silhouette is plotted.
`W`	Necessary if `reRSKC = NULL`. A positive real vector of weights of length `p`.
`C`	Necessary if `reRSKC = NULL`. An integer vector of class labels of length `N`.
`out`	Necessary if `reRSKC = NULL`. A vector of the case indices that should be excluded from the calculation of cluster centers. In `RSKC`, cluster centers are calculated without the 100*`alpha`% of cases that have the largest weighted squared Euclidean distances to their closest cluster centers. To obtain the cluster centers from `RSKC` output, set `out = <RSKCoutput>$oW`.
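To make the role of `out` more concrete, here is a minimal base-R sketch that flags the cases furthest from their closest center and recomputes the centers without them. It is a single pass on toy data, not the iterative procedure used by `RSKC`; the trimming count `floor(alpha * N)`, the labels, the weights and all variable names are assumptions made for the illustration.

```r
set.seed(2)
N <- 20; p <- 4; alpha <- 0.1
x <- matrix(rnorm(N * p), N, p)
x[1:10, ] <- x[1:10, ] + 3                 # two well-separated toy groups
C <- rep(1:2, each = 10)                   # hypothetical class labels
W <- c(1, 1, 0.5, 0)                       # hypothetical feature weights

# Keep nonzero-weight features and scale them by sqrt(weight)
x_t <- sweep(x[, W != 0, drop = FALSE], 2, sqrt(W[W != 0]), FUN = "*")

# Centers from all cases, then distance of every case to every center
centers <- rbind(colMeans(x_t[C == 1, , drop = FALSE]),
                 colMeans(x_t[C == 2, , drop = FALSE]))
dist_to_center <- sapply(1:2, function(k) rowSums(sweep(x_t, 2, centers[k, ])^2))
closest <- apply(dist_to_center, 1, min)

# Assumed rule: flag the floor(alpha * N) cases with the largest distance
n_out <- floor(alpha * N)
out   <- order(closest, decreasing = TRUE)[seq_len(n_out)]

# Recompute the centers without the flagged cases
keep <- setdiff(seq_len(N), out)
centers_trim <- rbind(
  colMeans(x_t[intersect(keep, which(C == 1)), , drop = FALSE]),
  colMeans(x_t[intersect(keep, which(C == 2)), , drop = FALSE]))
```

With real `RSKC` output the index vector would come from the fitted object rather than from a one-off calculation like this.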

`trans.mu`	Cluster centers in the reduced weighted dimension. See the examples for more detail.
`WdisC`	`N` by `ncl` matrix, where `ncl` is the prespecified number of clusters. It contains the weighted squared Euclidean distance between each case and each cluster center. See the examples for more detail.
`sil.order`	Silhouette values of each case in the order of the case index.
`sil.i`	Silhouette values of cases ranked in decreasing order within clusters. The corresponding case indices are in `obs.i`.
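The sketch below shows one way silhouette values could be derived from a matrix like `WdisC`. It assumes that the revised silhouette keeps the usual form `(b - a) / max(a, b)`, with `a` the distance from a case to its own cluster center and `b` the distance to the nearest other center; the referenced thesis is the authority on the exact definition. The toy matrix and labels are hypothetical.

```r
# Toy N x ncl matrix of weighted squared distances (hypothetical values)
WdisC <- matrix(c(1, 4,   9,
                  6, 2,   8,
                  5, 7,   1,
                  3, 3.5, 10), ncol = 3, byrow = TRUE)
C <- c(1, 2, 3, 1)                         # cluster label of each case
N <- nrow(WdisC)

own <- cbind(seq_len(N), C)                # (case, own cluster) index pairs
a <- WdisC[own]                            # distance to own cluster center

other <- WdisC
other[own] <- Inf                          # mask each case's own cluster
b <- apply(other, 1, min)                  # distance to nearest other center

sil.order <- (b - a) / pmax(a, b)          # values in case-index order

# Rank cases by decreasing value within each cluster
obs.i <- order(C, -sil.order)
sil.i <- sil.order[obs.i]
```

Values close to 1 indicate a case much nearer to its own center than to any other, while values near 0 indicate a case lying between two centers.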

Yumi Kondo <y.kondo@stat.ubc.ca>

Yumi Kondo (2011), Robustification of the sparse K-means clustering algorithm, MSc thesis, University of British Columbia http://hdl.handle.net/2429/37093

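library(RSKC)

# Small simulation function: returns a 60 by f matrix of standard normal
# noise with three classes of 20 cases each. The first 50 features of
# classes 1 and 2 are shifted by +mu and -mu (requires f >= 50).
sim <-
function(mu,f){
   D<-matrix(rnorm(60*f),60,f)
   D[1:20,1:50]<-D[1:20,1:50]+mu
   D[21:40,1:50]<-D[21:40,1:50]-mu
   return(D)
   }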


### output trans.mu ###

p<-200;ncl<-3
# simulate a 60 by p data matrix with 3 classes 
d<-sim(2,p)
# run RSKC
re<-RSKC(d,ncl,L1=2,alpha=0.05)
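# cluster centers in the reduced weighted dimension, as returned by revisedsil
sil.mu<-revisedsil(d,W=re$weights,C=re$labels,out=re$oW,print.plot=FALSE)$trans.mu
# the same centers computed by hand:
# keep nonzero-weight features and scale them by sqrt(weight)
trans.d<-sweep(d[,re$weights!=0],2,sqrt(re$weights[re$weights!=0]),FUN="*") 
# move the trimmed cases to an extra class so they are left out of the means
class<-re$labels;class[re$oW]<-ncl+1
MEANs<-matrix(NA,ncl,ncol(trans.d))
for ( i in 1 : ncl) MEANs[i,]<-colMeans(trans.d[class==i,,drop=FALSE])
# compare with a numerical tolerance rather than exact equality
all.equal(sil.mu,MEANs,check.attributes=FALSE)
# the two coincide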

### output WdisC ###

p<-200;ncl<-3;N<-60
# generate 60 by p data matrix with 3 classes 
d<-sim(2,p)
# run RSKC
re<-RSKC(d,ncl,L1=2,alpha=0.05)
si<-revisedsil(d,W=re$weights,C=re$labels,out=re$oW,print.plot=FALSE)
si.mu<-si$trans.mu
si.wdisc<-si$WdisC
trans.d<-sweep(d[,re$weights!=0],2,sqrt(re$weights[re$weights!=0]),FUN="*") 
WdisC<-matrix(NA,N,ncl)
for ( i in 1 : ncl) WdisC[,i]<-rowSums(scale(trans.d,center=si.mu[i,],scale=FALSE)^2)
# WdisC and si.wdisc coincide
all.equal(WdisC,si.wdisc,check.attributes=FALSE)