The function SelectnrClusters
determines an optimal number of
clusters by calculating silhouette widths for a sequence of numbers of clusters.
See "Details" for a more elaborate description.
If the objects provided in List are data or distance matrices, clustering
around medoids is performed with the pam
function of the
cluster package. From the obtained pam objects, the average silhouette
widths are retrieved. A silhouette width represents how well an object lies
in its current cluster. Values around one are an indication of an
appropriate clustering while values around zero show that the object might
as well lie in the neighbouring cluster. The average silhouette width is a
measure of how tightly grouped the data is. This is performed for every
number of clusters for every object provided in List. Then the average is
taken for every number of clusters over the provided objects. This results
in one average value per number of clusters. The number with the maximal
average silhouette width is chosen as the optimal number of clusters.
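The selection rule described above can be illustrated with a minimal sketch that calls pam directly. This is not the internal code of SelectnrClusters: the toy matrices `mat1` and `mat2`, the use of Euclidean distances instead of the package's distance measures, and the variable names are assumptions made for illustration only.

```r
library(cluster)  # provides pam()

# Hypothetical toy data: two sources describing the same 30 objects,
# generated around three group means (0, 5 and 10).
set.seed(1)
centers <- rep(c(0, 5, 10), each = 10)
mat1 <- matrix(rnorm(30 * 4, mean = centers), nrow = 30)
mat2 <- matrix(rnorm(30 * 3, mean = centers), nrow = 30)

# Assumption: Euclidean distances stand in for the package's distance measures.
dists <- list(A = dist(mat1), B = dist(mat2))
nrclusters <- 2:6

# Average silhouette width of a pam clustering, for every number of
# clusters (rows) and every data source (columns).
sil <- sapply(dists, function(d) {
  sapply(nrclusters, function(k) pam(d, k = k)$silinfo$avg.width)
})
rownames(sil) <- nrclusters

# Average over the sources, then pick the number of clusters with the
# maximal average silhouette width.
avg_sil <- rowMeans(sil)
optimal <- nrclusters[which.max(avg_sil)]
```

Here `sil` plays the role of the per-object silhouette table and `optimal` the role of the selected number of clusters; the actual function additionally handles normalization and plotting.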
List |
A list of data matrices. It is assumed that the rows correspond to the objects. |
type |
Indicates whether the matrices provided in "List" are data matrices, distance matrices or clustering results obtained from the data. If type="dist" the calculation of the distance matrices is skipped and if type="clusters" the single source clustering is skipped. Type should be one of "data", "dist" or "clusters". |
distmeasure |
A vector of the distance measures to be used on each data matrix. Each element should be one of "tanimoto", "euclidean", "jaccard", "hamming". Defaults to c("tanimoto","tanimoto"). |
normalize |
Logical. Indicates whether to normalize the distance matrices or not, defaults to c(FALSE, FALSE) for two data sets. This is recommended if different distance types are used. More details on normalization in |
method |
A method of normalization. Should be one of "Quantile","Fisher-Yates", "standardize","Range" or any of the first letters of these names. Default is c(NULL,NULL) for two data sets. |
nrclusters |
A sequence of numbers of clusters to cut the dendrogram in. Default is a sequence of 5 to 25. |
names |
The labels to give to the elements in List. Default is NULL. |
StopRange |
Logical. Indicates whether the distance matrices with
values not between zero and one should be standardized to lie in that range. If FALSE
the range normalization is performed. See |
plottype |
Should be one of "pdf","new" or "sweave". If "pdf", a location should be provided in "location" and the figure is saved there. If "new" a new graphic device is opened and if "sweave", the figure is made compatible to appear in a sweave or knitr document, i.e. no new device is opened and the plot appears in the current device or document. Default is "new". |
location |
If plottype is "pdf", a location should be provided in "location" and the figure is saved there. Default is NULL. |
A plot is made showing the average silhouette widths of the provided objects for each number of clusters. Further, a list with two elements is returned:
Silhouette_Widths |
A data frame with the silhouette widths for each object and the average silhouette widths per number of clusters |
Optimal_Nr_of_CLusters |
The determined optimal number of clusters |
## Not run:
data(fingerprintMat)
data(targetMat)
L=list(fingerprintMat,targetMat)
NrClusters=SelectnrClusters(List=L,type="data",distmeasure=c("tanimoto",
"tanimoto"),nrclusters=seq(5,10),normalize=c(FALSE,FALSE),method=c(NULL,NULL),
names=c("FP","TP"),StopRange=FALSE,plottype="new",location=NULL)
NrClusters
## End(Not run)