The function Cluster performs clustering on a single source of information, i.e. one data matrix. Optionally, the gap statistic can be computed to determine the optimal number of clusters.
Data |
A matrix containing the data. The rows are assumed to correspond to the objects. |
type |
Indicates whether the matrix provided in "Data" is a data matrix or a distance matrix obtained from the data. If type="dist" the calculation of the distance matrix is skipped. Should be one of "data" or "dist". |
distmeasure |
Choice of metric for the dissimilarity matrix (character). Should be one of "tanimoto", "euclidean", "jaccard", "hamming". Default is "tanimoto". |
normalize |
Logical. Indicates whether to normalize the distance matrices or not, default is FALSE. This is recommended if different distance types are used. More details on normalization in |
method |
A method of normalization. Should be one of "Quantile", "Fisher-Yates", "standardize", "Range" or any of the first letters of these names. Default is NULL. |
clust |
Choice of clustering function (character). Defaults to "agnes". Note that, for now, the only option is agglomerative hierarchical clustering as it was implemented in the |
linkage |
Choice of inter group dissimilarity (character). Defaults to "flexible". |
alpha |
The parameter alpha to be used in the "flexible" linkage of the agnes function. Defaults to 0.625 and is only used if the linkage is set to "flexible". |
gap |
Logical. Whether the optimal number of clusters should be determined with the gap statistic. Default is TRUE. |
maxK |
The maximal number of clusters to investigate in the gap statistic. Default is 15. |
StopRange |
Logical. Indicates whether distance matrices with values that do not lie between zero and one should be standardized so that they do. If FALSE the range normalization is performed. See |
The gap statistic is evaluated with the criteria described in the cluster package:
firstSEmax, globalSEmax, firstmax, globalmax and Tibs2001SEmax. The number of
iterations is set to a default of 500. The distances implemented for the
dissimilarity matrix are jaccard, tanimoto and euclidean. The jaccard distances
were computed with the dist.binary(..., method=1) function in the ade4 package
and the euclidean ones with the daisy function, also from the cluster package.
The Tanimoto distances were implemented manually.
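Since the Tanimoto distance is computed by hand rather than by an external package, a small sketch may help show what such a computation looks like. The helper tanimoto_dist below is hypothetical and is not the package's own code; it assumes a binary (0/1) matrix with objects in the rows and uses the usual definition, one minus the number of shared features divided by the number of features present in either object.

```r
# Hypothetical helper (not taken from the package): Tanimoto distance
# between the rows of a binary 0/1 matrix.
tanimoto_dist <- function(x) {
  x <- as.matrix(x)
  shared <- x %*% t(x)                       # features present in both objects
  counts <- rowSums(x)                       # features present per object
  either <- outer(counts, counts, "+") - shared  # features present in at least one
  # Two all-zero rows are treated as identical (similarity 1) by assumption
  sim <- ifelse(either == 0, 1, shared / either)
  1 - sim
}

# Toy fingerprints: rows are objects, columns are binary features
fp <- rbind(a = c(1, 1, 0, 0),
            b = c(1, 0, 1, 0),
            c = c(1, 1, 0, 0))
d <- tanimoto_dist(fp)
```

Here objects a and c are identical, so their distance is zero, while a and b share one of three features. A matrix like d could be passed to Cluster with type="dist" so that the distance calculation is skipped.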
The returned value is a list with two elements:
DistM |
The distance matrix of the data matrix |
Clust |
The resulting clustering |
If the gap option is set to TRUE, three more elements are added to the list. Clust_gap contains the output of the function that computes the gap statistic and gapdata is a subset of this output. Both can be used to make plots that visualize the gap statistic. The final component is k, a matrix containing the optimal number of clusters determined by each of the criteria mentioned earlier.
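To make the gap-related output more concrete, the following is a rough sketch of how a gap statistic and the five criteria can be obtained directly with the cluster package. It is not the internal code of Cluster: the wrapper agnes_cut, the simulated data and the reduced number of iterations (B = 50) are assumptions chosen for illustration only.

```r
library(cluster)

set.seed(1)
# Simulated data with two well separated groups (illustrative only)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))

# Hypothetical wrapper: clusGap expects a function of (x, k) that
# returns a list with a 'cluster' component
agnes_cut <- function(x, k) {
  fit <- agnes(x, diss = FALSE, method = "flexible", par.method = 0.625)
  list(cluster = cutree(as.hclust(fit), k = k))
}

# Gap statistic for k = 1..5; B is kept small here to limit run time
gap <- clusGap(x, FUNcluster = agnes_cut, K.max = 5, B = 50)
tab <- gap$Tab

# Optimal number of clusters according to each criterion
criteria <- c("firstSEmax", "globalSEmax", "firstmax", "globalmax",
              "Tibs2001SEmax")
k <- sapply(criteria, function(m) {
  maxSE(tab[, "gap"], tab[, "SE.sim"], method = m)
})
```

The criteria do not always agree, which is presumably why the result is reported per criterion rather than as a single number.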
data(fingerprintMat)
data(targetMat)
MCF7_F = Cluster(fingerprintMat,type="data",distmeasure="tanimoto",normalize=FALSE,
method=NULL,clust="agnes",linkage="flexible",alpha=0.625,gap=FALSE,maxK=55,
StopRange=FALSE)
MCF7_T = Cluster(targetMat,type="data",distmeasure="tanimoto",normalize=FALSE,
method=NULL,clust="agnes",linkage="flexible",alpha=0.625,gap=FALSE,maxK=55,
StopRange=FALSE)