The function Cluster performs clustering on
a single source of information, i.e. one data matrix. Optionally, the
gap statistic can be computed to determine the optimal number of
clusters.
Data |
A matrix containing the data. The rows are assumed to correspond to the objects. |
type |
Indicates whether the matrix provided in "Data" is a data matrix or a distance matrix obtained from the data. Should be one of "data" or "dist". If type="dist", the calculation of the distance matrix is skipped. |
distmeasure |
Choice of metric for the dissimilarity matrix (character). Should be one of "tanimoto", "euclidean", "jaccard" or "hamming". |
normalize |
Logical. Indicates whether to normalize the distance matrices or not.
This is recommended if different distance types are used. More details
on normalization in |
method |
The normalization method. Should be one of "Quantile", "Fisher-Yates", "standardize" or "Range", or the first letter(s) of one of these names. |
clust |
Choice of clustering function (character). Defaults to "agnes". |
linkage |
Choice of inter-group dissimilarity (character). Defaults to "ward". |
alpha |
The parameter alpha to be used in the "flexible" linkage of the agnes function. Defaults to 0.625 and is only used if the linkage is set to "flexible". |
gap |
Logical. Indicates whether the gap statistic should be computed. Setting it to FALSE greatly reduces the computation time. |
maxK |
The maximum number of clusters to be considered when computing the gap statistic. |
StopRange |
Logical. Controls the treatment of distance matrices whose values are not between zero and one.
If FALSE, range normalization is performed to rescale them to that interval. See |
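To make range normalization concrete, the following base-R sketch rescales a distance matrix linearly to the interval [0, 1]. It assumes that range normalization means (d - min) / (max - min) computed over the off-diagonal distances, with the diagonal kept at zero; the helper name range_normalize is hypothetical and the package's own implementation may differ in such details.

```r
# Hypothetical helper (assumption, not the package's code): rescale the
# off-diagonal distances linearly to [0, 1] via (d - min) / (max - min).
range_normalize <- function(D) {
  D <- as.matrix(D)
  off <- D[upper.tri(D)]            # off-diagonal distances only
  lo <- min(off)
  hi <- max(off)
  if (hi == lo) stop("all off-diagonal distances are equal; cannot rescale")
  R <- (D - lo) / (hi - lo)
  diag(R) <- 0                      # an object stays at distance zero from itself
  R
}

# Euclidean distances between four points on a line: 0, 1, 3 and 7
D <- as.matrix(dist(matrix(c(0, 1, 3, 7), ncol = 1)))
R <- range_normalize(D)
print(round(R, 3))
```

The smallest distance maps to 0 and the largest to 1, so matrices computed with different metrics end up on a common scale.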
The optimal number of clusters based on the gap statistic is determined with the
criteria described in the cluster package: firstSEmax, globalSEmax, firstmax,
globalmax and Tibs2001SEmax. The number of iterations (bootstrap samples) is set
to a default of 500. The distances implemented for the dissimilarity matrix are
jaccard, tanimoto and euclidean. The jaccard distances are computed with the
dist.binary(..., method = 1) function in the ade4 package and the euclidean
distances with the daisy function, also from the cluster package. The Tanimoto
distances were implemented manually.
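The package's own Tanimoto code is not shown here. As an illustration, here is a minimal base-R sketch assuming the usual definition for binary (0/1) fingerprints: with a and b the number of features present in each object and c the number present in both, the distance is 1 - c / (a + b - c). The helper name tanimoto_dist is hypothetical. For 0/1 data this quantity coincides with base R's dist(method = "binary").

```r
# Hypothetical helper: Tanimoto distances between the rows of a 0/1 matrix.
tanimoto_dist <- function(X) {
  X <- as.matrix(X)
  shared <- tcrossprod(X)                          # c_ij: features present in both rows
  counts <- rowSums(X)                             # a_i: features present in row i
  union  <- outer(counts, counts, "+") - shared    # a_i + a_j - c_ij
  as.dist(1 - shared / union)                      # distance = 1 - Tanimoto similarity
}

# Three toy "compounds" described by four binary features
X <- matrix(c(1, 1, 0, 1,
              1, 0, 0, 1,
              0, 1, 1, 0), nrow = 3, byrow = TRUE,
            dimnames = list(c("c1", "c2", "c3"), NULL))
D <- tanimoto_dist(X)
print(round(D, 3))
```

Two rows that are both entirely zero give 0/0 = NaN in this sketch; a production implementation would need to decide how to handle that case.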
The returned value is a list with two elements:
DistM |
The distance matrix of the data matrix |
Clust |
The resulting clustering |
If gap=TRUE, three further elements are added to the list. Clust_gap contains the output of the function that computes the gap statistic, and gapdata is a subset of this output; both can be used to plot the gap statistic. The final component, k, is a matrix containing the optimal number of clusters determined by each of the criteria mentioned earlier.
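The idea behind the k component can be reproduced with the cluster package directly: clusGap() computes the gap statistic and maxSE() applies each of the five criteria to it. This is a sketch, not the package's code. It assumes Ward-linkage agnes cut with cutree as the clustering step, uses toy data, and uses B = 50 reference samples instead of 500 to keep it fast. The helper agnes_k is hypothetical.

```r
library(cluster)

set.seed(1)
# Toy data: three well-separated groups of 10 points in two dimensions
X <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 4, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 8, sd = 0.3), ncol = 2))

# clusGap() calls FUNcluster(x, k) and expects a list with a $cluster component.
# Hypothetical helper: Ward agglomerative clustering (agnes) cut into k groups.
agnes_k <- function(x, k) {
  tree <- agnes(x, method = "ward")
  list(cluster = cutree(as.hclust(tree), k = k))
}

gap <- clusGap(X, FUNcluster = agnes_k, K.max = 6, B = 50)

# Optimal number of clusters according to each criterion
criteria <- c("firstSEmax", "globalSEmax", "firstmax", "globalmax", "Tibs2001SEmax")
k <- sapply(criteria, function(m)
  maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = m))
print(k)
```

The table gap$Tab holds the per-k values (log W, its expectation, the gap and its standard error) that a gap plot would display.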
For now, the only available option is agglomerative
hierarchical clustering as implemented
in the agnes
function of the cluster package.
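For orientation, the following sketch shows what clust = "agnes" with linkage = "ward" or linkage = "flexible" corresponds to in the cluster package when agnes is applied to a precomputed dissimilarity matrix. How Cluster() passes its arguments to agnes internally is an assumption here; the data are random toy values.

```r
library(cluster)

set.seed(42)
X <- matrix(rnorm(40), nrow = 20)        # toy data: 20 objects, 2 variables
D <- daisy(X, metric = "euclidean")      # dissimilarity matrix, as for distmeasure = "euclidean"

# Ward linkage on the precomputed dissimilarities
fit_ward <- agnes(D, diss = TRUE, method = "ward")

# "flexible" linkage: a single par.method value alpha gives the
# Lance-Williams parameters alpha1 = alpha2 = alpha, beta = 1 - 2 * alpha, gamma = 0
fit_flex <- agnes(D, diss = TRUE, method = "flexible", par.method = 0.625)

# Cut the Ward dendrogram into three groups
groups <- cutree(as.hclust(fit_ward), k = 3)
print(table(groups))
print(fit_ward$ac)                       # agglomerative coefficient
```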
Marijke Van Moerbeke
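library(IntClust)  # package providing Cluster() and the example data sets
data(fingerprintMat)
data(targetMat)
# Cluster the compounds on their fingerprints (Tanimoto distances, Ward linkage)
MCF7_F <- Cluster(fingerprintMat, type = "data", distmeasure = "tanimoto", normalize = FALSE,
                  method = NULL, clust = "agnes", linkage = "ward", alpha = 0.625, gap = FALSE,
                  maxK = 55, StopRange = FALSE)
# Same settings for the target data
MCF7_T <- Cluster(targetMat, type = "data", distmeasure = "tanimoto", normalize = FALSE,
                  method = NULL, clust = "agnes", linkage = "ward", alpha = 0.625, gap = FALSE,
                  maxK = 55, StopRange = FALSE)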