The function `Cluster` performs clustering on a single source of information, i.e., one data matrix. Optionally, the gap statistic can be computed to determine the optimal number of clusters.

| Argument | Description |
|---|---|
| `Data` | A matrix containing the data. The rows are assumed to correspond to the objects. |
| `type` | Indicates whether the matrix provided in `Data` is a data matrix or a distance matrix obtained from the data. Should be one of "data" or "dist". If `type = "dist"`, the calculation of the distance matrix is skipped. |
| `distmeasure` | Choice of metric for the dissimilarity matrix (character). Should be one of "tanimoto", "euclidean", "jaccard" or "hamming". Defaults to "tanimoto". |
| `normalize` | Logical. Indicates whether to normalize the distance matrices. Defaults to FALSE. Normalization is recommended if different distance types are used. More details on normalization in |
| `method` | A method of normalization. Should be one of "Quantile", "Fisher-Yates", "standardize" or "Range", or the first letter of one of these names. Defaults to NULL. |
| `clust` | Choice of clustering function (character). Defaults to "agnes". Note that, for now, the only option is agglomerative hierarchical clustering as it was implemented in the |
| `linkage` | Choice of inter-group dissimilarity (character). Defaults to "flexible". |
| `alpha` | The parameter alpha used in the "flexible" linkage of the `agnes` function. Defaults to 0.625 and is only used if `linkage` is set to "flexible". |
| `gap` | Logical. Whether the optimal number of clusters should be determined with the gap statistic. Defaults to TRUE. |
| `maxK` | The maximal number of clusters to investigate with the gap statistic. Defaults to 15. |
| `StopRange` | Logical. Controls the standardization of distance matrices whose values do not lie between zero and one: if FALSE, the range normalization is performed so that their values fall between zero and one. See |
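The `normalize`, `method` and `StopRange` arguments all concern rescaling distance matrices. As a rough illustration of what a "Range" normalization does, the sketch below min-max rescales a `dist` object to [0, 1]. It is written for illustration and is not the package's own implementation: `range_normalize` and `maybe_rescale` are hypothetical helpers, and `maybe_rescale` encodes only one reading of the `StopRange` description (rescale when values fall outside [0, 1] and `StopRange = FALSE`).

```r
# Hypothetical helper (not the package's code): min-max ("range") normalization
# of a distance object, so the smallest pairwise distance becomes 0 and the
# largest becomes 1.
range_normalize <- function(d) {
  stopifnot(inherits(d, "dist"))
  rng <- range(d)                      # over the pairwise distances only
  if (rng[2] == rng[1]) return(d * 0)  # all distances equal: map to 0 (assumption)
  (d - rng[1]) / (rng[2] - rng[1])     # arithmetic keeps the "dist" attributes
}

# Hypothetical helper: one reading of StopRange. Distances already inside
# [0, 1] are returned unchanged; others are rescaled unless StopRange = TRUE.
maybe_rescale <- function(d, StopRange = FALSE) {
  if (!StopRange && (min(d) < 0 || max(d) > 1)) range_normalize(d) else d
}

# Usage: Euclidean distances between the points 0, 1 and 3 on a line.
d <- dist(matrix(c(0, 1, 3), ncol = 1))
maybe_rescale(d)
```

Working on the `dist` object directly means the diagonal zeros never enter the minimum, so the closest pair of objects is mapped to exactly 0.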

The gap statistic is evaluated with the criteria described in the cluster package: `firstSEmax`, `globalSEmax`, `firstmax`, `globalmax` and `Tibs2001SEmax`. The number of iterations (reference data sets generated for the gap statistic) is set to 500 by default. The implemented distances for the dissimilarity matrix are jaccard, tanimoto and euclidean. The jaccard distances are computed with the `dist.binary(..., method = 1)` function of the ade4 package, and the euclidean ones with the `daisy` function, again from the cluster package. The Tanimoto distances were implemented manually.
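Since the Tanimoto distances were implemented manually, a minimal sketch of such a computation for a binary (0/1) matrix may help. `tanimoto_dist` is a hypothetical helper, not the package's code; it assumes rows are objects and entries are 0 or 1. For binary data the Tanimoto and Jaccard similarities coincide, so the result can be checked against base R's `dist(..., method = "binary")`, which returns one minus the Jaccard similarity. The ade4 documentation describes `dist.binary` distances as d = sqrt(1 - s), so values from the "jaccard" option are expected to differ from this sketch by a square root.

```r
# Hypothetical helper: Tanimoto distance (1 - Tanimoto similarity) between the
# rows of a binary 0/1 matrix.
tanimoto_dist <- function(X) {
  X <- as.matrix(X)
  common  <- tcrossprod(X)                    # bits set in both rows, every pair
  n_on    <- diag(common)                     # bits set in each row
  n_union <- outer(n_on, n_on, "+") - common  # bits set in at least one row
  # Two all-zero rows are treated as identical (an assumption).
  sim <- ifelse(n_union == 0, 1, common / n_union)
  as.dist(1 - sim)
}

# Usage on a toy fingerprint matrix (4 objects x 4 bits).
X <- rbind(c(1, 1, 0, 0),
           c(1, 0, 1, 0),
           c(1, 1, 0, 0),
           c(0, 1, 1, 1))
tanimoto_dist(X)
```

Objects 1 and 2 share one of the three bits set in either of them, so their distance is 1 - 1/3 = 2/3, while the identical objects 1 and 3 are at distance 0.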

The returned value is a list with two elements:

| Element | Description |
|---|---|
| `DistM` | The distance matrix of the data matrix. |
| `Clust` | The resulting clustering. |

If `gap = TRUE`, three more elements are added to the list. `Clust_gap` contains the output of the function that computes the gap statistic, and `gapdata` is a subset of this output; both can be used to plot the gap statistic. The final component, `k`, is a matrix containing the optimal number of clusters determined by each of the criteria listed above.
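The criteria listed above are the `method` options of `maxSE()` in the cluster package, which is applied to the table returned by `clusGap()`. The sketch below shows how a gap statistic and the five criteria could be obtained for flexible-linkage `agnes` clustering; it is an illustration, not the package's internal code. The toy data, `B = 50` (instead of 500, to keep it quick), `K.max = 8` and the wrapper name `agnes_flexible` are assumptions. Note that `clusGap()` measures within-cluster dispersion with Euclidean distances on the data matrix itself, so this sketch does not reproduce Tanimoto-based clustering.

```r
library(cluster)

set.seed(42)
# Toy data: three well-separated groups of 20 points in two dimensions.
x <- rbind(matrix(rnorm(40, mean = 0),  ncol = 2),
           matrix(rnorm(40, mean = 5),  ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))

# Hypothetical wrapper: clusGap() calls FUNcluster(x, k) and expects a list
# with a $cluster vector of group labels.
agnes_flexible <- function(x, k) {
  tree <- agnes(x, method = "flexible", par.method = 0.625)
  list(cluster = cutree(as.hclust(tree), k = k))
}

gap <- clusGap(x, FUNcluster = agnes_flexible, K.max = 8, B = 50)

# Optimal number of clusters according to each criterion.
criteria <- c("firstSEmax", "globalSEmax", "firstmax", "globalmax", "Tibs2001SEmax")
k <- sapply(criteria, function(m)
  maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = m))
k
```

Here `k` is a named vector with one entry per criterion; the package's own `k` component is described as a matrix, so its layout may differ.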

```
data(fingerprintMat)
data(targetMat)

# Cluster the rows of fingerprintMat: Tanimoto distances, flexible linkage
# (alpha = 0.625), no gap statistic
MCF7_F <- Cluster(fingerprintMat, type = "data", distmeasure = "tanimoto",
                  normalize = FALSE, method = NULL, clust = "agnes",
                  linkage = "flexible", alpha = 0.625, gap = FALSE,
                  maxK = 55, StopRange = FALSE)

# Same settings applied to targetMat
MCF7_T <- Cluster(targetMat, type = "data", distmeasure = "tanimoto",
                  normalize = FALSE, method = NULL, clust = "agnes",
                  linkage = "flexible", alpha = 0.625, gap = FALSE,
                  maxK = 55, StopRange = FALSE)
```
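With `gap = FALSE`, the example essentially computes a distance matrix and a flexible-linkage agglomerative clustering. A rough stand-alone equivalent with the cluster package is sketched below, assuming that `alpha` is passed to `agnes()` as its `par.method` argument; with a single value, `agnes` expands it to the Lance-Williams coefficients (alpha, alpha, 1 - 2*alpha, 0). Because `fingerprintMat` is not assumed to be available, a random 0/1 matrix stands in for it, and base R's `dist(..., method = "binary")` replaces the Tanimoto distance, which it equals for binary data. The choice of 4 groups is arbitrary.

```r
library(cluster)

set.seed(1)
# Stand-in for a binary fingerprint matrix: 20 objects (rows) x 30 bits.
X <- matrix(rbinom(20 * 30, size = 1, prob = 0.3), nrow = 20,
            dimnames = list(paste0("obj", 1:20), NULL))

# For 0/1 data, the "binary" distance equals 1 - Tanimoto similarity.
D <- dist(X, method = "binary")

# Flexible linkage with alpha = 0.625, i.e. Lance-Williams (0.625, 0.625, -0.25, 0).
tree <- agnes(D, diss = TRUE, method = "flexible", par.method = 0.625)

# Cut the dendrogram into 4 groups.
groups <- cutree(as.hclust(tree), k = 4)
table(groups)
```

Converting with `as.hclust()` before `cutree()` avoids relying on `cutree()` accepting an `agnes` object directly.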

