Cluster: Single source clustering
In IntClust: Integration of Multiple Data Sets with Clustering Techniques


The function `Cluster` performs clustering on a single source of information, i.e., one data matrix. Optionally, the gap statistic can be computed to determine the optimal number of clusters.

Cluster(Data, type = c("data", "dist"), distmeasure = "tanimoto",
  normalize = FALSE, method = NULL, clust = "agnes",
  linkage = "flexible", alpha = 0.625, gap = TRUE, maxK = 15,
  StopRange = TRUE)

`Data`	A matrix containing the data. The rows are assumed to correspond to the objects.
`type`	Indicates whether the matrix provided in `Data` is a data matrix or a distance matrix already computed from the data. If type = "dist", the calculation of the distance matrix is skipped. Should be one of "data" or "dist".
`distmeasure`	Choice of metric for the dissimilarity matrix (character). Should be one of "tanimoto", "euclidean", "jaccard", "hamming". Default is "tanimoto".
`normalize`	Logical. Indicates whether to normalize the distance matrices. Default is FALSE. Normalization is recommended if different distance types are used. See `Normalization` for details.
`method`	The normalization method. Should be one of "Quantile", "Fisher-Yates", "standardize", "Range", or an abbreviation consisting of the first letter(s) of one of these names. Default is NULL.
`clust`	Choice of clustering function (character). Defaults to "agnes". Currently, the only option is agglomerative hierarchical clustering as implemented in the `agnes` function of the cluster package.
`linkage`	Choice of inter-group dissimilarity, i.e., the linkage method (character). Defaults to "flexible".
`alpha`	The parameter alpha used in the "flexible" linkage of the `agnes` function. Defaults to 0.625 and is only used if `linkage` is set to "flexible".
`gap`	Logical. Whether the optimal number of clusters should be determined with the gap statistic. Default is TRUE.
`maxK`	The maximal number of clusters to investigate in the gap statistic. Default is 15.
`StopRange`	Logical. Indicates whether to skip the range normalization of distance matrices whose values do not lie between zero and one. If FALSE, range normalization is performed so that the values lie between zero and one; see `Normalization`. If TRUE, the distance matrices are not changed. Range normalization is recommended if different types of data are used, so that the resulting distances are comparable. Default is TRUE.
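The sketch below illustrates what `StopRange = FALSE` and the `linkage`/`alpha` arguments are described as doing, using base R and the cluster package directly rather than IntClust. The helper `range_normalize` is hypothetical, and rescaling by the minimum and maximum of the whole matrix is an assumption about how the range normalization works. `agnes(..., method = "flexible", par.method = 0.625)` is the cluster package's own interface for flexible linkage with a single alpha.

```r
library(cluster)

# Hypothetical helper (not part of IntClust): rescale a distance matrix to [0, 1].
# Assumption: this mirrors the range normalization applied when StopRange = FALSE.
range_normalize <- function(D) {
  D <- as.matrix(D)
  rng <- range(D)
  (D - rng[1]) / (rng[2] - rng[1])
}

set.seed(1)
X <- matrix(rnorm(40), nrow = 10)   # 10 objects (rows), 4 variables
D <- range_normalize(dist(X))       # Euclidean distances rescaled to [0, 1]

# Flexible (Lance-Williams) linkage with alpha = 0.625, matching the defaults
# linkage = "flexible" and alpha = 0.625; diss = TRUE marks D as dissimilarities.
fit <- agnes(as.dist(D), diss = TRUE, method = "flexible", par.method = 0.625)

# Cut the resulting dendrogram into 3 groups.
groups <- cutree(as.hclust(fit), k = 3)
groups
```

With a single `par.method` value, the cluster package sets the remaining Lance-Williams coefficients from alpha, so only one number has to be chosen, which is why `Cluster` exposes a single `alpha` argument.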

The gap statistic is evaluated with the criteria described in the cluster package: firstSEmax, globalSEmax, firstmax, globalmax and Tibs2001SEmax. The number of iterations is set to 500 by default. The distances implemented for the dissimilarity matrix are jaccard, tanimoto and euclidean. The jaccard distances are computed with the `dist.binary(..., method = 1)` function of the ade4 package and the euclidean distances with the `daisy` function, also from the cluster package. The Tanimoto distances are implemented manually.
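Because the Tanimoto distances are implemented manually, a minimal base R sketch may help. It assumes the usual definition for binary fingerprints, similarity = c / (a + b - c), where a and b count the ones in each object and c counts the ones they share, with distance = 1 - similarity. The function `tanimoto_dist` and its handling of two all-zero rows are hypothetical and not taken from IntClust.

```r
# Hypothetical helper, not IntClust's own implementation.
# Rows of X are objects; entries are 0/1.
tanimoto_dist <- function(X) {
  X <- as.matrix(X)
  shared <- X %*% t(X)                     # c: number of ones two objects share
  ones <- rowSums(X)                       # a, b: number of ones per object
  denom <- outer(ones, ones, "+") - shared # a + b - c
  # Assumption: two all-zero rows are treated as identical (similarity 1).
  sim <- ifelse(denom == 0, 1, shared / denom)
  D <- 1 - sim
  dimnames(D) <- list(rownames(X), rownames(X))
  as.dist(D)
}

fp <- rbind(a = c(1, 1, 0, 1),
            b = c(1, 0, 0, 1),
            c = c(0, 0, 1, 0))
tanimoto_dist(fp)
```

For this toy matrix, objects a and b share two of the three positions set in either of them, giving a distance of 1/3, while c shares no ones with a or b and is at distance 1 from both.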

The returned value is a list with two elements:

`DistM`	The distance matrix of the data matrix
`Clust`	The resulting clustering

If `gap = TRUE`, three more elements are added to the list. `Clust_gap` contains the output of the function that computes the gap statistic, and `gapdata` is a subset of this output; both can be used to plot the gap statistic. The final element, `k`, is a matrix containing the optimal number of clusters determined by each of the criteria mentioned earlier.
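The `k` element can be illustrated with the cluster package directly. The sketch assumes the gap statistic is computed with `clusGap` and that each criterion is evaluated with `maxSE`. The wrapper `agnes_k`, the toy data and the variable names are hypothetical, and the result is a 1 x 5 matrix with one column per criterion.

```r
library(cluster)

set.seed(42)
# Toy data: three well-separated groups of 15 points in 2 dimensions.
X <- rbind(matrix(rnorm(30, mean = 0), ncol = 2),
           matrix(rnorm(30, mean = 6), ncol = 2),
           matrix(rnorm(30, mean = 12), ncol = 2))

# Hypothetical clustering wrapper for clusGap: flexible-linkage agnes cut into k groups.
agnes_k <- function(x, k) {
  fit <- agnes(x, method = "flexible", par.method = 0.625)
  list(cluster = cutree(as.hclust(fit), k = k))
}

# B is kept small for speed; the Details section gives 500 iterations as the default.
gs <- clusGap(X, FUNcluster = agnes_k, K.max = 8, B = 50, verbose = FALSE)

# Optimal number of clusters under each criterion, one column per criterion.
criteria <- c("firstSEmax", "globalSEmax", "firstmax", "globalmax", "Tibs2001SEmax")
k <- matrix(sapply(criteria, function(m)
              maxSE(gs$Tab[, "gap"], gs$Tab[, "SE.sim"], method = m)),
            nrow = 1, dimnames = list(NULL, criteria))
k
```

The criteria can disagree: the "first" variants stop at the earliest local optimum, while the "global" variants search the whole range up to `K.max`.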

library(IntClust)
data(fingerprintMat)
data(targetMat)

MCF7_F <- Cluster(fingerprintMat, type = "data", distmeasure = "tanimoto",
                  normalize = FALSE, method = NULL, clust = "agnes",
                  linkage = "flexible", alpha = 0.625, gap = FALSE,
                  maxK = 55, StopRange = FALSE)
MCF7_T <- Cluster(targetMat, type = "data", distmeasure = "tanimoto",
                  normalize = FALSE, method = NULL, clust = "agnes",
                  linkage = "flexible", alpha = 0.625, gap = FALSE,
                  maxK = 55, StopRange = FALSE)