ClusteringAggregation: Clustering aggregation
In IntClust: Integration of Multiple Data Sets with Clustering Techniques

`ClusteringAggregation` implements the ensemble clustering methods Balls, Agglomerative (Aggl.) and Furthest, which are graph-based consensus methods.

ClusteringAggregation(List, type = c("data", "dist", "clust"),
  distmeasure = c("tanimoto", "tanimoto"), normalize = c(FALSE, FALSE),
  method = c(NULL, NULL), clust = "agnes", linkage = c("flexible",
  "flexible"), alpha = 0.625, nrclusters = c(7, 7), gap = FALSE,
  maxK = 15, agglMethod = c("Balls", "Aggl", "Furthest", "LocalSearch"),
  improve = TRUE, distThresh_B = 0.5, distThresh_A = 0.8)

`List`	A list of data matrices. The rows are assumed to correspond to the objects.
`type`	Indicates whether the matrices provided in `List` are data matrices, distance matrices or clustering results obtained from the data. If type="dist" the calculation of the distance matrices is skipped, and if type="clust" the single-source clustering is skipped. Should be one of "data", "dist" or "clust".
`distmeasure`	A vector of the distance measures to be used on each data matrix. Should be one of "tanimoto", "euclidean", "jaccard", "hamming". Defaults to c("tanimoto","tanimoto").
`normalize`	Logical. Indicates whether to normalize the distance matrices or not, defaults to c(FALSE, FALSE) for two data sets. This is recommended if different distance types are used. More details on normalization in `Normalization`.
`method`	A method of normalization. Should be one of "Quantile","Fisher-Yates", "standardize","Range" or any of the first letters of these names. Default is c(NULL,NULL) for two data sets.
`clust`	Choice of clustering function (character). Defaults to "agnes".
`linkage`	Choice of inter group dissimilarity (character) for each data set. Defaults to c("flexible", "flexible") for two data sets.
`alpha`	The parameter alpha to be used in the "flexible" linkage of the agnes function. Defaults to 0.625 and is only used if the linkage is set to "flexible".
`nrclusters`	The number of clusters to divide each individual dendrogram in. Default is c(7,7) for two data sets.
`gap`	Logical. Whether the optimal number of clusters should be determined with the gap statistic. Default is FALSE.
`maxK`	The maximal number of clusters to investigate in the gap statistic. Default is 15.
`agglMethod`	The method to be performed: "Balls","Aggl","Furthest" or "LocalSearch".
`improve`	Logical. If TRUE, a local search is performed to improve the obtained results. Default is TRUE.
`distThresh_B`	A distance threshold for the Balls algorithm. Default is 0.5.
`distThresh_A`	A distance threshold for the Aggl. algorithm. Default is 0.8.
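
To make the arguments above more concrete, the following is a minimal sketch of what a single-source clustering step could look like for one binary data matrix: a Tanimoto distance, `agnes` with flexible linkage and `alpha = 0.625`, and a cut into a fixed number of clusters. The toy matrix, the helper `tanimoto_dist` and the mapping of `alpha` onto the `par.method` argument of `agnes` are assumptions made for illustration; the internals of `ClusteringAggregation` may differ.

```r
library(cluster)  # provides agnes()

# Toy binary matrix (made-up data): six objects (rows), eight features (columns).
x <- rbind(
  c(1, 1, 1, 0, 0, 0, 0, 0),
  c(1, 1, 0, 1, 0, 0, 0, 0),
  c(1, 1, 1, 1, 0, 0, 0, 0),
  c(0, 0, 0, 0, 1, 1, 1, 0),
  c(0, 0, 0, 0, 1, 1, 0, 1),
  c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Hypothetical helper: Tanimoto distance between binary rows,
# 1 - (features shared by both) / (features present in either).
# Assumes that no row consists of zeros only.
tanimoto_dist <- function(m) {
  shared <- m %*% t(m)                                # pairwise shared features
  counts <- rowSums(m)                                # features per object
  either <- outer(counts, counts, FUN = "+") - shared # size of the union
  as.dist(1 - shared / either)
}

d <- tanimoto_dist(x)

# Agglomerative nesting with flexible linkage; par.method plays the role of alpha.
fit <- agnes(d, diss = TRUE, method = "flexible", par.method = 0.625)

# Cut the dendrogram into a fixed number of clusters (2 here, 7 in the defaults).
groups <- cutree(as.hclust(fit), k = 2)
print(groups)
```

In this sketch each data matrix would be processed in the same way, and the resulting cluster memberships would form the ensemble that the consensus step works on.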

Gionis et al. (2007) propose heuristic algorithms to find a solution to the consensus problem. In a first step, a weighted graph is built with the objects as vertices; the weight of the edge between two vertices is the fraction of clusterings that place them in different clusters. In a second step, an algorithm searches for the partition that minimizes the total number of disagreements with the given partitions.

The Balls algorithm is an iterative process that finds one cluster of the consensus partition in each iteration. For each object $i$, all objects at a distance of at most 0.5 are collected and the average distance of this set to object $i$ is calculated. If the average distance is less than or equal to a parameter $\alpha$, the objects form a cluster; otherwise object $i$ forms a singleton.

The Agglomerative (Aggl.) algorithm starts by considering every object as a singleton cluster. Next, the two closest clusters are merged if the average distance between them is less than 0.5. If no two clusters have an average distance smaller than 0.5, the algorithm stops and returns the created clusters as the solution.

The Furthest algorithm starts by placing all objects into a single cluster. In each iteration, the pair of objects that are furthest apart are taken as new cluster centers. Each remaining object is assigned to the center that increases the cost of the partition the least, and the new cost is computed. The cost is the sum of all distances between the obtained partition and the partitions in the ensemble. The iteration continues until the cost of the new partition is higher than that of the previous partition.
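
The steps described above can be illustrated with a small, self-contained sketch. It is not the IntClust implementation: the toy ensemble is made-up data, the function names (`disagreement_matrix`, `partition_cost`, `balls_sketch`, `aggl_sketch`) are hypothetical, the Balls sketch simply visits objects in index order, and `alpha = 0.4` is an arbitrary choice for the example.

```r
# Toy ensemble (made-up labels): four clusterings (columns) of six objects (rows).
ensemble <- cbind(
  c1 = c(1, 1, 1, 2, 2, 2),
  c2 = c(1, 1, 2, 2, 3, 3),
  c3 = c(1, 1, 1, 2, 2, 3),
  c4 = c(1, 1, 1, 2, 2, 2)
)

# Step 1: edge weight = fraction of clusterings that separate two objects.
disagreement_matrix <- function(labels) {
  d <- matrix(0, nrow(labels), nrow(labels))
  for (k in seq_len(ncol(labels))) {
    d <- d + outer(labels[, k], labels[, k], FUN = "!=")
  }
  d / ncol(labels)
}

# Cost: number of object pairs on which a partition disagrees with each
# clustering in the ensemble, summed over the ensemble.
partition_cost <- function(partition, labels) {
  same_p <- outer(partition, partition, FUN = "==")
  upper <- upper.tri(same_p)
  total <- 0
  for (k in seq_len(ncol(labels))) {
    same_k <- outer(labels[, k], labels[, k], FUN = "==")
    total <- total + sum(same_p[upper] != same_k[upper])
  }
  total
}

# Balls (simplified): collect the unclustered objects within `radius` of
# object i; keep them as a cluster if their average distance to i is <= alpha.
balls_sketch <- function(d, radius = 0.5, alpha = 0.4) {
  cluster <- rep(NA_integer_, nrow(d))
  next_id <- 1L
  for (i in seq_len(nrow(d))) {
    if (!is.na(cluster[i])) next
    free <- which(is.na(cluster))
    ball <- setdiff(free[d[i, free] <= radius], i)
    if (length(ball) > 0 && mean(d[i, ball]) <= alpha) {
      cluster[c(i, ball)] <- next_id
    } else {
      cluster[i] <- next_id  # object i stays a singleton
    }
    next_id <- next_id + 1L
  }
  cluster
}

# Aggl (simplified): repeatedly merge the two clusters with the smallest
# average distance, as long as that average is below `threshold`.
aggl_sketch <- function(d, threshold = 0.5) {
  clusters <- as.list(seq_len(nrow(d)))
  while (length(clusters) > 1) {
    best <- c(NA, NA)
    best_dist <- Inf
    for (a in seq_len(length(clusters) - 1)) {
      for (b in (a + 1):length(clusters)) {
        avg <- mean(d[clusters[[a]], clusters[[b]]])
        if (avg < best_dist) {
          best_dist <- avg
          best <- c(a, b)
        }
      }
    }
    if (best_dist >= threshold) break
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL
  }
  membership <- integer(nrow(d))
  for (k in seq_along(clusters)) membership[clusters[[k]]] <- k
  membership
}

d <- disagreement_matrix(ensemble)
balls <- balls_sketch(d)
aggl <- aggl_sketch(d)
print(rbind(balls = balls, aggl = aggl))
print(partition_cost(aggl, ensemble))
```

On this toy ensemble both sketches place objects 1 to 3 in one cluster and objects 4 to 6 in another. With real data the two heuristics can disagree, which is one reason the cost function is useful for comparing candidate consensus partitions.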

The returned value is a list of two elements:

`DistM`	A NULL object
`Clust`	The resulting clustering

The value has class 'Ensemble'.

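Gionis, A., Mannila, H. and Tsaparas, P. (2007). Clustering aggregation. ACM Transactions on Knowledge Discovery from Data, 1(1), Article 4.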

library(IntClust)

# Load the two example data matrices shipped with IntClust
data(fingerprintMat)
data(targetMat)
L <- list(fingerprintMat, targetMat)

# Consensus clustering with the agglomerative ("Aggl") method
MCF7_Aggl <- ClusteringAggregation(List = L, type = "data",
  distmeasure = c("tanimoto", "tanimoto"), normalize = c(FALSE, FALSE),
  method = c(NULL, NULL), clust = "agnes", linkage = c("flexible", "flexible"),
  alpha = 0.625, nrclusters = c(7, 7), gap = FALSE, maxK = 15,
  agglMethod = "Aggl", improve = TRUE, distThresh_B = 0.5, distThresh_A = 0.8)