CEC: Complementary ensemble clustering In IntClust: Integration of Multiple Data Sets with Clustering Techniques

Description

Complementary Ensemble Clustering (CEC, Fodeh2013) is similar to ADEC. However, instead of merging the data matrices, ensemble clustering is performed on each data matrix separately. The resulting incidence matrices, one per data set, are combined in a weighted linear combination. The weighted incidence matrix is the input for the final clustering algorithm. As with ADEC, there are several versions, depending on how the number of features to sample and the number of clusters are specified.
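The weighted combination step can be sketched in base R as follows. This is a minimal illustration and not the IntClust implementation: the helper `combine_incidence` and the toy matrices are hypothetical, and it is an assumption here that a single weight `w` applies to the first data set and `1 - w` to the second. The final clustering uses `hclust` with average linkage purely to keep the sketch dependency-free.

```r
# Hypothetical helper: weighted linear combination of two incidence matrices.
# Assumption: w weights the first matrix and (1 - w) the second.
combine_incidence <- function(inc1, inc2, w = 0.5) {
  stopifnot(identical(dim(inc1), dim(inc2)), w >= 0, w <= 1)
  w * inc1 + (1 - w) * inc2
}

# Toy incidence matrices for three objects, one per data set.
# Larger entries mean the two objects were separated more often.
inc_a <- matrix(c(0, 1, 4,
                  1, 0, 3,
                  4, 3, 0), nrow = 3, byrow = TRUE)
inc_b <- matrix(c(0, 3, 2,
                  3, 0, 1,
                  2, 1, 0), nrow = 3, byrow = TRUE)

# The weighted matrix is treated as a dissimilarity for the final clustering
combined   <- combine_incidence(inc_a, inc_b, w = 0.5)
final_tree <- hclust(as.dist(combined), method = "average")
```

With `w = 0.5` both data sets contribute equally; moving `w` towards 0 or 1 lets one data set dominate the final clustering.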

Usage

```
CEC(List, distmeasure = c("tanimoto", "tanimoto"), normalize = c(FALSE, FALSE),
    method = c(NULL, NULL), t = 10, r = NULL, nrclusters = NULL, weight = NULL,
    clust = "agnes", linkage = c("flexible", "flexible"), alpha = 0.625,
    weightclust = 0.5)
```

Arguments

`List` A list of data matrices. The rows are assumed to correspond to the objects.

`distmeasure` A vector of the distance measures to be used on each data matrix. Each should be one of "tanimoto", "euclidean", "jaccard", "hamming". Defaults to c("tanimoto", "tanimoto").

`normalize` Logical. Indicates whether to normalize the distance matrices. Defaults to c(FALSE, FALSE) for two data sets. Normalization is recommended if different distance types are used. See `Normalization` for details.

`method` A method of normalization. Should be one of "Quantile", "Fisher-Yates", "standardize", "Range" or any of the first letters of these names. Defaults to c(NULL, NULL) for two data sets.

`t` The number of iterations. Defaults to 10.

`r` A vector with the number of features to take for the random sample for each element in List. If NULL (default), all features are considered.

`nrclusters` A list with a sequence of numbers of clusters to cut the dendrogram into for each element in List. If NULL (default), the function stops.

`weight` The weights for the weighted linear combination.

`clust` Choice of clustering function (character). Defaults to "agnes".

`linkage` Choice of inter-group dissimilarity (character) for each data set. Defaults to c("flexible", "flexible") for two data sets.

`alpha` The parameter alpha to be used in the "flexible" linkage of the agnes function. Defaults to 0.625 and is only used if the linkage is set to "flexible".

`weightclust` A weight whose clustering result is stored separately from the other results, for comparison and easy access.
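To make the `distmeasure`, `clust`, `linkage` and `alpha` arguments more concrete, the sketch below computes a Tanimoto distance on a small binary matrix and clusters it with `agnes` using flexible linkage. It is an illustration under stated assumptions, not IntClust code: `tanimoto_dist` is a hypothetical helper that uses the usual definition for binary data (one minus shared features over the union of features), and the toy matrix `fp` is made up. `agnes` and its `par.method` argument come from the cluster package.

```r
library(cluster)  # provides agnes()

# Hypothetical helper. Assumed definition for binary data:
# 1 - (number of shared features) / (number of features in either row)
tanimoto_dist <- function(m) {
  common <- tcrossprod(m)                       # shared features per pair of rows
  counts <- rowSums(m)
  union  <- outer(counts, counts, "+") - common
  d <- 1 - common / union
  d[union == 0] <- 0   # two all-zero rows are treated as identical (our choice)
  as.dist(d)
}

# Toy binary fingerprint matrix: 4 objects (rows), 4 features (columns)
fp <- matrix(c(1, 1, 0, 0,
               1, 0, 0, 0,
               0, 0, 1, 1,
               0, 1, 1, 1), nrow = 4, byrow = TRUE)

d <- tanimoto_dist(fp)

# Flexible linkage with alpha = 0.625, the default listed above
tree     <- agnes(d, diss = TRUE, method = "flexible", par.method = 0.625)
clusters <- cutree(as.hclust(tree), k = 2)
```

Here the first two objects end up in one cluster and the last two in the other, because each pair shares features that the other pair mostly lacks.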

Details

If r is specified and nrclusters is a fixed number, only a random sampling of the features is performed for the t iterations (CECa). If r is NULL and nrclusters is a sequence, the clustering is performed on all features and the dendrogram is divided into clusters for each value of nrclusters (CECb). If r is specified and nrclusters is a sequence, both are combined (CECc). After every iteration, whether it involves random sampling, multiple divisions of the dendrogram or both, an incidence matrix is set up. All incidence matrices are summed, and the sum serves as the distance matrix on which a final clustering is performed.
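The iteration scheme for a single data matrix can be sketched in base R as follows. This is a simplified illustration, not the package's code: the helpers `incidence_from_labels` and `ensemble_incidence` are hypothetical, Euclidean distance and average-linkage `hclust` stand in for the Tanimoto distance and `agnes`, and it is an assumption that an incidence entry is 1 when two objects fall in different clusters, which is what lets the summed matrix act as a distance.

```r
# Assumption: entry (i, j) is 1 when objects i and j are in different
# clusters and 0 otherwise, so that sums behave like dissimilarities.
incidence_from_labels <- function(labels) {
  1 * outer(labels, labels, FUN = "!=")
}

# Hypothetical sketch of the combined variant: feature sampling in each of
# the t iterations, plus several cuts of each dendrogram.
ensemble_incidence <- function(x, t = 10, r = ncol(x), nrclusters = 2:3) {
  total <- matrix(0, nrow(x), nrow(x))
  for (i in seq_len(t)) {
    cols <- sample(ncol(x), r)                          # random feature subset
    tree <- hclust(dist(x[, cols, drop = FALSE]), method = "average")
    for (k in nrclusters) {                             # one cut per cluster number
      total <- total + incidence_from_labels(cutree(tree, k = k))
    }
  }
  total
}

set.seed(1)
x     <- matrix(rnorm(8 * 6), nrow = 8)   # 8 objects, 6 features (toy data)
inc   <- ensemble_incidence(x, t = 5, r = 4, nrclusters = 2:3)
final <- hclust(as.dist(inc), method = "average")
```

Setting `nrclusters` to a single number gives the sampling-only variant, and setting `r = ncol(x)` removes the feature sampling so that only the multiple cuts vary.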

Value

The returned value is a list with the following elements:

`DistM` The resulting incidence matrix.

`Results` The hierarchical clustering result for each element in WeightedDist.

`Clust` The result for the weight specified in `weightclust`.

The value has class 'CEC'.

References

\insertRef{Fodeh2013}{IntClust}

Examples

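```
library(IntClust)

# Load the two example data sets shipped with the package
data(fingerprintMat)
data(targetMat)
L <- list(fingerprintMat, targetMat)

# Sample 100 features per data set in each of 100 iterations and
# cut each dendrogram into 2 to 10 clusters
MCF7_CEC <- CEC(List = L, distmeasure = c("tanimoto", "tanimoto"),
                normalize = FALSE, method = NULL, t = 100, r = c(100, 100),
                nrclusters = list(seq(2, 10, 1), seq(2, 10, 1)),
                clust = "agnes", linkage = c("flexible", "flexible"),
                alpha = 0.625, weightclust = 0.5)
```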

IntClust documentation built on May 2, 2019, 5:51 a.m.