EHC: Ensemble for hierarchical clustering


Description

The Ensemble for Hierarchical Clustering (EHC, Hossain 2012) measures the strength of association between a pair of objects by the levels of the dendrogram at which the two objects are still clustered together. Specifically, the association of a pair is the sum of the normalized depths of all clusters that contain both objects, with each depth weighted by the intra-cluster proximity of that cluster. The resulting similarity matrix is treated as the adjacency matrix of a graph, which is then partitioned into k clusters with the METIS algorithm (or with the "MST" alternative, see graphPartitioning).
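The association measure described above can be sketched in base R for a single dendrogram. This is a minimal illustration, not the IntClust implementation: it assumes that the depth of a cluster is its number of edges from the root, normalized by the maximum depth, and it uses the hypothetical choice 1 - height / max(height) as the intra-cluster proximity weight. The function name ehc_association is hypothetical; it expects an hclust object (an agnes result can be converted with as.hclust()).

```r
# Hypothetical sketch of an EHC-style association matrix for one dendrogram.
# Assumptions (not taken from IntClust): depth = number of edges from the
# root, normalized to [0, 1]; weight = 1 - merge height / max height.
ehc_association <- function(hc) {
  n <- length(hc$order)
  m <- nrow(hc$merge)                 # n - 1 internal clusters
  # Members of each internal cluster (negative ids in $merge are singletons)
  members <- vector("list", m)
  for (i in seq_len(m)) {
    kids <- hc$merge[i, ]
    left  <- if (kids[1] < 0) -kids[1] else members[[kids[1]]]
    right <- if (kids[2] < 0) -kids[2] else members[[kids[2]]]
    members[[i]] <- c(left, right)
  }
  # Depth of each cluster: the last merge is the root (depth 0), and a child
  # cluster always has a smaller row index than its parent
  depth <- numeric(m)
  for (i in rev(seq_len(m))) {
    for (j in hc$merge[i, ]) if (j > 0) depth[j] <- depth[i] + 1
  }
  ndepth <- depth / max(1, max(depth))
  # Hypothetical intra-cluster proximity: tight (low) merges get weight near 1
  hmax <- max(hc$height)
  w <- if (hmax > 0) 1 - hc$height / hmax else rep(1, m)
  # Sum the weighted normalized depths of all clusters shared by each pair
  A <- matrix(0, n, n, dimnames = list(hc$labels, hc$labels))
  for (i in seq_len(m)) {
    idx <- members[[i]]
    A[idx, idx] <- A[idx, idx] + ndepth[i] * w[i]
  }
  A
}
```

With several data sets, one option is to add the per-dendrogram matrices, e.g. Reduce("+", lapply(hcs, ehc_association)) for a hypothetical list hcs of hclust objects, before the graph-partitioning step.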

Usage

EHC(List, type = c("data", "dist", "clust"), distmeasure = c("tanimoto",
  "tanimoto"), normalize = c(FALSE, FALSE), method = c(NULL, NULL),
  clust = "agnes", linkage = c("flexible", "flexible"), alpha = 0.625,
  gap = FALSE, maxK = 15, graphPartitioning = c("METIS", "MST"),
  optimalk = 7, waitingtime = 300, file_number = 0, executable = FALSE)

Arguments

List

A list of data matrices (or of distance matrices or clustering results, depending on type). The rows of each data matrix are assumed to correspond to the objects.

type

Indicates whether the elements of List are data matrices, distance matrices, or clustering results obtained from the data. If type = "dist", the calculation of the distance matrices is skipped; if type = "clust", the single-source clustering is skipped. type should be one of "data", "dist" or "clust".

distmeasure

A vector of distance measures, one for each data matrix. Each element should be one of "tanimoto", "euclidean", "jaccard" or "hamming". Defaults to c("tanimoto", "tanimoto").
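For binary fingerprint data, the Tanimoto distance between objects a and b is 1 - |a AND b| / |a OR b|, which coincides with the Jaccard distance on binary vectors. A minimal base-R sketch follows; the function name tanimoto_dist is hypothetical and this is not the IntClust code. Treating every non-zero entry as a set bit is an assumption.

```r
# Hypothetical helper: pairwise Tanimoto distances between the rows of X.
tanimoto_dist <- function(X) {
  X <- X != 0                         # any non-zero entry counts as a set bit
  storage.mode(X) <- "numeric"
  both  <- tcrossprod(X)              # |a AND b| for every pair of rows
  cnt   <- rowSums(X)
  union <- outer(cnt, cnt, "+") - both   # |a OR b|
  D <- ifelse(union == 0, 0, 1 - both / union)   # two empty rows: distance 0
  dimnames(D) <- list(rownames(X), rownames(X))
  D
}
```

On binary input this agrees with base R's dist(X, method = "binary"), which is a convenient cross-check.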

normalize

Logical vector with one element per data matrix, indicating whether the corresponding distance matrix should be normalized. Defaults to c(FALSE, FALSE). Normalization is recommended when different distance measures are used. See Normalization for more details.

method

The normalization method for each data matrix. Should be one of "Quantile", "Fisher-Yates", "standardize" or "Range", or the first letter of one of these names. Defaults to c(NULL, NULL) for two data sets; note that in R, c(NULL, NULL) evaluates to NULL.
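Two of the listed methods can be sketched as follows, assuming "Range" rescales the distances linearly to [0, 1] and "standardize" centres them to mean 0 and scales them to standard deviation 1 (so standardized values can be negative). These are textbook definitions; the exact IntClust behaviour, for example whether the diagonal is included, may differ, and "Quantile" and "Fisher-Yates" are not sketched. The helper normalize_dist is hypothetical.

```r
# Hypothetical helper (not the IntClust Normalization function): rescale the
# off-diagonal entries of a distance matrix with textbook definitions.
normalize_dist <- function(D, method = c("Range", "standardize")) {
  method <- match.arg(method)
  D <- as.matrix(D)
  off <- row(D) != col(D)             # keep the zero diagonal untouched
  v <- D[off]
  D[off] <- switch(method,
    Range       = (v - min(v)) / (max(v) - min(v)),  # map to [0, 1]
    standardize = (v - mean(v)) / sd(v))             # mean 0, sd 1
  D
}

# Example: Euclidean distances rescaled to the [0, 1] range of Tanimoto
# distances, so that both can be combined on a common scale
set.seed(2)
D_euc <- normalize_dist(dist(matrix(rnorm(30), ncol = 3)), "Range")
```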

clust

Choice of clustering function (character). Defaults to "agnes".

linkage

Choice of inter-group dissimilarity (character) for each data set. Defaults to c("flexible", "flexible") for two data sets.

alpha

The alpha parameter of the "flexible" linkage in the agnes function. Defaults to 0.625 and is only used if linkage is set to "flexible".
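With the "flexible" linkage and a single par.method value, agnes from the cluster package updates dissimilarities with the Lance-Williams formula using α1 = α2 = alpha, β = 1 - 2·alpha and γ = 0, so the default alpha = 0.625 gives β = -0.25: d(k, i ∪ j) = alpha·d(k, i) + alpha·d(k, j) + (1 - 2·alpha)·d(i, j). The sketch below computes this update by hand for three objects and compares it with agnes(..., method = "flexible", par.method = 0.625); the helper flexible_update is hypothetical.

```r
library(cluster)

# Hypothetical helper: Lance-Williams update for the flexible linkage
flexible_update <- function(d_ki, d_kj, d_ij, alpha = 0.625) {
  alpha * d_ki + alpha * d_kj + (1 - 2 * alpha) * d_ij
}

# Three objects: d(1,2) = 1, d(1,3) = 2, d(2,3) = 4
d <- as.dist(matrix(c(0, 1, 2,
                      1, 0, 4,
                      2, 4, 0), nrow = 3))
ag <- agnes(d, diss = TRUE, method = "flexible", par.method = 0.625)

# Objects 1 and 2 merge first (height 1); object 3 then joins the pair at the
# updated distance 0.625 * 2 + 0.625 * 4 - 0.25 * 1
flexible_update(d_ki = 2, d_kj = 4, d_ij = 1)   # 3.5
sort(ag$height)                                  # 1 and 3.5
```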

gap

Logical. Whether the optimal number of clusters should be determined with the gap statistic. Defaults to FALSE.

maxK

The maximal number of clusters to investigate in the gap statistic. Default is 15.
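The gap statistic is available in the cluster package as clusGap(), with maxSE() to pick the number of clusters from the gap curve. The sketch below applies it to an average-linkage hierarchical clustering cut at each k up to maxK; how EHC actually feeds its data into the gap statistic is not documented here, so the toy data, the linkage and the helper hc_cut are assumptions.

```r
library(cluster)

set.seed(123)
# Toy data with three separated groups (illustration only)
x <- rbind(matrix(rnorm(40, mean = 0),  ncol = 2),
           matrix(rnorm(40, mean = 5),  ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))

# clusGap() expects a function(x, k) returning a list with a $cluster vector
hc_cut <- function(x, k) {
  list(cluster = cutree(hclust(dist(x), method = "average"), k = k))
}

maxK <- 15
gs <- clusGap(x, FUNcluster = hc_cut, K.max = maxK, B = 50)
# Smallest k whose gap is within one standard error of the first local maximum
k_opt <- maxSE(gs$Tab[, "gap"], gs$Tab[, "SE.sim"], method = "firstSEmax")
k_opt
```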

graphPartitioning

The graph-partitioning method to be used: "METIS" (implemented in MATLAB) or "MST".
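Assuming the "MST" option partitions a minimum spanning tree of the distance graph by deleting its k - 1 longest edges (the usual MST clustering scheme, not verified against IntClust), a base-R sketch follows. For an EHC similarity matrix A, a distance such as max(A) - A would be passed in; the function name mst_partition is hypothetical. With distinct edge lengths, this partition equals a single-linkage dendrogram cut at k clusters, i.e. cutree(hclust(d, method = "single"), k), which makes a handy cross-check.

```r
# Hypothetical sketch: MST-based graph partitioning into k clusters.
# D: distance matrix or dist object; k: number of clusters (1 <= k <= n).
mst_partition <- function(D, k) {
  D <- as.matrix(D)
  n <- nrow(D)
  # Prim's algorithm, starting from object 1
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  best <- D[1, ]                      # cheapest link from each object to the tree
  parent <- rep(1L, n)
  from <- to <- integer(n - 1)
  len <- numeric(n - 1)
  for (e in seq_len(n - 1)) {
    cand <- which(!in_tree)
    v <- cand[which.min(best[cand])]
    from[e] <- parent[v]; to[e] <- v; len[e] <- best[v]
    in_tree[v] <- TRUE
    closer <- !in_tree & D[v, ] < best
    best[closer] <- D[v, closer]
    parent[closer] <- v
  }
  # Keep the n - k shortest MST edges, i.e. cut the k - 1 longest ones
  keep <- order(len)[seq_len(n - k)]
  # Label connected components with a simple union-find
  root <- seq_len(n)
  find <- function(i) {
    while (root[i] != i) i <- root[i]
    i
  }
  for (e in keep) {
    a <- find(from[e]); b <- find(to[e])
    if (a != b) root[a] <- b
  }
  comp <- vapply(seq_len(n), find, integer(1))
  match(comp, unique(comp))           # relabel clusters as 1..k
}
```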

optimalk

An estimate of the final optimal number of clusters. Default is 7.

waitingtime

The time in seconds to wait for MATLAB to generate its results. Defaults to 300.

file_number

The file number used as a tag in the file generated by MATLAB. Defaults to 0.
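The waitingtime and file_number arguments suggest that R waits for a tagged result file written by MATLAB. Assuming that mechanism, a polling loop could look like the sketch below; the helper wait_for_file and the file-name pattern are hypothetical and do not reflect IntClust's actual file names.

```r
# Hypothetical helper: poll for a file until it exists or the timeout expires.
wait_for_file <- function(path, waitingtime = 300, interval = 1) {
  deadline <- Sys.time() + waitingtime
  while (!file.exists(path)) {
    if (Sys.time() >= deadline) return(FALSE)   # timed out: no result file
    Sys.sleep(interval)
  }
  TRUE
}

# Hypothetical result file name tagged with file_number
file_number <- 0
result_file <- file.path(tempdir(), sprintf("metis_result_%d.txt", file_number))
# found <- wait_for_file(result_file, waitingtime = 300)
```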

executable

Logical. Whether the METIS MATLAB function is run via an executable on the command line (TRUE, only possible on Linux systems) or by calling MATLAB directly (FALSE). Defaults to FALSE.

Value

The returned value is a list of two elements:

DistM

The resulting distance matrix

Clust

The resulting clusters

The value has class 'Ensemble'.

References

Hossain2012IntClust

Examples

## Not run: 
library(IntClust)
data(fingerprintMat)
data(targetMat)
L <- list(fingerprintMat, targetMat)

# graphPartitioning = "METIS" requires MATLAB
MCF7_EHC <- EHC(List = L, type = "data", distmeasure = c("tanimoto", "tanimoto"),
  normalize = c(FALSE, FALSE), method = c(NULL, NULL), clust = "agnes",
  linkage = c("flexible", "flexible"), alpha = 0.625, gap = FALSE, maxK = 15,
  graphPartitioning = "METIS", optimalk = 7, waitingtime = 300,
  file_number = 0, executable = FALSE)

## End(Not run)

IntClust documentation built on May 2, 2019, 5:51 a.m.