HBGF: Hybrid bipartite graph formulation


Description

Hybrid Bipartite Graph Formulation (HBGF) is a graph-based consensus technique for multi-source clustering. The method builds a bipartite graph with two types of vertices: the objects on one side and the clusters of the individual partitions on the other. An edge connects an object vertex to a cluster vertex only if the object belongs to that cluster; there are no object-object or cluster-cluster edges. The graph is then partitioned with spectral clustering \insertCite{Ng2000}{IntClust}.
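The construction described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the two toy partitions, the helper `membership()` and the choice of `k` are all hypothetical, and the spectral step follows the general normalize, embed and k-means recipe rather than whatever HBGF does internally.

```r
# Two toy partitions of six objects (hypothetical labels, for illustration only).
p1 <- c(1, 1, 1, 2, 2, 2)
p2 <- c(1, 1, 2, 2, 3, 3)

# Binary membership matrix: one row per object, one column per cluster.
membership <- function(p) outer(p, sort(unique(p)), "==") * 1
A <- cbind(membership(p1), membership(p2))   # 6 objects x 5 clusters

# Bipartite adjacency: objects connect only to clusters, never to each other.
n <- nrow(A)
m <- ncol(A)
W <- rbind(cbind(matrix(0, n, n), A),
           cbind(t(A), matrix(0, m, m)))

# Spectral embedding: normalize W by vertex degree, keep the k leading
# eigenvectors and rescale each row to unit length.
k <- 2
d <- rowSums(W)
Lsym <- diag(1 / sqrt(d)) %*% W %*% diag(1 / sqrt(d))
ev <- eigen(Lsym, symmetric = TRUE)
U <- ev$vectors[, order(ev$values, decreasing = TRUE)[1:k], drop = FALSE]
U <- U / sqrt(rowSums(U^2))

# k-means on all embedded vertices; the first n labels belong to the objects.
set.seed(1)
consensus <- kmeans(U, centers = k, nstart = 20)$cluster[1:n]
```

Object and cluster vertices are embedded and clustered together, so the same run also assigns every original cluster to a consensus group.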

Usage

HBGF(List, type = c("data", "dist", "clust"), distmeasure = c("tanimoto",
  "tanimoto"), normalize = c(FALSE, FALSE), method = c(NULL, NULL),
  clust = "agnes", linkage = c("flexible", "flexible"), alpha = 0.625,
  nrclusters = c(7, 7), gap = FALSE, maxK = 15,
  graphPartitioning = "Spec", optimalk = 7)

Arguments

List

A list of data matrices. The rows are assumed to correspond to the objects.

type

Indicates whether the matrices provided in "List" are data matrices, distance matrices or clustering results obtained from the data. If type="dist", the calculation of the distance matrices is skipped; if type="clust", the single-source clustering is skipped. Should be one of "data", "dist" or "clust".

distmeasure

A vector of the distance measures to be used on each data matrix. Should be one of "tanimoto", "euclidean", "jaccard", "hamming". Defaults to c("tanimoto","tanimoto").
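As an illustration of the default measure, the sketch below computes a Tanimoto distance between binary fingerprints in base R, assuming the usual definition (one minus the ratio of shared features to features present in either object). The function `tanimoto_dist()` and the toy matrix `fp` are hypothetical; the package's own computation may differ in detail.

```r
# Tanimoto distance for a binary matrix with one object per row.
# Rows that are all zero would give 0/0 and are not handled here.
tanimoto_dist <- function(X) {
  X <- as.matrix(X)
  shared <- X %*% t(X)                       # features present in both objects
  counts <- rowSums(X)                       # features present in each object
  either <- outer(counts, counts, "+") - shared
  1 - shared / either
}

# Three toy fingerprints over four features (made-up values).
fp <- rbind(a = c(1, 1, 0, 1),
            b = c(1, 0, 0, 1),
            c = c(0, 0, 1, 0))
D <- tanimoto_dist(fp)
```

Objects with no shared features, such as `a` and `c` here, are at the maximal distance of 1.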

normalize

Logical. Indicates, for each data set, whether to normalize the distance matrix. Defaults to c(FALSE, FALSE) for two data sets. Normalization is recommended when different distance measures are used. See Normalization for details.

method

A method of normalization. Should be one of "Quantile", "Fisher-Yates", "standardize", "Range" or any of the first letters of these names. Default is c(NULL, NULL) for two data sets.
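To show why normalization matters when distance measures differ in scale, here is a minimal sketch of a range rescaling to [0, 1]. It assumes that "Range" means a min-max rescaling, which this page does not confirm; the helper `range_rescale()` is hypothetical and is not part of the package.

```r
# Min-max rescaling of a distance matrix (an assumed reading of "Range").
range_rescale <- function(D) {
  D <- as.matrix(D)
  (D - min(D)) / (max(D) - min(D))
}

# Euclidean distances on made-up data can be far larger than 1,
# whereas Tanimoto-style distances always lie in [0, 1].
set.seed(3)
D_euclid <- as.matrix(dist(matrix(rnorm(20, sd = 10), ncol = 2)))
D_scaled <- range_rescale(D_euclid)
```

After rescaling, both sources contribute distances on a comparable scale.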

clust

Choice of clustering function (character). Defaults to "agnes".

linkage

Choice of inter group dissimilarity (character) for each data set. Defaults to c("flexible", "flexible") for two data sets.

alpha

The parameter alpha to be used in the "flexible" linkage of the agnes function. Defaults to 0.625 and is only used if the linkage is set to "flexible".

nrclusters

The number of clusters into which each individual dendrogram is cut. Default is c(7, 7) for two data sets.
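The clust, linkage, alpha and nrclusters arguments together describe one single-source clustering step. A rough equivalent using the cluster package directly might look as follows. The simulated data and the choice of two clusters are made up for illustration, and it is an assumption that HBGF passes alpha to agnes as `par.method`.

```r
library(cluster)

# Simulated data: two well-separated groups of 20 points each.
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
d <- dist(x)

# Agglomerative nesting with flexible linkage; par.method plays the role of alpha.
tree <- agnes(d, diss = TRUE, method = "flexible", par.method = 0.625)

# Cut the dendrogram into a fixed number of clusters (the role of nrclusters).
labels <- cutree(as.hclust(tree), k = 2)
```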

gap

Logical. Whether the optimal number of clusters should be determined with the gap statistic. Default is FALSE.

maxK

The maximal number of clusters to investigate in the gap statistic. Default is 15.
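For readers unfamiliar with the gap statistic, the sketch below estimates a number of clusters with `clusGap()` and `maxSE()` from the cluster package. It is only an illustration of the idea: the page does not say how HBGF computes the statistic, and the simulated data, the use of k-means and the small `K.max` and `B` values are choices made here to keep the example fast.

```r
library(cluster)

# Simulated data: two well-separated groups of 30 points each.
set.seed(7)
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 6), ncol = 2))

# FUNcluster must return a list with a `cluster` component; kmeans does.
# K.max plays the role of maxK; B is the number of reference data sets.
gap <- clusGap(x, FUNcluster = kmeans, K.max = 5, B = 20,
               nstart = 10, verbose = FALSE)

# Choose k with the rule of Tibshirani et al. (2001).
k_hat <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"],
               method = "Tibs2001SEmax")
```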

graphPartitioning

A character string indicating the preferred graph partitioning algorithm. For now only spectral clustering ("Spec") is implemented. Defaults to "Spec".

optimalk

An estimate of the final optimal number of clusters. Default is 7.

Value

The returned value is a list of two elements:

DistM

A NULL object

Clust

The resulting clustering

The value has class 'Ensemble'.

References

\insertRef{Fern2004}{IntClust}

Examples

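library(IntClust)

# Two example data matrices
data(fingerprintMat)
data(targetMat)
L <- list(fingerprintMat, targetMat)

# Cluster each data matrix with agnes, then combine the partitions with HBGF
MCF7_HBGF <- HBGF(List = L, type = "data",
  distmeasure = c("tanimoto", "tanimoto"), normalize = c(FALSE, FALSE),
  method = c(NULL, NULL), clust = "agnes", linkage = c("flexible", "flexible"),
  nrclusters = c(7, 7), gap = FALSE, maxK = 15, graphPartitioning = "Spec",
  optimalk = 7)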

IntClust documentation built on May 2, 2019, 5:51 a.m.