A partial clustering algorithm with automatic estimation of the number of clusters and identification of outliers

Description

This function performs the CrossClustering algorithm. The method combines Ward's minimum variance and Complete Linkage algorithms to automatically estimate a suitable number of clusters and to identify outlier elements.
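As a rough illustration of the two ingredients named above, the following sketch runs both hierarchical methods with base R and cross-tabulates the resulting partitions. It is not the package's implementation: the simulated data, the choice of the "ward.D" variant, the numbers of clusters, and the use of a contingency table to compare the partitions are all assumptions made for this example.

```r
# Illustrative only: base-R hierarchical clustering, not the package internals.
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.2), ncol = 2),   # 10 points near (0, 0)
           matrix(rnorm(20, mean = 5, sd = 0.2), ncol = 2))   # 10 points near (5, 5)
d <- dist(x)

# Two hierarchical clusterings of the same dissimilarities
tree_ward     <- hclust(d, method = "ward.D")    # assumption: the "ward.D" variant
tree_complete <- hclust(d, method = "complete")

# Cut each tree into a chosen number of clusters (values picked arbitrarily)
part_ward     <- cutree(tree_ward, k = 2)
part_complete <- cutree(tree_complete, k = 3)

# Each cell counts the elements that fall in a given Ward cluster
# and a given Complete Linkage cluster
cross_tab <- table(ward = part_ward, complete = part_complete)
print(cross_tab)
```

Cells with large counts mark groups of elements on which the two methods agree.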

Usage

CrossClustering(d, k.w.min = 2, k.w.max, k.c.max, out = TRUE)

Arguments

d

a dissimilarity structure as produced by the function dist

k.w.min

minimum number of clusters for Ward's minimum variance method. Defaults to 2

k.w.max

maximum number of clusters for Ward's minimum variance method (see details)

k.c.max

maximum number of clusters for the Complete Linkage method. It must be smaller than the number of elements to cluster (see details)

out

logical. If TRUE (default), the algorithm searches for outliers (see details)
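The constraint on k.c.max can be checked before calling the function. The helper below is hypothetical and not part of the package; it relies only on the "Size" attribute that dist objects carry. The check that k.w.min does not exceed k.w.max is an assumption inferred from the argument names.

```r
# Hypothetical pre-flight check; check_cc_args() is not provided by the package.
check_cc_args <- function(d, k.w.min = 2, k.w.max, k.c.max) {
  stopifnot(inherits(d, "dist"))
  n <- attr(d, "Size")          # number of elements stored in a "dist" object
  stopifnot(k.w.min <= k.w.max, # assumption: min must not exceed max
            k.c.max < n)        # documented: k.c.max must be below n
  invisible(n)
}

d <- dist(matrix(1:14, ncol = 2))   # 7 elements
n <- check_cc_args(d, k.w.min = 2, k.w.max = 5, k.c.max = 6)

# k.c.max equal to the number of elements is rejected
bad <- try(check_cc_args(d, k.w.min = 2, k.w.max = 5, k.c.max = 7), silent = TRUE)
```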

Details

See the paper cited in References for more details.

Value

A list of objects describing characteristics of the partitioning as follows:

Optimal.cluster

number of clusters

Cluster.list

a list of clusters; each element of this list contains the indices of the elements belonging to the cluster

Silhouette

the average silhouette width over all the clusters

n.total

total number of input elements

n.clustered

number of input elements that have actually been clustered
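The fields above can be combined to recover which elements were left unclustered. The sketch below uses a mock list with made-up values in place of a real result, so it runs without the package. Treating the unclustered elements as the outlier candidates is an assumption based on the description of the method.

```r
# Mock object with the documented fields; all values are invented for illustration.
res <- list(
  Optimal.cluster = 3,
  Cluster.list    = list(c(1, 2), c(3, 4), c(5, 6)),
  Silhouette      = 0.9,
  n.total         = 7,
  n.clustered     = 6
)

# Indices that appear in some cluster, and those that appear in none
clustered   <- sort(unlist(res$Cluster.list))
unclustered <- setdiff(seq_len(res$n.total), clustered)

# Flat membership vector: cluster label per element, NA if not clustered
membership <- rep(NA_integer_, res$n.total)
for (i in seq_along(res$Cluster.list)) {
  membership[res$Cluster.list[[i]]] <- i
}
print(membership)
```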

Author(s)

Paola Tellaroli, paola.tellaroli@unipd.it; Michele Donato, michele.donato@wayne.edu

References

Tellaroli, P., Bazzi, M., Donato, M., Brazzale, A. R., Draghici, S. (2016) Cross Clustering: a partial clustering algorithm with automatic estimation of the number of clusters. PLOS One (In Press)

Examples

library(CrossClustering)

### Generate simulated data
toy <- matrix(NA, nrow=10, ncol=7)
colnames(toy) <- paste("Sample", 1:ncol(toy), sep="")
rownames(toy) <- paste("Gene", 1:nrow(toy), sep="")
set.seed(123)
toy[,1:2] <- rnorm(n=nrow(toy)*2, mean=10, sd=0.1)
toy[,3:4] <- rnorm(n=nrow(toy)*2, mean=20, sd=0.1)
toy[,5:6] <- rnorm(n=nrow(toy)*2, mean=5, sd=0.1)
toy[,7] <- runif(n=nrow(toy), min=0, max=1)

### toy is transposed as we want to cluster samples (columns of the original matrix)
d <- dist(t(toy), method="euclidean")

### Run CrossClustering
toyres <- CrossClustering(d, k.w.min=2, k.w.max=5, k.c.max=6, out=TRUE)
