piv_KMeans (R Documentation)
Perform classical k-means clustering on a data matrix using pivots as initial centers.
piv_KMeans(
  x,
  centers,
  alg.type = c("kmeans", "hclust"),
  method = "average",
  piv.criterion = c("MUS", "maxsumint", "minsumnoint", "maxsumdiff"),
  H = 1000,
  iter.max = 10,
  nstart = 10,
  prec_par = 10
)
x
  An N x D data matrix, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

centers
  The number of groups for the k-means solution.

alg.type
  The clustering algorithm for the initial partition of the N units into the desired number of clusters. Possible choices are "kmeans" (the default) and "hclust".

method
  If alg.type is "hclust", the agglomeration method to be used. Default is "average".

piv.criterion
  The pivotal criterion used for identifying one pivot for each group. Possible choices are: "MUS", "maxsumint", "minsumnoint", "maxsumdiff".

H
  The number of distinct k-means runs used for building the N x N co-association matrix. Default is 10^3.

iter.max
  If alg.type is "kmeans", the maximum number of iterations allowed. Default is 10.

nstart
  If alg.type is "kmeans", the number of different starting random seeds. Default is 10.

prec_par
  If piv.criterion is "MUS", the maximum number of candidate pivots considered for each group. Default is 10.
The function implements a modified version of k-means that aims at improving the clustering solution by starting from a careful seeding. In particular, it performs a pivot-based initialization step, using pivotal methods to find the initial centers for the clustering procedure. The starting point consists of multiple runs of classical k-means (obtained by setting nstart > 1 in the kmeans function), with a fixed number of clusters, in order to build the co-association matrix of the data units.
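The co-association matrix mentioned above can be sketched in a few lines of base R: run k-means H times and record, for each pair of units, the proportion of runs in which they fall in the same cluster. This is an illustrative sketch of the idea, not the package's implementation (variable names and the toy data are assumptions).

```r
# Sketch: build an N x N co-association matrix from H k-means runs.
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)   # toy data, N = 20 units
N <- nrow(x); H <- 50; k <- 3
C <- matrix(0, N, N)
for (h in 1:H) {
  cl <- kmeans(x, centers = k, nstart = 1)$cluster
  C <- C + outer(cl, cl, "==")     # 1 when units i and j share a cluster
}
C <- C / H  # entry (i, j): proportion of runs where i and j co-cluster
```

Entries close to 1 indicate pairs of units that are almost always grouped together, which is the information the pivotal criteria exploit to pick one representative (pivot) per group.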
A list with components

cluster
  A vector of integers indicating the cluster to which each point is allocated.

centers
  A matrix of cluster centers (centroids).

coass
  The co-association matrix built from ensemble clustering.

pivots
  The pivotal units identified by the selected pivotal criterion.

totss
  The total sum of squares.

withinss
  The within-cluster sum of squares for each cluster.

tot.withinss
  The within-cluster sum of squares summed across clusters.

betweenss
  The between-cluster sum of squared distances.

size
  The number of points in each cluster.

iter
  The number of (outer) iterations.

ifault
  integer: indicator of a possible algorithm problem (for experts).
Leonardo Egidi legidi@units.it, Roberta Pappada
Egidi, L., Pappadà, R., Pauli, F., Torelli, N. (2018). K-means seeding via MUS algorithm. Conference Paper, Book of Short Papers, SIS2018, ISBN: 9788891910233.
# Data generated from a mixture of three bivariate Gaussian distributions
## Not run:
library(mvtnorm)  # for rmvnorm

N <- 620
k <- 3
n1 <- 20
n2 <- 100
n3 <- 500
x <- matrix(NA, N, 2)
truegroup <- c(rep(1, n1), rep(2, n2), rep(3, n3))
x[1:n1, ] <- rmvnorm(n1, c(1, 5), sigma = diag(2))
x[(n1 + 1):(n1 + n2), ] <- rmvnorm(n2, c(4, 0), sigma = diag(2))
x[(n1 + n2 + 1):(n1 + n2 + n3), ] <- rmvnorm(n3, c(6, 6), sigma = diag(2))

# Apply piv_KMeans with MUS as pivotal criterion
res <- piv_KMeans(x, k)

# Apply piv_KMeans with maxsumdiff as pivotal criterion
res2 <- piv_KMeans(x, k, piv.criterion = "maxsumdiff")

# Plot the data and the clustering solution
par(mfrow = c(1, 2), pty = "s")
colors_cluster <- c("grey", "darkolivegreen3", "coral")
colors_centers <- c("black", "darkgreen", "firebrick")
graphics::plot(x, col = colors_cluster[truegroup],
               bg = colors_cluster[truegroup], pch = 21,
               xlab = "x[,1]", ylab = "x[,2]",
               cex.lab = 1.5, main = "True data", cex.main = 1.5)
graphics::plot(x, col = colors_cluster[res$cluster],
               bg = colors_cluster[res$cluster], pch = 21,
               xlab = "x[,1]", ylab = "x[,2]",
               cex.lab = 1.5, main = "piv_KMeans", cex.main = 1.5)
points(x[res$pivots, 1], x[res$pivots, 2], pch = 24,
       col = colors_centers, bg = colors_centers, cex = 1.5)
points(res$centers, col = colors_centers[1:k], pch = 8, cex = 2)
## End(Not run)