piv_KMeans: k-means Clustering Using Pivotal Algorithms For Seeding

piv_KMeans    R Documentation

k-means Clustering Using Pivotal Algorithms For Seeding

Description

Perform classical k-means clustering on a data matrix using pivots as initial centers.

Usage

piv_KMeans(
  x,
  centers,
  alg.type = c("kmeans", "hclust"),
  method = "average",
  piv.criterion = c("MUS", "maxsumint", "minsumnoint", "maxsumdiff"),
  H = 1000,
  iter.max = 10,
  nstart = 10,
  prec_par = 10
)

Arguments

x

An N \times D data matrix, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

centers

The number of groups for the k-means solution.

alg.type

The clustering algorithm for the initial partition of the N units into the desired number of clusters. Possible choices are "kmeans" (default) and "hclust".

method

If alg.type is "hclust", the character string defining the clustering method. The methods implemented are "single", "complete", "average", "ward.D", "ward.D2", "mcquitty", "median", "centroid". The default is "average".

piv.criterion

The pivotal criterion used for identifying one pivot for each group. Possible choices are: "MUS", "maxsumint", "minsumnoint", "maxsumdiff". If centers <= 4, the default method is "MUS"; otherwise, the default method is "maxsumint" (see the details and the vignette).

H

The number of distinct k-means runs used for building the N \times N co-association matrix. Default is 10^3.

iter.max

If alg.type is "kmeans", the maximum number of iterations to be passed to kmeans(). Default is 10.

nstart

If alg.type is "kmeans", the number of different starting random seeds to be passed to kmeans(). Default is 10.

prec_par

If piv.criterion is "MUS", the maximum number of competing pivots in each group. The default is 10. If any group in the initial partition contains fewer units than this value, prec_par is set equal to the cardinality of the smallest group.
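To make the roles of alg.type = "hclust", method, and prec_par more concrete, the following base-R sketch builds an initial partition with hierarchical clustering and then caps the number of competing pivots at the size of the smallest group. It is an illustration under stated assumptions, not the package's internal code: the toy data, the Euclidean distance, and the names init_partition and prec_par_used are hypothetical.

```r
# Sketch only: assumes Euclidean distances and stats::hclust + cutree;
# the actual internals of piv_KMeans() may differ.
set.seed(1)

# Toy data (assumed): two bivariate groups of 15 units each.
x <- rbind(
  matrix(rnorm(30, mean = 0), ncol = 2),
  matrix(rnorm(30, mean = 4), ncol = 2)
)
k <- 2

# Initial partition via hierarchical clustering with average linkage.
tree <- hclust(dist(x), method = "average")
init_partition <- cutree(tree, k = k)

# Cap the number of competing pivots at the smallest group size
# (one reading of the prec_par rule described above).
prec_par <- 10
prec_par_used <- min(prec_par, min(table(init_partition)))
```

Any of the linkage names listed under method can be substituted for "average" in the call to hclust().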

Details

The function implements a modified version of k-means that aims to improve the clustering solution through careful seeding. It performs a pivot-based initialization step, using pivotal methods to find the initial centers for the clustering procedure. The starting point consists of multiple runs of classical k-means (with nstart > 1 in kmeans()) and a fixed number of clusters, which are used to build the co-association matrix of the data units.
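The seeding idea can be sketched in base R as follows. This is a minimal illustration, not the pivmet implementation: the toy data are assumed, H is reduced to 50 for speed, and the pivot rule shown (largest summed co-association with the unit's own group, in the spirit of "maxsumint") is a simplified stand-in for the criteria offered by piv.criterion.

```r
# Sketch only: base-R illustration of co-association-based seeding.
set.seed(123)

# Toy data (assumed): two well-separated bivariate groups.
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2)
)
N <- nrow(x)
k <- 2
H <- 50  # number of k-means runs in the ensemble (reduced for speed)

# Co-association matrix: fraction of runs in which two units share a cluster.
coass <- matrix(0, N, N)
for (h in seq_len(H)) {
  cl <- kmeans(x, centers = k, iter.max = 10, nstart = 1)$cluster
  coass <- coass + outer(cl, cl, "==")
}
coass <- coass / H

# Initial partition, then one pivot per group: the unit with the largest
# summed co-association with the other members of its own group.
init <- kmeans(x, centers = k, nstart = 10)$cluster
pivots <- sapply(seq_len(k), function(g) {
  members <- which(init == g)
  members[which.max(rowSums(coass[members, members, drop = FALSE]))]
})

# Final k-means run seeded with the pivotal units.
fit <- kmeans(x, centers = x[pivots, , drop = FALSE], iter.max = 10)
```

Seeding the last run with actual data points that co-cluster consistently with their group is what distinguishes this scheme from purely random initialization.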

Value

A list with components

cluster

A vector of integers indicating the cluster to which each point is allocated.

centers

A matrix of cluster centers (centroids).

coass

The co-association matrix built from ensemble clustering.

pivots

The pivotal units identified by the selected pivotal criterion.

totss

The total sum of squares.

withinss

The within-cluster sum of squares for each cluster.

tot.withinss

The within-cluster sum of squares summed across clusters.

betweenss

The between-cluster sum of squared distances.

size

The number of points in each cluster.

iter

The number of (outer) iterations.

ifault

integer: indicator of a possible algorithm problem (for experts).
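Several of these components mirror the output of stats::kmeans(), where the sums of squares are linked by simple identities. The sketch below checks those identities on a plain kmeans() fit; it assumes, without confirmation from this page, that piv_KMeans() passes the same quantities through unchanged. The toy data and the object name fit are hypothetical.

```r
# Sketch only: uses stats::kmeans() as a stand-in for the returned list.
set.seed(42)
x <- matrix(rnorm(200), ncol = 2)  # toy data (assumed): 100 bivariate points
fit <- kmeans(x, centers = 3, nstart = 10)

# tot.withinss is the sum of the per-cluster within sums of squares.
check_within <- all.equal(fit$tot.withinss, sum(fit$withinss))

# totss decomposes into within-cluster and between-cluster parts.
check_total <- all.equal(fit$totss, fit$tot.withinss + fit$betweenss)

# Cluster sizes add up to the number of units.
check_size <- sum(fit$size) == nrow(x)
```

A between-to-total ratio such as fit$betweenss / fit$totss is a common quick summary of how much variability the partition explains.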

Author(s)

Leonardo Egidi legidi@units.it, Roberta Pappadà

References

Egidi, L., Pappadà, R., Pauli, F., Torelli, N. (2018). K-means seeding via MUS algorithm. Conference Paper, Book of Short Papers, SIS2018, ISBN: 9788891910233.

Examples


# Data generated from a mixture of three bivariate Gaussian distributions

## Not run: 
library(mvtnorm)  # provides rmvnorm()

N  <- 620
k  <- 3
n1 <- 20
n2 <- 100
n3 <- 500
x  <- matrix(NA, N, 2)
truegroup <- c(rep(1, n1), rep(2, n2), rep(3, n3))

# Fill each block of rows with draws from one mixture component
x[1:n1, ] <- rmvnorm(n1, c(1, 5), sigma = diag(2))
x[(n1 + 1):(n1 + n2), ] <- rmvnorm(n2, c(4, 0), sigma = diag(2))
x[(n1 + n2 + 1):(n1 + n2 + n3), ] <- rmvnorm(n3, c(6, 6), sigma = diag(2))

# Apply piv_KMeans with MUS as pivotal criterion

res <- piv_KMeans(x, k)

# Apply piv_KMeans with maxsumdiff as pivotal criterion

res2 <- piv_KMeans(x, k, piv.criterion ="maxsumdiff")

# Plot the data and the clustering solution

par(mfrow=c(1,2), pty="s")
colors_cluster <- c("grey", "darkolivegreen3", "coral")
colors_centers <- c("black", "darkgreen", "firebrick")
graphics::plot(x, col = colors_cluster[truegroup],
   bg= colors_cluster[truegroup], pch=21, xlab="x[,1]",
   ylab="x[,2]", cex.lab=1.5,
   main="True data", cex.main=1.5)

graphics::plot(x, col = colors_cluster[res$cluster],
   bg=colors_cluster[res$cluster], pch=21, xlab="x[,1]",
   ylab="x[,2]", cex.lab=1.5,
   main="piv_KMeans", cex.main=1.5)
points(x[res$pivots, 1], x[res$pivots, 2],
      pch=24, col=colors_centers,bg=colors_centers,
      cex=1.5)
points(res$centers, col = colors_centers[1:k],
   pch = 8, cex = 2)

## End(Not run)


pivmet documentation built on March 7, 2023, 6:34 p.m.