kmeans18B: K-Means Clustering with Lightweight Coreset
In T4cluster: Tools for Cluster Analysis

Description

Applies the k-means clustering algorithm to a lightweight coreset of the data, as proposed by Bachem et al. (2018). The coreset is a small weighted subsample of the observations. A smaller coreset makes execution faster, at the cost of potentially larger quantization error.
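
The coreset construction described by Bachem et al. (2018) can be sketched in a few lines of base R. This is only an illustration of the sampling scheme, not the code used inside `kmeans18B`; the helper name `lightweight_coreset` and its return fields are hypothetical. Each observation is drawn with a probability that mixes uniform sampling with sampling proportional to its squared distance from the global mean, and each draw is weighted by the inverse of `m` times that probability.

```r
# Hypothetical helper, not part of T4cluster: draw a lightweight coreset
# of size m from the rows of X (sampling scheme of Bachem et al., 2018).
# Assumes the rows of X are not all identical, so that sum(d2) > 0.
lightweight_coreset <- function(X, m) {
  n  <- nrow(X)
  mu <- colMeans(X)                      # global mean of the data
  d2 <- rowSums(sweep(X, 2, mu)^2)       # squared distance of each row to the mean
  # Half uniform sampling, half sampling proportional to squared distance
  q  <- 0.5 / n + 0.5 * d2 / sum(d2)
  idx <- sample.int(n, size = m, replace = TRUE, prob = q)
  list(points  = X[idx, , drop = FALSE], # sampled observations
       weights = 1 / (m * q[idx]),       # importance weights
       index   = idx)                    # row indices into X
}

set.seed(1)
X  <- as.matrix(iris[, 1:4])
cs <- lightweight_coreset(X, m = 50)
dim(cs$points)      # number of sampled rows and columns
sum(cs$weights)     # a random estimate of nrow(X), not exactly equal to it
```

A weighted k-means run on `cs$points` with `cs$weights` then stands in for k-means on the full data, which is where both the speed-up and the extra quantization error come from.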

Usage

kmeans18B(data, k = 2, m = round(nrow(data)/2), ...)

Arguments

`data`	an (n × p) matrix of row-stacked observations.
`k`	the number of clusters (default: 2).
`m`	the size of the coreset (default: `round(n/2)`, where `n = nrow(data)`).
`...`	extra parameters, including `maxiter`, the maximum number of iterations (default: 10), and `nstart`, the number of random initializations (default: 5).

Value

a named list of S3 class T4cluster containing

cluster: a length-n vector of class labels (from 1:k).
mean: a (k × p) matrix where each row is a class mean.
wcss: within-cluster sum of squares (WCSS).
algorithm: name of the algorithm.
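
To make the relation between `cluster`, `mean`, and `wcss` concrete, the following sketch recomputes the within-cluster sum of squares from labels and class means. It assumes the usual definition of WCSS (the sum of squared Euclidean distances from each observation to the mean of its assigned class); whether `kmeans18B` reports exactly this quantity is an assumption. The helper `compute_wcss` is hypothetical, and base R's `stats::kmeans` is used as a stand-in so that the example runs without T4cluster.

```r
# Hypothetical helper, not part of T4cluster: WCSS from labels and class means.
compute_wcss <- function(X, cluster, centers) {
  # Subtract from each observation the mean of its assigned class
  resid <- X - centers[cluster, , drop = FALSE]
  sum(resid^2)
}

set.seed(1)
X   <- as.matrix(iris[, 1:4])
fit <- stats::kmeans(X, centers = 3)     # base R k-means as a stand-in

# Agrees with the total within-cluster sum of squares reported by kmeans()
all.equal(compute_wcss(X, fit$cluster, fit$centers), fit$tot.withinss)
```

The same function could be applied to the `cluster` and `mean` elements of a `kmeans18B` result, under the assumption stated above.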

References

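Bachem O, Lucic M, Krause A (2018). "Scalable k-Means Clustering via Lightweight Coresets." In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18).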

Examples

# -------------------------------------------------------------
#            clustering with 'iris' dataset
# -------------------------------------------------------------
## PREPARE
data(iris)
X   = as.matrix(iris[,1:4])
lab = as.integer(as.factor(iris[,5]))

## EMBEDDING WITH PCA
X2d = Rdimtools::do.pca(X, ndim=2)$Y

## CLUSTERING WITH DIFFERENT CORESET SIZES WITH K=3
core1 = kmeans18B(X, k=3, m=25)$cluster
core2 = kmeans18B(X, k=3, m=50)$cluster
core3 = kmeans18B(X, k=3, m=100)$cluster

## VISUALIZATION
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,4), pty="s")
plot(X2d, col=lab, pch=19, main="true label")
plot(X2d, col=core1, pch=19, main="kmeans18B: m=25")
plot(X2d, col=core2, pch=19, main="kmeans18B: m=50")
plot(X2d, col=core3, pch=19, main="kmeans18B: m=100")
par(opar)

T4cluster documentation built on Nov. 5, 2025, 5:55 p.m.