| pamc1d | R Documentation |
Performs constrained partitioning around medoids (PAM) clustering on univariate data. This is a modification of the constrained k-means algorithm of Bradley, Bennett and Demiriz (2000), adapted to use medians instead of means as cluster centers and, by default, to apply the minimum cluster size constraint to unique values rather than to all cases. This prevents tied observations from being split across separate clusters.
pamc1d(y, k, minsize = 4, countwhat = "unique",
       stand = TRUE, maxit = 100, verbose = TRUE)
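y | a numeric vector of univariate data to be clustered. Missing values (NA) … |
k | integer specifying the desired number of clusters. |
minsize | integer giving the minimum number of members required in each cluster. Defaults to 4. When countwhat = "unique", members are counted as distinct values; when countwhat = "any", every case counts. |
countwhat | character string specifying whether the minimum cluster size constraint applies to unique values ("unique", the default) or to all cases ("any"). |
stand | logical. If … Defaults to TRUE. |
maxit | integer giving the maximum number of iterations in the reassignment loop. Defaults to 100. |
verbose | logical. If … Defaults to TRUE. |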
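The practical difference between the two countwhat settings is how a cluster's size is measured. The helper cluster_sizes below is hypothetical and not part of pamc1d; it is a sketch that assumes, as the description suggests, that under "unique" a cluster's size is the number of distinct values it contains.

```r
# Hypothetical helper, not part of pamc1d: cluster sizes as they would be
# counted by the minsize constraint under each countwhat setting.
cluster_sizes <- function(y, cl, countwhat = c("unique", "any")) {
  countwhat <- match.arg(countwhat)
  if (countwhat == "any") {
    # every case counts, including tied values
    return(as.vector(tapply(y, cl, length)))
  }
  # only distinct values count
  as.vector(tapply(y, cl, function(v) length(unique(v))))
}

y  <- c(1, 1, 1, 2, 5, 5, 6)
cl <- c(1, 1, 1, 1, 2, 2, 2)
cluster_sizes(y, cl, "any")     # 4 3
cluster_sizes(y, cl, "unique")  # 2 2

minsize <- 3
all(cluster_sizes(y, cl, "any") >= minsize)     # TRUE
all(cluster_sizes(y, cl, "unique") >= minsize)  # FALSE
```

The same partition satisfies minsize = 3 when all cases are counted, but violates it when only unique values are counted. That is the situation in which the constrained reassignment loop would be entered.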
The algorithm starts from an unconstrained PAM solution obtained via
pam. If any cluster fails to meet the
minsize constraint, a constrained reassignment loop is
entered. At each iteration, a transportation linear program is solved
(via lp.transport) to reassign cases to
clusters so as to minimize the total L1 distance to the cluster
medians while satisfying the size constraints. The cluster medians are
then recomputed, and the procedure repeats until no assignment
changes or maxit iterations have been performed.
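One way to write the reassignment step as a transportation problem is shown below. The notation is our own and is not taken from the pamc1d source: $T_{ih}$ is the share of case $i$ assigned to cluster $h$, $m_h$ is the current median of cluster $h$, and $\tau$ stands for minsize. The formulation follows the assignment step of Bradley, Bennett and Demiriz (2000), with their squared Euclidean distance replaced by the L1 distance described above.

```latex
\begin{aligned}
\min_{T}\quad & \sum_{i=1}^{n}\sum_{h=1}^{k} T_{ih}\,\lvert y_i - m_h \rvert \\
\text{s.t.}\quad & \sum_{i=1}^{n} T_{ih} \ge \tau, \qquad h = 1,\dots,k,\\
& \sum_{h=1}^{k} T_{ih} = 1, \qquad i = 1,\dots,n,\\
& T_{ih} \ge 0 .
\end{aligned}
```

The constraint matrix of a transportation problem is totally unimodular, and the right-hand sides are integers. An optimal vertex solution therefore has every $T_{ih}$ equal to 0 or 1, so each case lands in exactly one cluster. Under countwhat = "unique", the index $i$ would range over the distinct values of y rather than over all cases.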
When countwhat = "unique", only the unique values are passed
to the linear program, and duplicate cases follow the cluster
assignment of the unique value they match. This ensures that tied
observations are never separated into different clusters.
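Below is a minimal sketch of how unique-value assignments can be propagated back to the individual cases. It assumes the label of each distinct value is already known. The vector cl_u is made up for illustration and stands in for whatever the linear program returns.

```r
# Propagate cluster labels from the sorted unique values to every case.
# cl_u is a hypothetical assignment of the unique values, e.g. as it
# might come out of the linear program.
y    <- c(3, 1, 3, 7, 1, 8, 7)
u    <- sort(unique(y))        # 1 3 7 8
cl_u <- c(1L, 1L, 2L, 2L)      # one label per unique value
cl   <- cl_u[match(y, u)]      # each case inherits its value's label
cl                             # 1 1 1 2 1 2 2

# Tied cases always share a cluster: each distinct value maps to one label.
all(tapply(cl, y, function(v) length(unique(v))) == 1)  # TRUE
```

Because match() returns the same position for every copy of a value, tied cases cannot receive different labels.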
The objective function is the mean absolute deviation of all cases from their assigned cluster median.
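The objective can be computed directly from a data vector and a cluster assignment. The function pam1d_objective below is a hypothetical illustration of the quantity described above, not a function exported by the package.

```r
# Hypothetical helper, not part of pamc1d: mean absolute deviation of
# each case from the median of its assigned cluster.
pam1d_objective <- function(y, cl) {
  med <- tapply(y, cl, median)          # one median per cluster, named by label
  mean(abs(y - med[as.character(cl)]))  # look up each case's cluster median
}

y  <- c(1, 2, 3, 10, 11, 15)
cl <- c(1, 1, 1, 2, 2, 2)
pam1d_objective(y, cl)  # (1 + 0 + 1 + 1 + 0 + 4) / 6 = 7/6, about 1.167
```

Medians are the natural centers for this objective. For a fixed assignment, the median of a cluster minimizes the sum of absolute deviations of its members.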
A list with the following components:
iter | integer, the number of iterations performed. |
converged | logical, indicating whether the algorithm converged before maxit iterations were reached. |
clustering | integer vector giving the cluster assigned to each case. |
obj | numeric, the value of the objective function (mean absolute deviation from the cluster medians) at the final iteration. |
centers | numeric vector of length k giving the cluster medians. |
clustable | a table giving the number of members in each cluster. |
Rousseeuw P.J.
Bradley P.S., Bennett K.P., Demiriz A. (2000). Constrained k-means clustering. Microsoft Research Technical Report, MSR-TR-2000-65.
pam, lp.transport
set.seed(42)
y <- c(rnorm(30, mean = 0), rnorm(30, mean = 5), rnorm(30, mean = 10))
# Basic usage with k = 3 clusters and minimum cluster size of 4
result <- pamc1d(y, k = 3, minsize = 4, verbose = FALSE)
# Inspect the results
result$converged
result$clustable
result$centers
# Apply the constraint to all cases (not just unique values)
result2 <- pamc1d(y, k = 3, minsize = 4, countwhat = "any",
                  verbose = FALSE)
# Example with tied values
y_tied <- c(rep(1, 10), rep(3, 10), rep(5, 10),
            rep(7, 10), rep(9, 10))
result3 <- pamc1d(y_tied, k = 2, minsize = 2,
                  countwhat = "unique", verbose = TRUE)
result3$clustable