pamc1d: Constrained k-medoids clustering for univariate data

View source: R/bixplot.R

pamc1d    R Documentation

Constrained k-medoids clustering for univariate data

Description

Performs constrained partitioning around medoids (PAM) clustering on univariate data. This is a modification of the constrained k-means algorithm by Bradley, Bennett and Demiriz (2000), adapted to use medians instead of means as cluster centers, and to enforce a minimum cluster size constraint on unique values rather than all cases. This prevents tied observations from being split across separate clusters.

Usage

pamc1d(y, k, minsize = 4, countwhat = "unique",
       stand = TRUE, maxit = 100, verbose = TRUE)

Arguments

y

a numeric vector of univariate data to be clustered. Missing values (NA) are removed before clustering.

k

integer specifying the desired number of clusters.

minsize

integer giving the minimum number of members required in each cluster. When countwhat = "unique" (the default), this refers to unique values; when countwhat = "any" it refers to all cases including duplicates. Defaults to 4.

countwhat

character string specifying whether the minimum cluster size constraint applies to "unique" values (the default) or to "any" cases including duplicates.

stand

logical. If TRUE (the default), the data are standardized by subtracting the mean and dividing by the standard deviation before clustering. The returned cluster centers are on the standardized scale.

maxit

integer giving the maximum number of iterations in the reassignment loop. Defaults to 100.

verbose

logical. If TRUE (the default), intermediate results including the clustering vector, cluster table, and objective function value are printed at each iteration.
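To make the difference between the two settings of countwhat concrete, the following base R sketch counts cluster members both ways. The vectors y_demo and cl_demo are hand-made for illustration and are not produced by pamc1d.

```r
# Hypothetical data with ties and a hand-made cluster assignment
y_demo  <- c(1, 1, 1, 2, 3, 8, 8, 9)
cl_demo <- c(1, 1, 1, 1, 1, 2, 2, 2)

# countwhat = "any": every case counts, duplicates included
n_any <- as.vector(table(cl_demo))

# countwhat = "unique": only distinct values within each cluster count
n_unique <- as.vector(tapply(y_demo, cl_demo,
                             function(v) length(unique(v))))

n_any     # 5 3
n_unique  # 3 2
```

With minsize = 3, the second cluster in this toy assignment would satisfy the constraint under countwhat = "any" but not under countwhat = "unique".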

Details

The algorithm starts from an unconstrained PAM solution obtained via pam. If any cluster fails to meet the minsize constraint, a constrained reassignment loop is entered. At each iteration, a transportation linear program is solved (via lp.transport) to reassign cases to clusters in a way that minimizes the total L1 distance to the cluster medians while satisfying the size constraints. Cluster medians are then recomputed, and the procedure repeats until no assignment changes or maxit iterations are reached.
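The reassignment loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's actual code: the helper constrained_reassign, its arguments, and the demo data are hypothetical, the starting medians are supplied by hand rather than taken from pam, and the size constraint is applied to all cases. Only lp.transport from the lpSolve package is an existing function.

```r
# Hypothetical helper: repeat "solve transportation LP, recompute medians"
# until the assignment stops changing or maxit is reached.
constrained_reassign <- function(y, centers, minsize, maxit = 100) {
  n <- length(y)
  k <- length(centers)
  cl <- rep(0L, n)          # no assignment yet
  converged <- FALSE
  iter <- 0L
  while (iter < maxit && !converged) {
    iter <- iter + 1L
    # L1 distance of every case to every current cluster median (n x k)
    cost <- abs(outer(y, centers, "-"))
    # Each case goes to exactly one cluster; each cluster gets >= minsize
    res <- lpSolve::lp.transport(cost, direction = "min",
                                 row.signs = rep("=", n),
                                 row.rhs   = rep(1, n),
                                 col.signs = rep(">=", k),
                                 col.rhs   = rep(minsize, k))
    if (res$status != 0) stop("no feasible assignment for this minsize")
    new_cl <- apply(res$solution, 1, which.max)
    converged <- identical(new_cl, cl)
    cl <- new_cl
    # Recompute the cluster medians from the new assignment
    centers <- as.numeric(tapply(y, factor(cl, levels = seq_len(k)), median))
  }
  list(iter = iter, converged = converged, clustering = cl, centers = centers)
}

# Toy data in which the largest value would otherwise form a singleton cluster
y_demo <- c(0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 9.9)

if (requireNamespace("lpSolve", quietly = TRUE)) {
  fit <- constrained_reassign(y_demo, centers = c(0.25, 5.1, 9.9), minsize = 2)
  print(table(fit$clustering))
  print(fit$centers)
}
```

The column constraints use ">=" so that clusters may grow beyond minsize; only the lower bound is enforced. A feasible solution requires k * minsize to be at most the number of cases.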

When countwhat = "unique", only the unique values are passed to the linear program, and duplicate cases follow the cluster assignment of the unique value they match. This ensures that tied observations are never separated into different clusters.
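One way to let duplicate cases follow their unique value is to index the unique-value assignment with match. The sketch below uses base R only; the assignment cl_u is made up for illustration and does not come from the linear program.

```r
# Hypothetical data with ties, in arbitrary order
y_demo <- c(3, 1, 3, 7, 1, 9, 7, 7)

# Sorted unique values: 1 3 7 9
u <- sort(unique(y_demo))

# Hypothetical cluster labels for the unique values (one label per entry of u)
cl_u <- c(1, 1, 2, 2)

# Each case inherits the label of the unique value it matches
cl <- cl_u[match(y_demo, u)]
cl  # 1 1 1 2 1 2 2 2
```

Because every case is looked up through its unique value, two tied observations can never receive different labels.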

The objective function is the mean absolute deviation of all cases from their assigned cluster median.
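As a small worked illustration, the objective can be computed by hand in base R. The data and the assignment below are hypothetical.

```r
# Hypothetical data and cluster assignment
y_demo  <- c(1, 2, 3, 10, 12)
cl_demo <- c(1, 1, 1, 2, 2)

# Median of each cluster: 2 and 11
centers <- as.numeric(tapply(y_demo, cl_demo, median))

# Mean absolute deviation of every case from its own cluster median:
# (1 + 0 + 1 + 1 + 1) / 5
obj <- mean(abs(y_demo - centers[cl_demo]))
obj  # 0.8
```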

Value

A list with the following components:

iter

integer, the number of iterations performed.

converged

logical, TRUE if the algorithm converged (no assignment changes in the final iteration), FALSE if maxit was reached before convergence.

clustering

integer vector of length n (after removing NAs) giving the cluster assignment of each observation. The entries refer to the observations sorted by value, not to their original order in y.

obj

numeric, the value of the objective function (mean absolute deviation from cluster medians) at the final iteration.

centers

numeric vector of length k with the median of each cluster on the (possibly standardized) scale.

clustable

a table giving the number of members in each cluster.
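When stand = TRUE, centers are reported on the standardized scale, so it can be useful to map them back to the original units. The sketch below assumes the standardization uses the sample mean and the sample standard deviation as computed by sd() after removing NAs; the documentation above only says "the standard deviation", so this is an assumption. The values in centers_std are made up and do not come from pamc1d.

```r
# Hypothetical raw data with one missing value
y_raw <- c(12, 15, NA, 9, 20, 14)
y_obs <- y_raw[!is.na(y_raw)]

# Assumed standardization: subtract the mean, divide by the sample sd
mu <- mean(y_obs)
s  <- sd(y_obs)
z  <- (y_obs - mu) / s

# Hypothetical centers on the standardized scale
centers_std <- c(-1, 0.5)

# Back-transform to the original units
centers_orig <- centers_std * s + mu
centers_orig
```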

Author(s)

Rousseeuw P.J.

References

Bradley P.S., Bennett K.P., Demiriz A. (2000). Constrained k-means clustering. Microsoft Research Technical Report, MSR-TR-2000-65.

See Also

pam, lp.transport

Examples

set.seed(42)
y <- c(rnorm(30, mean = 0), rnorm(30, mean = 5), rnorm(30, mean = 10))

# Basic usage with k = 3 clusters and minimum cluster size of 4
result <- pamc1d(y, k = 3, minsize = 4, verbose = FALSE)

# Inspect the results
result$converged
result$clustable
result$centers

# Apply the constraint to all cases (not just unique values)
result2 <- pamc1d(y, k = 3, minsize = 4, countwhat = "any",
                  verbose = FALSE)

# Example with tied values
y_tied <- c(rep(1, 10), rep(3, 10), rep(5, 10),
            rep(7, 10), rep(9, 10))
result3 <- pamc1d(y_tied, k = 2, minsize = 2,
                  countwhat = "unique", verbose = TRUE)
result3$clustable

classmap documentation built on April 29, 2026, 5:10 p.m.