pamc1d: Constrained k-medoids clustering for univariate data
In classmap: Visualizing Classification Results

Description

Performs constrained partitioning around medoids (PAM) clustering on univariate data. The method modifies the constrained k-means algorithm of Bradley, Bennett and Demiriz (2000) in two ways: cluster centers are medians instead of means, and the minimum cluster size constraint is, by default, counted in unique values rather than in all cases. Counting unique values prevents tied observations from being split across separate clusters.

Usage

pamc1d(y, k, minsize = 4, countwhat = "unique",
       stand = TRUE, maxit = 100, verbose = TRUE)

Arguments

`y`	a numeric vector of univariate data to be clustered. Missing values (`NA`) are removed before clustering.
`k`	integer specifying the desired number of clusters.
`minsize`	integer giving the minimum number of members required in each cluster. When `countwhat = "unique"` (the default), this refers to unique values; when `countwhat = "any"` it refers to all cases including duplicates. Defaults to `4`.
`countwhat`	character string specifying whether the minimum cluster size constraint applies to `"unique"` values (the default) or to `"any"` cases including duplicates.
`stand`	logical. If `TRUE` (the default), the data are standardized by subtracting the mean and dividing by the standard deviation before clustering. The returned cluster centers are on the standardized scale.
`maxit`	integer giving the maximum number of iterations in the reassignment loop. Defaults to `100`.
`verbose`	logical. If `TRUE` (the default), intermediate results including the clustering vector, cluster table, and objective function value are printed at each iteration.
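
The difference between the two `countwhat` settings can be illustrated with base R alone. The sketch below is not taken from the package source: the data `y` and the cluster labels `cl` are made up, and the code only shows how the two ways of counting cluster members can disagree.

```r
# Hypothetical data with heavy ties, and a made-up clustering of it.
y  <- c(1, 1, 1, 1, 1, 2, 6, 7, 8, 9)
cl <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)

# Counting "any" cases: every observation counts, duplicates included.
n_any <- as.vector(table(cl))

# Counting "unique" values: tied values within a cluster count only once.
n_unique <- as.vector(tapply(y, cl, function(v) length(unique(v))))

# Compare both counts against a minimum size of 4.
minsize <- 4
data.frame(cluster = 1:2, n_any, n_unique,
           ok_any = n_any >= minsize,
           ok_unique = n_unique >= minsize)
```

In this made-up case the first cluster holds six cases but only two distinct values, so it meets a minimum size of 4 when all cases are counted and fails it when only unique values are counted.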

Details

The algorithm starts from an unconstrained PAM solution obtained via `pam` (from the cluster package). If any cluster fails to meet the `minsize` constraint, a constrained reassignment loop is entered. At each iteration, a transportation linear program is solved via `lp.transport` (from the lpSolve package) to reassign cases to clusters so that the total L1 distance to the cluster medians is minimized subject to the size constraints. Cluster medians are then recomputed, and the procedure repeats until no assignment changes or `maxit` iterations have been performed.
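
A single reassignment step of this kind might look as follows. This is a minimal sketch under stated assumptions, not the code used inside `pamc1d`: it assumes the lpSolve package is installed, and the data, the starting centers, and the variable names are hypothetical. Cases are the rows of the transportation problem (each assigned exactly once) and clusters are the columns (each receiving at least `minsize` cases).

```r
# Assumes the lpSolve package is installed.
library(lpSolve)

# Hypothetical data and current cluster centers (medians).
y       <- c(0, 0.1, 0.2, 0.3, 0.4, 5)
centers <- c(0.2, 5)
minsize <- 2
n <- length(y)
k <- length(centers)

# Cost matrix: L1 distance from each case (row) to each center (column).
cost <- abs(outer(y, centers, "-"))

# Each case goes to exactly one cluster; each cluster gets >= minsize cases.
sol <- lp.transport(cost, direction = "min",
                    row.signs = rep("=", n),  row.rhs = rep(1, n),
                    col.signs = rep(">=", k), col.rhs = rep(minsize, k))

# Read the assignment off the 0/1 solution matrix, then update the medians.
assignment  <- max.col(sol$solution, ties.method = "first")
new_centers <- as.vector(tapply(y, assignment, median))
```

Without the size constraint the value 5 would form a cluster on its own; with `minsize = 2` the linear program moves the cheapest additional case into that cluster. Repeating the two steps (solve, recompute medians) until `assignment` stops changing gives the kind of loop described above.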

When `countwhat = "unique"`, only the unique values are passed to the linear program, and duplicate cases follow the cluster assignment of the unique value they match. This ensures that tied observations are never separated into different clusters.
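
The way duplicates can inherit the label of their unique value is easy to sketch in base R. The data and the labels `cl_u` below are made up for illustration, and `match` is just one plausible way to do the lookup, not necessarily what the package does internally.

```r
# Hypothetical data with ties.
y <- c(3, 1, 1, 2, 3, 3, 8, 9, 9)

# Cluster only the distinct values ...
u    <- sort(unique(y))      # the distinct values: 1 2 3 8 9
cl_u <- c(1, 1, 1, 2, 2)     # made-up cluster labels, one per unique value

# ... then let every case inherit the label of the unique value it equals.
cl <- cl_u[match(y, u)]

# Number of distinct cluster labels per distinct value of y.
labels_per_value <- tapply(cl, y, function(v) length(unique(v)))
```

Because each case looks up its label through its value, two cases with the same value cannot end up in different clusters.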

The objective function is the mean absolute deviation of all cases from their assigned cluster median.
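
As a small worked illustration of this objective, the following base R sketch computes it for made-up data and made-up cluster labels; the variable names are hypothetical.

```r
# Hypothetical data and cluster labels.
y  <- c(0, 1, 2, 10, 11, 15)
cl <- c(1, 1, 1, 2, 2, 2)

# Median of each cluster.
centers <- as.vector(tapply(y, cl, median))

# Mean absolute deviation of every case from the median of its own cluster.
obj <- mean(abs(y - centers[cl]))
```

The cluster medians are 1 and 11, the absolute deviations are 1, 0, 1, 1, 0 and 4, and their mean is 7/6.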

Value

A list with the following components:

`iter`	integer, the number of iterations performed.
`converged`	logical, `TRUE` if the algorithm converged (no assignment changes in the final iteration), `FALSE` if `maxit` was reached before convergence.
`clustering`	integer vector of length `n` (after removing `NA`s) giving the cluster assignment of each observation. Observations are sorted by value.
`obj`	numeric, the value of the objective function (mean absolute deviation from cluster medians) at the final iteration.
`centers`	numeric vector of length `k` with the median of each cluster on the (possibly standardized) scale.
`clustable`	a table giving the number of members in each cluster.
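
Since `centers` is reported on the standardized scale when `stand = TRUE`, it can be useful to map the centers back to the original units. The sketch below does not call `pamc1d`; it uses made-up data and labels, and it assumes that the standardization uses the mean and the usual sample standard deviation (`sd`) of the `NA`-free data, which is a reading of the argument description and not a confirmed implementation detail.

```r
# Hypothetical data; missing values are dropped first.
y  <- c(2, 4, NA, 6, 20, 22, 24)
y  <- y[!is.na(y)]
cl <- c(1, 1, 1, 2, 2, 2)    # made-up cluster labels

# Standardize (assumed form: subtract the mean, divide by sd).
z <- (y - mean(y)) / sd(y)
centers_std <- as.vector(tapply(z, cl, median))

# Undo the standardization to express the centers in the original units.
centers_orig <- centers_std * sd(y) + mean(y)
```

Because the median commutes with shifting and positive rescaling, the back-transformed centers coincide with the cluster medians of the raw data.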

Author(s)

Rousseeuw P.J.

References

Bradley P.S., Bennett K.P., Demiriz A. (2000). Constrained k-means clustering. Microsoft Research Technical Report, MSR-TR-2000-65.

Examples

set.seed(42)
y <- c(rnorm(30, mean = 0), rnorm(30, mean = 5), rnorm(30, mean = 10))

# Basic usage with k = 3 clusters and minimum cluster size of 4
result <- pamc1d(y, k = 3, minsize = 4, verbose = FALSE)

# Inspect the results
result$converged
result$clustable
result$centers

# Apply the constraint to all cases (not just unique values)
result2 <- pamc1d(y, k = 3, minsize = 4, countwhat = "any",
                  verbose = FALSE)

# Example with tied values
y_tied <- c(rep(1, 10), rep(3, 10), rep(5, 10),
            rep(7, 10), rep(9, 10))
result3 <- pamc1d(y_tied, k = 2, minsize = 2,
                  countwhat = "unique", verbose = TRUE)
result3$clustable

classmap documentation built on April 29, 2026, 5:10 p.m.