leaders: Clustering of Units with Adapted Leaders Method for Different...
In clustDDist: Clustering Discrete Distributions


Computes a clustering with the adapted leaders method. "Adapted" means that different error functions can be used to compute the error between units and the current leaders.

leaders(x, centers, err.measure = "d4", as.probs = FALSE, iter.max = 10, method = "rnd_cls", stabil = 1e-06, penalty = 1e+06, echo = FALSE)

`x`	Data frame or matrix of units (in rows). Rows can represent probabilities or frequencies.
`centers`	Either a matrix of initial leaders or the number of clusters the user would like as a result. In the second case the initial leaders are determined within the algorithm (randomly selected units). There should be at least 1 and at most n centers (n being the number of units).
`err.measure`	String naming the error function used to calculate the error between the current leader and a unit. Possible values are d1 to d7 (see reference).
`as.probs`	TRUE if rows represent probabilities and FALSE otherwise.
`iter.max`	Maximum number of iterations of the algorithm (the leaders algorithm performs local optimization). If centers represents a matrix, the argument is ignored.
`method`	How to compute the initial clusters: `rnd_cls` assigns each unit to a random cluster and computes the leaders of those clusters, `units` takes the leaders from the units.
`stabil`	Stability parameter. The iteration stops once the difference between the clustering errors of the last two iterations is smaller than this value, which ensures termination when the clustering itself keeps changing.
`penalty`	Parameter that specifies how division by zero should be treated.
`echo`	If TRUE, the function prints the error of each iteration.
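
Because rows of `x` may hold either frequencies or probabilities, it can help to see how the two relate. The sketch below uses base R's `prop.table` to turn frequency rows into probability rows. The matrix `freq` is made-up illustration data, and whether `leaders` performs exactly this normalization internally is an assumption, not something stated on this page.

```r
# Hypothetical frequency data: 2 units (rows), 3 categories (columns)
freq <- matrix(c(2, 3, 5,
                 0, 4, 4), nrow = 2, byrow = TRUE)

# Divide each row by its row sum so that every row sums to 1
probs <- prop.table(freq, margin = 1)

print(probs)
print(rowSums(probs))
```

Rows normalized this way would correspond to `as.probs = TRUE`; raw counts would correspond to the default `as.probs = FALSE`.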

The function iterates two steps: (1) computation of the new leaders of the clusters (according to the error function) and (2) recomputation of the clusters (each unit is assigned to a cluster according to the error function). The iteration stops when the clustering is the same in the last two iterations or when the difference between the clustering errors of the last two iterations is smaller than the stability parameter.
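
The two-step iteration and its stopping rules can be sketched in a few lines of base R. This is an illustration only, not the clustDDist implementation: the function name `leaders_sketch` is hypothetical, squared Euclidean error stands in for the package's d1 to d7 measures (for which the leader is simply the cluster mean), and initial leaders are taken as randomly selected units.

```r
# Illustrative sketch; NOT the clustDDist code.
# Assumption: squared Euclidean error replaces the d1..d7 error measures.
leaders_sketch <- function(x, k, iter.max = 10, stabil = 1e-6) {
  x <- as.matrix(x)
  n <- nrow(x)

  # Assign each unit to the leader with the smallest error
  assign_units <- function(leaders) {
    err <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, leaders[j, ])^2))
    err <- matrix(err, nrow = n, ncol = k)
    cl <- max.col(-err, ties.method = "first")
    list(clustering = cl, error = sum(err[cbind(seq_len(n), cl)]))
  }

  # Initial leaders: k randomly selected units
  leaders <- x[sample(n, k), , drop = FALSE]
  current <- assign_units(leaders)

  for (iter in seq_len(iter.max)) {
    # Step 1: recompute the leader of every non-empty cluster
    for (j in seq_len(k)) {
      members <- current$clustering == j
      if (any(members)) leaders[j, ] <- colMeans(x[members, , drop = FALSE])
    }
    # Step 2: reassign units to the new leaders
    updated <- assign_units(leaders)
    same <- identical(updated$clustering, current$clustering)
    stable <- abs(current$error - updated$error) < stabil
    current <- updated
    if (same || stable) break
  }
  list(clustering = current$clustering, error = current$error, leaders = leaders)
}

# Usage on made-up data: two groups of 10 points each
set.seed(1)
x <- rbind(matrix(rnorm(20, 0, 0.1), ncol = 2),
           matrix(rnorm(20, 5, 0.1), ncol = 2))
res <- leaders_sketch(x, k = 2)
print(res$clustering)
```

The returned list mirrors the `clustering`, `error` and `leaders` components in shape only; the numerical values depend on the error measure, so they will differ from what `leaders` returns.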

The result is a list with the following components:

`clustering`	Vector of units (their names or consecutive numbers) with cluster numbers.
`error`	Clustering error: the sum of the errors of all clusters, computed according to the error function. It gives an idea of how far the clustering is from the optimal one.
`leaders`	Final leaders of clusters. Note: cluster leaders can be computed from units in the cluster.

Kejzar, N., Korenjak-Cerne, S. and Batagelj, V.

N. Kejzar, S. Korenjak-Cerne, and V. Batagelj: Clustering of distributions: A case of patent citations. J. Classif., 2011, 28, doi: 10.1007/s00357-011-9084-x.

hierarch, kmeans from package stats, pam from package cluster

library(clustDDist)
data(patents)
## optional removal of rows with zeros if using error measure d7
# ind <- rowSums(patents == 0) == 0
# patents <- patents[ind, ]
centers <- 3
# add echo = TRUE to print the clustering error of each iteration
tt <- system.time(clust <- leaders(patents[1:40, ], centers = centers))
print(tt) # should be done in about 10 sec
print(clust) # print result