leaders: Clustering of Units with Adapted Leaders Method for Different...


Description

Computes a clustering with the adapted leaders method. "Adapted" refers to the different error functions that can be computed between units and the current leaders.

Usage

leaders(x, centers, err.measure = "d4", as.probs = FALSE, iter.max = 10, method = "rnd_cls", stabil = 1e-06, penalty = 1e+06, echo = FALSE)

Arguments

x

Dataframe or matrix of units (in rows). Rows can represent probabilities or frequencies.

centers

Either a matrix of initial leaders or the number of clusters desired as a result. In the second case the initial leaders are determined within the algorithm (randomly selected units). There should be at least 1 and at most n centers, where n is the number of units.

err.measure

String naming the error function used to calculate the error between the current leader and a unit. Possible values are d1 to d7 (see References).

as.probs

TRUE if rows represent probabilities and FALSE otherwise.

iter.max

Maximum number of iterations of the algorithm (the leaders algorithm performs local optimization). If centers represents a matrix, the argument is ignored.

method

How to compute the initial clusters: rnd_cls assigns each unit to a random cluster and computes the leaders of those clusters; units takes the leaders from among the units.

stabil

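Stability parameter: iteration stops once the difference between the clustering errors of the last two iterations is smaller than this value, which ensures convergence when the clustering would otherwise take too many iterations.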

penalty

Parameter that specifies how division by zero should be treated.

echo

If TRUE, the function prints the clustering error of each iteration.
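Since rows of x may hold either frequencies or probabilities, it can help to see how the two relate. The following is a minimal base R sketch, not part of the package; the matrix freq and its values are made up for illustration, and whether leaders performs such a normalisation internally is not stated here.

```r
# Hypothetical frequency table: two units (rows), three categories (columns).
freq <- matrix(c(2, 3, 5,
                 0, 4, 4),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("u1", "u2"), c("a", "b", "c")))

# Divide each row by its row sum so that every row sums to 1.
probs <- prop.table(freq, margin = 1)
```

A row consisting only of zeros would yield NaN values under this normalisation, so such rows need separate handling.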

Details

The function iterates two steps of computation: (1) computation of the new leaders of the clusters (according to the error function) and (2) recomputation of the clusters (each unit is assigned to a cluster according to the error function). The iteration stops when the clusterings of the last two iterations are the same or when the difference between the clustering errors of the last two iterations is smaller than the stability parameter.
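The two-step iteration can be sketched in base R as follows. This is an illustrative sketch only, not the package's implementation: it assumes squared Euclidean distance as a stand-in error function (not necessarily one of d1 to d7), under which the leader of a cluster is the column-wise mean of its units. The name leaders_sketch and the toy data are hypothetical.

```r
leaders_sketch <- function(x, k, iter.max = 10, stabil = 1e-6) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Initial leaders: k randomly selected units.
  lead <- x[sample(n, k), , drop = FALSE]

  # Assign every unit to the leader with the smallest error.
  # Assumption: squared Euclidean distance as the error function.
  assign_units <- function(lead) {
    e <- vapply(seq_len(k),
                function(j) rowSums(sweep(x, 2, lead[j, ])^2),
                numeric(n))
    e <- matrix(e, nrow = n)
    cl <- apply(e, 1, which.min)
    list(cl = cl, err = sum(e[cbind(seq_len(n), cl)]))
  }

  a <- assign_units(lead)
  for (it in seq_len(iter.max)) {
    # Step 1: new leaders from the current clusters.
    for (j in seq_len(k)) {
      members <- x[a$cl == j, , drop = FALSE]
      if (nrow(members) > 0) lead[j, ] <- colMeans(members)
    }
    # Step 2: recompute the clusters from the new leaders.
    b <- assign_units(lead)
    # Stop when the clustering is unchanged or the error change is below stabil.
    done <- all(b$cl == a$cl) || abs(a$err - b$err) < stabil
    a <- b
    if (done) break
  }
  list(clustering = a$cl, error = a$err, leaders = lead)
}

# Toy data: two well-separated pairs of units.
set.seed(1)
x <- rbind(c(0, 0), c(0, 0.1), c(10, 10), c(10, 10.1))
res <- leaders_sketch(x, k = 2)
```

Because the initial leaders are drawn at random and the procedure only optimizes locally, different seeds can give different results on less clearly separated data.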

Value

The result is a list of

clustering

Vector of units (its names or consecutive numbers) with cluster numbers.

error

Clustering error: sum of errors for each cluster which gives an idea of the distance to optimal cluster. Computed according to error function.

leaders

Final leaders of clusters. Note: cluster leaders can be computed from units in the cluster.
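A short sketch of how the returned list might be inspected with base R. The list res below is a hand-made stand-in with the three documented components; its unit names and values are invented for illustration and are not output of leaders.

```r
# Stand-in for a result list with the documented components (made-up values).
res <- list(
  clustering = c(u1 = 1, u2 = 2, u3 = 1, u4 = 2, u5 = 1),
  error      = 0.42,
  leaders    = matrix(c(0.2, 0.8,
                        0.6, 0.4), nrow = 2, byrow = TRUE)
)

# Number of units in each cluster.
sizes <- table(res$clustering)

# Names of the units grouped by cluster number.
members <- split(names(res$clustering), res$clustering)
```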

Author(s)

Kejzar, N., Korenjak-Cerne, S. and Batagelj, V.

References

N. Kejzar, S. Korenjak-Cerne, and V. Batagelj: Clustering of distributions: A case of patent citations. Journal of Classification, 2011, 28, doi: 10.1007/s00357-011-9084-x.

See Also

hierarch, kmeans from the standard package stats, pam from package cluster

Examples

library(clustDDist)
data(patents)
## optional removal of rows with zeros if using error measure d7
# ind <- rowSums(patents == 0) == 0
# patents <- patents[ind, ]
centers <- 3
# pass echo = TRUE to print the clustering error of each iteration
tt <- system.time(clust <- leaders(patents[1:40, ], centers = centers))
print(tt) # should be done in about 10 sec
print(clust) # print result

clustDDist documentation built on May 2, 2019, 6:47 p.m.