optpart: Optimal Partitioning of Dissimilarity/Distance Matrices In optpart: Optimal Partitioning of Similarity Relations

Description

Optimal partitioning is an iterative re-allocation algorithm to maximize the ratio of within-cluster similarity/among-cluster similarity for a given number of clusters. Optpart can operate as either a crisp (classical) partitioning, or a fuzzy partitioning algorithm.

Usage

 `1` ```optpart(x, dist, maxitr = 100, mininc = 0.001, maxdmu = 1) ```

Arguments

 `x` an integer, integer vector, factor vector, or objects of class ‘clustering’, ‘partana’, ‘partition’ or ‘stride’ `dist` a object of class ‘dist’ from `dist`, `dsvdis`, or `vegdist` `maxitr` the maximum number of iterations to perform `mininc` the minimum increment in the within/among similarity ratio to continue iterating `maxdmu` the ‘maximum delta mu’. If 1, a crisp (non-fuzzy) partition results. If (0,1) a fuzzy partition results.

Details

optpart produces a partition, or clustering, of items into clusters by iterative reallocation of items to clusters so as to maximize the within cluster/ among cluster similarity ratio. At each iteration optpart ranks all possible re-allocations of a sample from one cluster to another. The re-allocation that maximizes the change in the within-cluster/among-cluster ratio is performed. The next best reallocation is considered, and if it does not include any clusters already modified, it is also performed, as re-allocations of independent clusters are independent and additive in effect. When no further re-allocations can be performed in that iteration, the algorithm recalculates all possible re-allocations and iterates again. When no re-allocations exist that improve the within/among ratio greater than ‘mininc’, or the maximum number of iterations is reached, the algorithm stops.

optpart is designed to run from a random start or the levels of a factor, or preferably from existing initial partitions. Specifying a single integer gives the number of clusters desired using a random start. Specifying an integer vector gives the initial assignments of items to clusters. Initial assignments can also be extracted from a number of objects. Specific methods exist for objects of class ‘clustering’ from functions `slice` or `archi`, class ‘partana’ from function `partana`, class ‘stride’ from `stride`, or class ‘partition’ from functions `pam` or `diana`. optpart is deterministic from a given initial condition. To get good results from a random start, multiple random starts should be attempted, using function `bestopt`.

Optpart is an unweighted algorithm, i.e. each of the (n^2-n)/2 pairwise distances or dissimilarities is included in the calculation of the ratio exactly once. Optpart somewhat penalizes small clusters, as small clusters contribute only (n_i^2-n_i)/2 values to the numerator; the extreme case is that a cluster with a single member does not contribute anything to the numerator.

It is an interesting characteristic of optpart that no minimum cluster size is enforced, and it is common for partitions of a large number of clusters to contain null clusters, i.e. clusters with no members. This is not a bug or error, but rather an indication that a partition with a fewer number of clusters achieves a better within/among similarity ratio than does a larger number. It is also somewhat common that for solutions with a small or intermediate number of clusters, optpart places outliers in a small ‘trash’ cluster.

When optpart is run as a fuzzy partitioning algorithm, it often achieves a surprisingly low entropy, with many items assigned completely to a single cluster.

Value

an object of class `partana`, a list with elements:

 `ptc` a matrix of item mean similarity to each cluster `ctc` a matrix of mean cluster-to-cluster similarity `musubx` a matrix of membership of each item to each cluster. If `maxdmu` is 1, this will be a single 1 in the appropriate cluster and 0 in all others. If `maxdmu` is (0,1) then the musubx represent fuzzy memberships in each cluster. `clustering` a vector giving the cluster each item is assigned to. If optpart is run as a fuzzy partitioning, this is determined by the maximum membership observed. `ratio` the vector of within/among similarities achieved at each iteration. The final non-zero value is the final ratio achieved. `numitr` the number of iterations performed `names` the names of the items clustered

Author(s)

David W. Roberts droberts@montana.edu http://ecology.msu.montana.edu/labdsv/R

`partana`

Examples

 ```1 2 3 4``` ```data(shoshveg) dis.bc <- dsvdis(shoshveg,'bray/curtis') opt.5 <- optpart(5,dis.bc) summary(opt.5) ```

Example output

```Loading required package: cluster
This is mgcv 1.8-28. For overview type 'help("mgcv-package")'.

Attaching package: 'labdsv'

The following object is masked from 'package:stats':

density

Attaching package: 'optpart'

The following object is masked from 'package:labdsv':

clustify

Number of clusters =  5

1   2   3   4   5
111   7   3   1  28

[,1]       [,2]       [,3]        [,4]        [,5]
[1,] 0.24108138 0.04131058 0.01567215 0.018271397 0.070028536
[2,] 0.04131058 0.12578378 0.02065326 0.035825098 0.035951111
[3,] 0.01567215 0.02065326 0.35148290 0.000000000 0.098675901
[4,] 0.01827140 0.03582510 0.00000000 0.000000000 0.004538691
[5,] 0.07002854 0.03595111 0.09867590 0.004538691 0.249659368

Ratio of Within-cluster similarity/Among-cluster similarity =  4.123 in 49 iterations
```

optpart documentation built on March 26, 2020, 6:18 p.m.