Description Usage Arguments Details Value Author(s) See Also Examples
Optimal partitioning is an iterative re-allocation algorithm to maximize the ratio of within-cluster similarity/among-cluster similarity for a given number of clusters. Optpart can operate as either a crisp (classical) partitioning, or a fuzzy partitioning algorithm.
1 |
x |
an integer, integer vector, factor vector, or objects of class ‘clustering’, ‘partana’, ‘partition’ or ‘stride’ |
dist |
a object of class ‘dist’ from |
maxitr |
the maximum number of iterations to perform |
mininc |
the minimum increment in the within/among similarity ratio to continue iterating |
maxdmu |
the ‘maximum delta mu’. If 1, a crisp (non-fuzzy) partition results. If (0,1) a fuzzy partition results. |
optpart produces a partition, or clustering, of items into clusters by iterative reallocation of items to clusters so as to maximize the within cluster/ among cluster similarity ratio. At each iteration optpart ranks all possible re-allocations of a sample from one cluster to another. The re-allocation that maximizes the change in the within-cluster/among-cluster ratio is performed. The next best reallocation is considered, and if it does not include any clusters already modified, it is also performed, as re-allocations of independent clusters are independent and additive in effect. When no further re-allocations can be performed in that iteration, the algorithm recalculates all possible re-allocations and iterates again. When no re-allocations exist that improve the within/among ratio greater than ‘mininc’, or the maximum number of iterations is reached, the algorithm stops.
optpart is designed to run from a random start or the levels of a factor, or
preferably from existing initial partitions. Specifying a single integer gives the number of clusters
desired using a random start. Specifying an integer vector gives the initial assignments of
items to clusters. Initial assignments can also be extracted from a number of objects.
Specific
methods exist for objects of class ‘clustering’ from functions
slice
or archi
, class ‘partana’ from function
partana
, class ‘stride’ from stride
, or
class ‘partition’ from functions
pam
or diana
. optpart is deterministic from a
given initial condition. To get good results from a random start, multiple
random starts should be attempted, using function bestopt
.
Optpart is an unweighted algorithm, i.e. each of the (n^2-n)/2 pairwise distances or dissimilarities is included in the calculation of the ratio exactly once. Optpart somewhat penalizes small clusters, as small clusters contribute only (n_i^2-n_i)/2 values to the numerator; the extreme case is that a cluster with a single member does not contribute anything to the numerator.
It is an interesting characteristic of optpart that no minimum cluster size is enforced, and it is common for partitions of a large number of clusters to contain null clusters, i.e. clusters with no members. This is not a bug or error, but rather an indication that a partition with a fewer number of clusters achieves a better within/among similarity ratio than does a larger number. It is also somewhat common that for solutions with a small or intermediate number of clusters, optpart places outliers in a small ‘trash’ cluster.
When optpart is run as a fuzzy partitioning algorithm, it often achieves a surprisingly low entropy, with many items assigned completely to a single cluster.
an object of class partana
, a list with elements:
ptc |
a matrix of item mean similarity to each cluster |
ctc |
a matrix of mean cluster-to-cluster similarity |
musubx |
a matrix of membership of each item to each cluster. If |
clustering |
a vector giving the cluster each item is assigned to. If optpart is run as a fuzzy partitioning, this is determined by the maximum membership observed. |
ratio |
the vector of within/among similarities achieved at each iteration. The final non-zero value is the final ratio achieved. |
numitr |
the number of iterations performed |
names |
the names of the items clustered |
David W. Roberts droberts@montana.edu http://ecology.msu.montana.edu/labdsv/R
1 2 3 4 |
Loading required package: cluster
Loading required package: labdsv
Loading required package: mgcv
Loading required package: nlme
This is mgcv 1.8-28. For overview type 'help("mgcv-package")'.
Loading required package: MASS
Attaching package: 'labdsv'
The following object is masked from 'package:stats':
density
Loading required package: plotrix
Attaching package: 'optpart'
The following object is masked from 'package:labdsv':
clustify
Number of clusters = 5
1 2 3 4 5
111 7 3 1 28
[,1] [,2] [,3] [,4] [,5]
[1,] 0.24108138 0.04131058 0.01567215 0.018271397 0.070028536
[2,] 0.04131058 0.12578378 0.02065326 0.035825098 0.035951111
[3,] 0.01567215 0.02065326 0.35148290 0.000000000 0.098675901
[4,] 0.01827140 0.03582510 0.00000000 0.000000000 0.004538691
[5,] 0.07002854 0.03595111 0.09867590 0.004538691 0.249659368
Ratio of Within-cluster similarity/Among-cluster similarity = 4.123 in 49 iterations
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.