
Performs Cross-Entropy Clustering on a data matrix.

```
cec(x, centers, type = c("covariance", "fixedr", "spherical", "diagonal",
    "eigenvalues", "mean", "all"), iter.max = 25, nstart = 1, param,
    centers.init = c("kmeans++", "random"), card.min = "5%", keep.removed = F,
    interactive = F, threads = 1, split = F, split.depth = 8, split.tries = 5,
    split.limit = 100, split.initial.starts = 1, readline = T)
```

| Argument | Description |
|---|---|
| `x` | Numeric matrix of data. |
| `centers` | Either a matrix of initial centers or the number of initial centers (…) |
| `type` | Type (or types) of clustering (density family). This can be either a single value or a vector of length equal to the number of centers. Possible values are: "covariance", "fixedr", "spherical", "diagonal", "eigenvalues", "mean", "all" (default). Currently, if the … |
| `iter.max` | Maximum number of iterations at each clustering. |
| `nstart` | The number of clusterings to perform (with different initial centers). Only the best clustering (with the lowest cost) will be returned. A value greater than one is valid only if the … If the split mode is on (… |
| `centers.init` | Centers initialization method. Possible values are: "kmeans++" (default), "random". |
| `param` | Parameter (or parameters) specific to a particular type of clustering. Not all types of clustering require a parameter. Types that require a parameter: "covariance" (matrix parameter), "fixedr" (numeric parameter), "eigenvalues" (vector parameter). This can be a vector or a list (when one of the parameters is a matrix or a vector). |
| `card.min` | Minimal cluster cardinality. If a cluster's cardinality becomes less than `card.min`, the cluster is removed. This argument can be either an integer or a string ending with a percent sign (e.g. "5%"). |
| `keep.removed` | If TRUE, removed clusters will be visible in the results as NA in the centers matrix (as well as in the corresponding values in the list of covariances). |
| `interactive` | Interactive mode. If TRUE, the result of clustering will be plotted after every iteration. |
| `threads` | Specifies the number of threads to use, or "auto" to use the default number of threads (usually the number of available processing units/cores), when performing multiple starts (… The execution of a single start is always performed by a single thread, thus for … |
| `split` | Enables split mode. This mode discovers new clusters after the initial clustering by trying to split single clusters into two in order to lower the cost function. For each start (… |
| `split.depth` | Cluster subdivision depth used in split mode. Usually a value less than 10 is sufficient (when, after each subdivision, the new clusters have similar sizes). For some data, subdivisions may often produce a cluster (one of the two) that will not be split further; in that case a higher value of the … |
| `split.tries` | The number of attempts made when trying to split a cluster in split mode. |
| `split.limit` | Maximum number of centers to be discovered in split mode. |
| `split.initial.starts` | The number of 'standard' starts performed before starting split mode. |
| `readline` | Used only in the interactive mode. If … |
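The `card.min` argument accepts either an absolute count or a percentage string. As a rough illustration of how the two forms relate, the helper below converts either one into a minimum number of points. The name `card_min_to_count` is hypothetical and the rounding rule (round up) is an assumption; this is not the package's own code.

```r
# Hypothetical helper (not part of the CEC package): convert a card.min value,
# given as an integer count or an "N%" string, into an absolute minimum
# cluster size for a data set with n points. Rounding up is an assumption.
card_min_to_count <- function(card.min, n) {
  if (is.character(card.min)) {
    if (!grepl("^[0-9]+(\\.[0-9]+)?%$", card.min)) {
      stop("card.min must be an integer or a string such as \"5%\"")
    }
    pct <- as.numeric(sub("%$", "", card.min))
    # Multiply before dividing so that e.g. 7 * 3000 / 100 is exactly 210.
    return(as.integer(ceiling(pct * n / 100)))
  }
  as.integer(card.min)
}

card_min_to_count("5%", 3000)   # 150
card_min_to_count("7%", 3000)   # 210
card_min_to_count(20, 3000)     # 20
```

The first example below clusters 3000 points with `card.min = "7%"`, which under this assumption corresponds to roughly 210 points per cluster.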

In terms of implementation, Cross-Entropy Clustering (CEC) aims to partition *m* points into *k* clusters so as to minimize the cost function (the energy *E* of the clustering) by switching points between clusters.
The method is based on an adapted Hartigan approach, in which clusters whose cardinalities decrease below a small, prefixed level (see `card.min`) are removed.

The energy function *E* is given by:

$$E(Y_1, F_1; \ldots; Y_k, F_k) = \sum_{i=1}^{k} p(Y_i)\left(-\ln p(Y_i) + H(Y_i \mid F_i)\right)$$

where $Y_i$ denotes the $i$-th cluster, $F_i$ is the density family (clustering type) assigned to it, $p(Y_i)$ is the ratio of the number of points in the $i$-th cluster to the total number of points, and $H(Y_i \mid F_i)$ is the value of the cross-entropy, which represents the internal energy of cluster $Y_i$ with respect to $F_i$.
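To make the energy and the Hartigan-style procedure described earlier more concrete, here is a minimal base-R sketch for the "all" type only. It assumes integer cluster labels, uses maximum-likelihood covariances and the general-Gaussian cross-entropy $H = \frac{d}{2}\ln(2\pi e) + \frac{1}{2}\ln\det\Sigma$. The names `cluster_cost`, `cec_energy` and `hartigan_cec` are hypothetical. Two simplifications are assumptions, not the package's algorithm: points of a removed cluster go to the nearest surviving cluster mean, and a point is never moved out of a cluster that would then have a singular covariance.

```r
# Hypothetical sketch of a Hartigan-style CEC loop for the "all" type only.
# Illustration only; not the CEC package's implementation.

# Contribution of one cluster to E: p * (-ln p + H), H for a general Gaussian.
cluster_cost <- function(Y, n_total) {
  n <- nrow(Y)
  d <- ncol(Y)
  p <- n / n_total
  S <- cov(Y) * (n - 1) / n                     # maximum-likelihood covariance
  H <- d / 2 * log(2 * pi * exp(1)) + 0.5 * log(det(S))
  p * (-log(p) + H)
}

# Energy E of a labelling (integer cluster labels, one per row of X).
cec_energy <- function(X, labels) {
  sum(vapply(unique(labels), function(i)
    cluster_cost(X[labels == i, , drop = FALSE], nrow(X)), numeric(1)))
}

hartigan_cec <- function(X, labels, card_min, max_iter = 25) {
  d <- ncol(X)
  card_min <- max(card_min, d + 2)              # assumption: avoid singular covariances
  for (iter in seq_len(max_iter)) {
    # Remove clusters below card_min; simplification: their points are
    # reassigned to the nearest surviving cluster mean.
    sizes <- table(labels)
    small <- as.integer(names(sizes)[sizes < card_min])
    keep <- setdiff(unique(labels), small)
    if (length(keep) == 0) stop("card_min removes every cluster")
    for (j in which(labels %in% small)) {
      dist2 <- vapply(keep, function(i)
        sum((X[j, ] - colMeans(X[labels == i, , drop = FALSE]))^2), numeric(1))
      labels[j] <- keep[which.min(dist2)]
    }
    # Hartigan pass: move each point to the cluster that lowers E the most.
    moved <- FALSE
    for (j in seq_len(nrow(X))) {
      src <- labels[j]
      if (sum(labels == src) <= d + 1) next     # leaving would make src singular
      best_label <- src
      best_energy <- cec_energy(X, labels)
      for (i in setdiff(unique(labels), src)) {
        trial <- labels
        trial[j] <- i
        e <- cec_energy(X, trial)
        if (e < best_energy) {
          best_energy <- e
          best_label <- i
        }
      }
      if (best_label != src) {
        labels[j] <- best_label
        moved <- TRUE
      }
    }
    if (!moved) break                           # no point improves E: converged
  }
  labels
}

# Usage: two well-separated 2-D blobs, three random initial clusters.
set.seed(42)
X <- rbind(matrix(rnorm(80, sd = 0.5), ncol = 2),
           matrix(rnorm(80, mean = 4, sd = 0.5), ncol = 2))
init <- sample(1:3, nrow(X), replace = TRUE)
labels <- hartigan_cec(X, init, card_min = 8)
table(labels)
```

Recomputing the full energy for every candidate move keeps the sketch obviously faithful to the formula for *E*, at the price of speed; an efficient implementation would update only the two affected clusters.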

The value of the internal energy function *H* depends on the covariance matrix (computed using the maximum-likelihood method) and the mean (in the case of the …). The available density families (clustering types) are:

"all" - All Gaussian densities. Data will form ellipsoids with arbitrary radii.

"covariance" - Gaussian densities with a fixed, given covariance. The shapes of clusters depend on the given covariance matrix (additional parameter).

"fixedr" - Special case of "covariance", where the covariance matrix equals *rI* for the given *r* (additional parameter). The clustering will have a tendency to divide data into balls with approximate radius proportional to the square root of *r*.

"spherical" - Spherical (radial) Gaussian densities (covariance proportional to the identity). Clusters will have a tendency to form balls of arbitrary sizes.

"diagonal" - Gaussian densities with diagonal covariance. Data will form ellipsoids with axes parallel to the coordinate axes.

"eigenvalues" - Gaussian densities with a covariance matrix having fixed eigenvalues (additional parameter). The clustering will try to divide the data into fixed-shaped ellipsoids rotated by an arbitrary angle.

"mean" - Gaussian densities with a fixed mean. Data will be covered with ellipsoids with fixed centers.
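For several of these families the cross-entropy $H$ has a closed form in terms of a cluster's maximum-likelihood covariance $S$ (dimension $d$). The sketch below writes them out as small base-R functions; the formulas follow the usual cross-entropy derivations for each Gaussian family and are stated here as an assumption, not copied from the package source. The function names are hypothetical, and "eigenvalues" and "mean" are omitted.

```r
# Hypothetical helpers: cross-entropy H of a cluster with ML covariance S
# (a d x d matrix) for several Gaussian families. Not the package's code.

h_all <- function(S) {                 # arbitrary covariance
  d <- nrow(S)
  d / 2 * log(2 * pi * exp(1)) + 0.5 * log(det(S))
}

h_spherical <- function(S) {           # covariance proportional to identity
  d <- nrow(S)
  d / 2 * log(2 * pi * exp(1) / d) + d / 2 * log(sum(diag(S)))
}

h_diagonal <- function(S) {            # diagonal covariance
  d <- nrow(S)
  d / 2 * log(2 * pi * exp(1)) + 0.5 * sum(log(diag(S)))
}

h_fixedr <- function(S, r) {           # covariance fixed to r * I
  d <- nrow(S)
  d / 2 * log(2 * pi) + d / 2 * log(r) + sum(diag(S)) / (2 * r)
}

h_covariance <- function(S, Sigma) {   # covariance fixed to a given Sigma
  d <- nrow(S)
  d / 2 * log(2 * pi) + 0.5 * sum(diag(solve(Sigma, S))) + 0.5 * log(det(Sigma))
}

# Usage: compare the families on the ML covariance of a random 2-D sample.
set.seed(1)
Y <- matrix(rnorm(200), ncol = 2)
S <- cov(Y) * (nrow(Y) - 1) / nrow(Y)   # maximum-likelihood covariance
c(all = h_all(S), diagonal = h_diagonal(S), spherical = h_spherical(S))
```

Because the families are nested, the same $S$ always gives `h_all(S) <= h_diagonal(S) <= h_spherical(S)`, and "fixedr" with parameter `r` coincides with "covariance" with `Sigma = r * I`.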

The implementation of the `cec` function allows mixing of clustering types (see the `type` argument).

Returns an object of class "cec" with available components: "data", "cluster", "probabilities", "centers", "cost.function", "nclusters", "iterations", "cost", "covariances", "covariances.model", "time".

Konrad Kamieniecki, Jacek Tabor, Przemysław Spurek

Spurek, P. and Tabor, J. (2014). Cross-Entropy Clustering. *Pattern Recognition*, **47**(9), 3046–3059.

```
#
# Cross-Entropy Clustering
#
library(CEC)

## Example of clustering a random data set of 3 Gaussians,
## 10 initial centers (kmeans++ initialization by default) and 7% as minimal cluster size.
m1 = matrix(rnorm(2000, sd = 1), ncol = 2)
m2 = matrix(rnorm(2000, mean = 3, sd = 1.5), ncol = 2)
m3 = matrix(rnorm(2000, mean = 3, sd = 1), ncol = 2)
m3[,2] = m3[,2] - 5
m = rbind(m1, m2, m3)
par(ask = TRUE)   # ask for confirmation before each new plot
plot(m, cex = 0.5, pch = 19)
## Clustering result:
Z = cec(m, 10, iter.max = 100, card.min = "7%")
plot(Z)
# Result:
Z

## Example of clustering a mouse-like set using spherical Gaussian densities.
m = mouseset(n = 7000, r.head = 2, r.left.ear = 1.1, r.right.ear = 1.1,
             left.ear.dist = 2.5, right.ear.dist = 2.5, dim = 2)
plot(m, cex = 0.5, pch = 19)
## Clustering result ("sp" abbreviates "spherical"):
Z = cec(m, 3, type = "sp", iter.max = 100, nstart = 4, card.min = "5%")
plot(Z)
# Result:
Z

## Example of clustering data set "Tset" using the "eigenvalues" clustering type.
data(Tset)
plot(Tset, cex = 0.5, pch = 19)
centers = init.centers(Tset, 2)   # not used by the cec() call below
## Clustering result:
Z <- cec(Tset, 5, "eigenvalues", param = c(0.02, 0.002), nstart = 4)
plot(Z)
# Result:
Z

## Example of using the CEC split method starting with a single cluster.
data(mixShapes)
plot(mixShapes, cex = 0.5, pch = 19)
## Clustering result:
Z <- cec(mixShapes, 1, split = TRUE)
plot(Z)
# Result:
Z
```

```
CEC clustering result:
Probability vector:
[1] 0.3323333 0.3540000 0.3136667
Means of clusters:
[,1] [,2]
[1,] 3.09326320 -2.003513610
[2,] 0.02232949 -0.005592356
[3,] 3.08756948 3.186625149
Cost function:
[1] 4.110942
Number of clusters:
[1] 3
Number of iterations:
[1] 21
Computation time:
[1] 0.032
Available components:
[1] "data" "cluster" "probabilities"
[4] "centers" "cost.function" "nclusters"
[7] "iterations" "covariances" "covariances.model"
[10] "time"
CEC clustering result:
Probability vector:
[1] 0.1878571 0.1784286 0.6337143
Means of clusters:
[,1] [,2]
[1,] -1.821308170 1.81283673
[2,] 1.820294791 1.85661636
[3,] 0.006953714 -0.09758019
Cost function:
[1] 3.23323
Number of clusters:
[1] 3
Number of iterations:
[1] 15
Computation time:
[1] 0.098
Available components:
[1] "data" "cluster" "probabilities"
[4] "centers" "cost.function" "nclusters"
[7] "iterations" "covariances" "covariances.model"
[10] "time"
CEC clustering result:
Probability vector:
[1] 0.3646778 0.1422434 0.3536993 0.1393795
Means of clusters:
[,1] [,2]
[1,] 0.4794157 0.2081635
[2,] 0.7600415 0.9506202
[3,] 0.4807913 0.7344452
[4,] 0.2100561 0.9512146
Cost function:
[1] -0.8761754
Number of clusters:
[1] 4
Number of iterations:
[1] 18
Computation time:
[1] 0.302
Available components:
[1] "data" "cluster" "probabilities"
[4] "centers" "cost.function" "nclusters"
[7] "iterations" "covariances" "covariances.model"
[10] "time"
CEC clustering result:
Probability vector:
[1] 0.1435556 0.1427778 0.1404444 0.1453333 0.1401111 0.1450000 0.1427778
Means of clusters:
[,1] [,2]
[1,] 485.59620 168.18558
[2,] 368.08445 203.08078
[3,] 470.67809 30.09067
[4,] 79.96403 263.55175
[5,] 205.68965 399.95641
[6,] 160.00748 310.04231
[7,] 200.07333 100.05577
Cost function:
[1] 10.14958
Number of clusters:
[1] 7
Number of iterations:
[1] 3
Computation time:
[1] 0.298
Available components:
[1] "data" "cluster" "probabilities"
[4] "centers" "cost.function" "nclusters"
[7] "iterations" "covariances" "covariances.model"
[10] "time"
```
