
Perform k-centroids clustering on a data matrix.


`x` |
A numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns). |

`k` |
Either the number of clusters, or a vector of cluster
assignments, or a matrix of initial
(distinct) cluster centroids. If a number, a random set of (distinct)
rows in |

`family` |
Object of class |

`weights` |
An optional vector of weights to be used in the clustering process; cannot be combined with all families. |

`group` |
An optional grouping vector for the data, see details below. |

`control` |
An object of class |

`simple` |
Return an object of class |

`save.data` |
Save a copy of |

`which` |
One of |

`name` |
Optional long name for family, used only for show methods. |

`dist` |
A function for distance computation, ignored
if |

`cent` |
A function for centroid computation, ignored
if |

`preproc` |
Function for data preprocessing. |

`trim` |
A number between 0 and 0.5; if non-zero, trimmed
means are used for the |

`groupFun` |
Function or name of function to obtain clusters for grouped data, see details below. |

`object` |
Object of class |

See the paper *A Toolbox for K-Centroids Cluster Analysis*
referenced below for details.

Function `kcca` returns objects of class `"kcca"` or `"kccasimple"`, depending on the value of argument `simple`. The simpler objects contain fewer slots and hence are faster to compute, but contain no auxiliary information used by the plotting methods. Most plot methods for `"kccasimple"` objects do nothing and issue a warning. If only centroids, cluster membership or prediction for new data are of interest, then the simple objects are sufficient.
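As a rough illustration of the simple objects, the following sketch fits a `"kccasimple"` object and uses only centroids, cluster membership and prediction. It assumes the flexclust package is installed and that the accessors `clusters()` and `parameters()` and the `predict()` method behave for simple objects as they do for full `"kcca"` objects; the seed and the choice of the first five rows as "new" data are arbitrary.

```r
library(flexclust)

data("Nclus")
set.seed(1)  # arbitrary seed, only to make the random initialization repeatable

## Fit the lighter-weight object: no auxiliary information for plotting
cl_simple <- kcca(Nclus, k = 4, simple = TRUE)
class(cl_simple)

## Cluster membership of the training data
head(clusters(cl_simple))

## Centroids
parameters(cl_simple)

## Cluster prediction for (here: artificially chosen) new observations
predict(cl_simple, newdata = Nclus[1:5, ])
```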

Function `kccaFamily()` currently has the following predefined families (distance / centroid):

- kmeans: Euclidean distance / mean

- kmedians: Manhattan distance / median

- angle: angle between observation and centroid / standardized mean

- jaccard: Jaccard distance / numeric optimization

- ejaccard: Jaccard distance / mean

See Leisch (2006) for details on all combinations.
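To make the distance / centroid pairing concrete, here is a minimal base R sketch of a generic k-centroids loop in which the distance and centroid functions are interchangeable. It is not the flexclust implementation; all function names (`dist_euclidean`, `dist_manhattan`, `cent_mean`, `cent_median`, `k_centroids`) and the toy data are hypothetical and chosen only for illustration.

```r
## Distances between rows of x (n x p) and rows of centers (k x p): n x k matrix
dist_euclidean <- function(x, centers) {
  d <- matrix(0, nrow(x), nrow(centers))
  for (j in seq_len(nrow(centers)))
    d[, j] <- sqrt(rowSums(sweep(x, 2, centers[j, ])^2))
  d
}
dist_manhattan <- function(x, centers) {
  d <- matrix(0, nrow(x), nrow(centers))
  for (j in seq_len(nrow(centers)))
    d[, j] <- rowSums(abs(sweep(x, 2, centers[j, ])))
  d
}

## Centroid of the observations currently assigned to one cluster
cent_mean   <- function(x) colMeans(x)
cent_median <- function(x) apply(x, 2, median)

## Alternate between assigning points and updating centroids
k_centroids <- function(x, centers, dist, cent, iter = 20) {
  for (i in seq_len(iter)) {
    cluster <- max.col(-dist(x, centers), ties.method = "first")
    for (j in seq_len(nrow(centers))) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- cent(members)
    }
  }
  list(centers = centers,
       cluster = max.col(-dist(x, centers), ties.method = "first"))
}

## Toy data: two well-separated groups of three points each
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
init <- x[c(1, 4), , drop = FALSE]

fit_mean   <- k_centroids(x, init, dist_euclidean, cent_mean)    # "kmeans"-like
fit_median <- k_centroids(x, init, dist_manhattan, cent_median)  # "kmedians"-like
```

Both fits recover the same partition on this toy data, while the centroids differ because one pairing averages the members and the other takes coordinate-wise medians.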

If `group` is not `NULL`, then observations from the same group are restricted to belong to the same cluster (must-link constraint) or different clusters (cannot-link constraint) during the fitting process. If `groupFun = "minSumClusters"`, then all group members are assigned to the cluster whose center has minimal average distance to the group members. If `groupFun = "majorityClusters"`, then all group members are assigned to the cluster the majority would belong to without a constraint.
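The difference between the two must-link rules can be seen in a small base R sketch for a single group. This is an illustration under stated assumptions, not the package code: the helper names `min_sum_cluster` and `majority_cluster` and the distance values are hypothetical, and ties are resolved by taking the first candidate rather than at random.

```r
## Distances from the three members of ONE group (rows)
## to three cluster centers (columns); values are made up
d <- rbind(c(1.0, 2.0, 9.0),
           c(1.2, 3.0, 9.0),
           c(8.0, 0.5, 9.0))

## "minSumClusters"-style rule: cluster with minimal average distance
min_sum_cluster <- function(d) which.min(colMeans(d))

## "majorityClusters"-style rule: cluster most members would pick on their own
majority_cluster <- function(d) {
  own_choice <- max.col(-d, ties.method = "first")
  which.max(tabulate(own_choice, nbins = ncol(d)))
}

min_sum_cluster(d)
majority_cluster(d)
```

Here two of the three members are individually closest to center 1, so the majority rule selects cluster 1, while the third member is so far from center 1 that the average distance favors cluster 2.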

`groupFun = "differentClusters"` implements a cannot-link constraint, i.e., members of one group are not allowed to belong to the same cluster. The optimal allocation for each group is found by solving a linear sum assignment problem using `solve_LSAP`. Obviously the group sizes must be smaller than the number of clusters in this case.
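The assignment problem behind the cannot-link constraint can be sketched in base R by brute force for one small group. This is only an illustration: the package uses `solve_LSAP` rather than enumeration, and the helper `best_distinct_assignment` and the distance values are hypothetical.

```r
## Try every allocation of group members to pairwise distinct clusters
## and keep the one with the smallest total distance.
## d: members of one group (rows) x cluster centers (columns)
best_distinct_assignment <- function(d) {
  m <- nrow(d)
  k <- ncol(d)
  stopifnot(m <= k)
  best <- list(cost = Inf, cluster = NULL)
  search <- function(i, used, cost) {
    if (i > m) {
      if (cost < best$cost) best <<- list(cost = cost, cluster = used)
      return(invisible(NULL))
    }
    for (j in setdiff(seq_len(k), used))
      search(i + 1, c(used, j), cost + d[i, j])
  }
  search(1, integer(0), 0)
  best
}

## Two group members, three clusters; both are individually closest to center 1
d <- rbind(c(1.0, 2.0, 9.0),
           c(1.2, 3.0, 9.0))

res <- best_distinct_assignment(d)
res$cluster  # cluster chosen for member 1 and member 2
res$cost     # total distance of the chosen allocation
```

Because both members cannot share cluster 1, the cheapest valid allocation moves the first member to cluster 2 and leaves the second in cluster 1. Enumeration grows factorially with group size, which is why a dedicated assignment solver is preferable beyond toy cases.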

Ties are broken at random in all cases. Note that at the moment not all methods for fitted `"kcca"` objects respect the grouping information, most importantly the plot method when a data argument is specified.

Friedrich Leisch

Friedrich Leisch. A Toolbox for K-Centroids Cluster Analysis. Computational Statistics and Data Analysis, 51 (2), 526–544, 2006.

Friedrich Leisch and Bettina Gruen. Extending standard cluster algorithms to allow for group constraints. In Alfredo Rizzi and Maurizio Vichi, editors, Compstat 2006 - Proceedings in Computational Statistics, pages 885–892. Physica Verlag, Heidelberg, Germany, 2006.

`stepFlexclust`, `cclust`, `distances`

```
library(flexclust)

data("Nclus")
plot(Nclus)

## try kmeans
cl1 <- kcca(Nclus, k=4)
cl1
image(cl1)
points(Nclus)

## A barplot of the centroids
barplot(cl1)

## now use k-medians and kmeans++ initialization, cluster centroids
## should be similar...
cl2 <- kcca(Nclus, k=4, family=kccaFamily("kmedians"),
            control=list(initcent="kmeanspp"))
cl2

## ... but the boundaries of the partitions have a different shape
image(cl2)
points(Nclus)
```

```
Loading required package: grid
Loading required package: lattice
Loading required package: modeltools
Loading required package: stats4
kcca object of family 'kmeans'
call:
kcca(x = Nclus, k = 4)
cluster sizes:
1 2 3 4
105 200 98 147
kcca object of family 'kmedians'
call:
kcca(x = Nclus, k = 4, family = kccaFamily("kmedians"), control = list(initcent = "kmeanspp"))
cluster sizes:
1 2 3 4
98 147 200 105
```
