k-means via Lloyd's Algorithm.
km(x, k = 2, maxiter = 100, seed = get_random_seed())
x: A shaq.
k: The 'k' in k-means.
maxiter: The maximum number of iterations possible.
seed: A seed for determining the (random) initial centroids. Each process has to use the same seed or very strange things may happen. If you do not provide a seed, a good initial seed will be chosen.
Note that the function does not respect set.seed() or comm.set.seed(). For managing random seeds, use the seed parameter.
The iterations stop either when the maximum number of iterations has been
reached, or when the centers in the current iteration are essentially the same
(within 1e-8) as the centers from the previous iteration.
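The stopping rule above can be illustrated with a serial, base R sketch of Lloyd's algorithm. This is not the package's implementation: lloyd_sketch is a hypothetical helper, the tol argument is an assumed name for the 1e-8 tolerance, and measuring the change as the largest absolute difference between old and new centers is an assumption, since the page does not say how "within 1e-8" is computed. The sketch also seeds with base R's set.seed(), which km() itself does not respect.

```r
# Serial sketch of Lloyd's algorithm (hypothetical helper, not part of the package).
lloyd_sketch <- function(x, k = 2, maxiter = 100, tol = 1e-8, seed = 1234) {
  set.seed(seed)  # base R seeding, used only by this sketch
  # Assumed initialisation: pick k distinct rows of x as the starting centers.
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  labels <- integer(nrow(x))
  for (iter in seq_len(maxiter)) {
    # Assignment step: squared distance from every row to every center.
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    labels <- max.col(-d, ties.method = "first")
    # Update step: each center becomes the mean of its assigned rows.
    new_centers <- centers
    for (j in seq_len(k)) {
      members <- labels == j
      if (any(members)) new_centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
    # Assumed convergence measure: largest absolute change in any coordinate.
    converged <- max(abs(new_centers - centers)) < tol
    centers <- new_centers
    if (converged) break
  }
  list(centers = centers, labels = labels, iterations = iter)
}

# Two well-separated groups of 20 points each.
x <- rbind(cbind((0:19) / 100, 0), cbind(5 + (0:19) / 100, 5))
fit <- lloyd_sketch(x, k = 2)
```

The returned list mirrors the shape described for km() (centers, labels, iteration count), but everything here lives on a single process.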
For best performance, the data should be as balanced as possible across all MPI ranks.
A list containing the cluster centers (global), the observation labels, i.e. the assignments to clusters (distributed shaq), and the total number of iterations (global).
Most of the computation is local. However, at each iteration there is a
length n*k and a length k allreduce call to update the centers.
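To make the two reductions concrete, here is a base R sketch that simulates them without MPI. It assumes the length n*k buffer holds per-cluster coordinate sums and the length k buffer holds per-cluster counts, which is the usual way to update k-means centers in parallel but is not spelled out on this page. The helper local_sums, the list of "ranks", and the use of Reduce() as a stand-in for allreduce are all illustrative.

```r
# Per-rank partial results: k x n coordinate sums and k counts (hypothetical helper).
local_sums <- function(x_local, labels_local, k) {
  sums <- matrix(0, nrow = k, ncol = ncol(x_local))
  for (j in seq_len(k)) {
    sums[j, ] <- colSums(x_local[labels_local == j, , drop = FALSE])
  }
  list(sums = sums, counts = tabulate(labels_local, nbins = k))
}

x <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)  # 4 rows, n = 2 columns
labels <- c(1L, 1L, 2L, 2L)                            # current assignments
ranks <- list(1:3, 4)                                  # rows held by each simulated rank

parts <- lapply(ranks, function(rows) {
  local_sums(x[rows, , drop = FALSE], labels[rows], k = 2)
})

# Stand-ins for the two allreduce (sum) calls: one of length n*k, one of length k.
global_sums <- Reduce(`+`, lapply(parts, `[[`, "sums"))
global_counts <- Reduce(`+`, lapply(parts, `[[`, "counts"))

# Every rank can now form the same centers; row j is divided by count j.
centers <- global_sums / global_counts
```

The sketch does not guard against empty clusters, where a count of zero would produce NaN centers.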
There is also a check at the beginning of the call to find out how many
observations come before the current process's data, which is an allgather
operation.
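The offset check can be pictured as follows. This is a base R sketch with made-up local row counts; the vector assignment stands in for the allgather, after which each rank would hold every rank's count. The page does not say how km() uses the resulting offset, so the sketch stops at computing it.

```r
# Hypothetical number of local rows on ranks 0, 1, and 2.
local_rows <- c(4L, 3L, 5L)

# Stand-in for allgather: every rank ends up with the full vector of counts.
gathered <- local_rows

# Observations that come before each rank's data: an exclusive cumulative sum.
offsets <- c(0L, cumsum(gathered)[-length(gathered)])
```

With these counts, rank 0 starts at global row offset 0, rank 1 at 4, and rank 2 at 7.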
Phillips, J. Data Mining: Algorithms, Geometry, and Probability. https://www.cs.utah.edu/~jeffp/DMBook/DM-AGP.html