A randomized algorithm that approximates k-means by updating the cluster centers from small random sub-samples (mini-batches) of the dataset instead of the full dataset. See: https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
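The idea of updating centers from random sub-samples can be illustrated with a minimal base-R sketch. This is not the package's implementation: the function name `minibatch_kmeans_sketch`, its defaults, the Forgy-style start, and the fixed number of iterations are all assumptions made for illustration. It uses a per-center learning rate of 1/count, as described in the paper linked above.

```r
# Illustrative sketch only: NOT the package's implementation.
# `minibatch_kmeans_sketch` and its defaults are hypothetical.
minibatch_kmeans_sketch <- function(x, k, batch.size = 100, iter.max = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Forgy-style start (assumption): use k distinct rows as initial centers.
  centers <- x[sample.int(n, k), , drop = FALSE]
  counts <- numeric(k)  # number of points each center has absorbed so far

  nearest <- function(pts, ctr) {
    # Squared Euclidean distance from every point to every center.
    d <- outer(rowSums(pts^2), rowSums(ctr^2), "+") - 2 * pts %*% t(ctr)
    max.col(-d, ties.method = "first")  # index of the closest center per row
  }

  for (it in seq_len(iter.max)) {
    # Draw one mini-batch and assign its points to the current centers.
    idx <- sample.int(n, min(batch.size, n))
    batch <- x[idx, , drop = FALSE]
    lab <- nearest(batch, centers)
    # Move each chosen center toward its point with step size 1 / count.
    for (i in seq_along(idx)) {
      j <- lab[i]
      counts[j] <- counts[j] + 1
      eta <- 1 / counts[j]
      centers[j, ] <- (1 - eta) * centers[j, ] + eta * batch[i, ]
    }
  }

  # Final pass: label every point with its nearest center.
  cluster <- nearest(x, centers)
  list(cluster = cluster, centers = centers,
       size = tabulate(cluster, nbins = k), iter = iter.max)
}

# Usage on synthetic data with two well-separated groups.
set.seed(1)
x <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
           matrix(rnorm(200, mean = 5), ncol = 2))
fit <- minibatch_kmeans_sketch(x, k = 2, batch.size = 20, iter.max = 50)
```

The returned list mirrors the shape of the value documented on this page (`cluster`, `centers`, `size`, `iter`), but the sketch always runs `iter.max` iterations and does no convergence check.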

`data`
Name of a data file on disk (NUMA optimized) or an in-memory data matrix

`centers`
Either (i) the number of centers (i.e., k), or
(ii) an in-memory data matrix, or (iii) a 2-Element

`nrow`
The number of samples in the dataset

`ncol`
The number of features in the dataset

`batch.size`
The size of each mini-batch

`iter.max`
The maximum number of k-means iterations to perform

`nthread`
The number of parallel threads to run

`init`
The type of initialization to use: c("kmeanspp", "random", "forgy", "none")

`tolerance`
The convergence tolerance

`dist.type`
The dissimilarity metric to use

`max.no.improvement`
Controls early stopping based on the number of consecutive mini-batches that do not yield an improvement in the smoothed inertia
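The early-stopping rule behind `max.no.improvement` can be sketched as follows. This page does not say how the inertia is smoothed, so the exponentially weighted average and its weight `alpha` are assumptions, and `early_stop_index` is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: NOT part of the package.
# Returns the index of the mini-batch at which training would stop,
# or NA if the stopping rule never triggers.
early_stop_index <- function(batch.inertia, max.no.improvement = 3, alpha = 0.3) {
  smoothed <- NA_real_
  best <- Inf
  stalled <- 0L  # consecutive mini-batches without improvement
  for (i in seq_along(batch.inertia)) {
    # Assumed smoothing: exponentially weighted average of batch inertia.
    if (is.na(smoothed)) {
      smoothed <- batch.inertia[i]
    } else {
      smoothed <- (1 - alpha) * smoothed + alpha * batch.inertia[i]
    }
    if (smoothed < best) {
      best <- smoothed
      stalled <- 0L
    } else {
      stalled <- stalled + 1L
      if (stalled >= max.no.improvement) {
        return(i)
      }
    }
  }
  NA_integer_
}

# With alpha = 1 there is no smoothing, so the plateau at 6 is detected
# after three consecutive non-improving mini-batches.
stop.at <- early_stop_index(c(10, 8, 6, 6, 6, 6, 6, 6),
                            max.no.improvement = 3, alpha = 1)
never <- early_stop_index(c(5, 4, 3, 2, 1))
```

A larger `max.no.improvement` tolerates longer plateaus before stopping, at the cost of extra mini-batches when the centers have already settled.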

A list containing the attributes of the output.
cluster: A vector of integers (from 1:**k**) indicating the cluster to which each point is allocated.
centers: A matrix of cluster centres.
size: The number of points in each cluster.
iter: The number of (outer) iterations.

Disa Mhembere <disa@cs.jhu.edu>
