A parallel and scalable implementation of the algorithm described in Ostrovsky, Rafail, et al. "The effectiveness of Lloyd-type methods for the k-means problem." Journal of the ACM (JACM) 59.6 (2012): 28.
data: The name of a data file on disk (NUMA optimized) or an in-memory data matrix.
centers: The number of centers (i.e., k).
nrow: The number of samples in the dataset.
ncol: The number of features in the dataset.
nstart: The number of iterations of kmeans++ to run.
nthread: The number of parallel threads to run.
dist.type: The dissimilarity metric to use; one of c("taxi", "eucl", "cos").
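To make the dist.type choices concrete, here is a minimal base-R sketch of the three metrics. It assumes, from the option names alone, that "taxi" is the Manhattan (L1) distance, "eucl" is the Euclidean (L2) distance, and "cos" is the cosine distance (one minus the cosine similarity). The helper `dissimilarity` is hypothetical and is not part of the package.

```r
# Hypothetical helper, not part of the package: dissimilarity between two
# numeric vectors for the three dist.type names listed above.
# The mapping of names to formulas is an assumption based on the names.
dissimilarity <- function(x, y, dist.type = c("eucl", "taxi", "cos")) {
  dist.type <- match.arg(dist.type)
  switch(dist.type,
    # Manhattan (L1): sum of absolute coordinate differences
    taxi = sum(abs(x - y)),
    # Euclidean (L2): square root of the summed squared differences
    eucl = sqrt(sum((x - y)^2)),
    # Cosine distance: one minus the cosine of the angle between x and y
    cos = 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
  )
}

x <- c(1, 0)
y <- c(0, 1)
dissimilarity(x, y, "taxi")  # 2
dissimilarity(x, y, "eucl")  # sqrt(2), about 1.414
dissimilarity(x, y, "cos")   # 1 (orthogonal vectors)
```

The cosine form is undefined for an all-zero vector, so a real implementation would need to guard against that case.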
A list containing the attributes of the output:
cluster: A vector of integers (from 1:k) indicating the cluster to which each point is allocated.
centers: A matrix of cluster centers.
size: The number of points in each cluster.
energy: The sum of distances for each sample from its closest cluster.
best.start: The sum of distances for each sample from its closest cluster.
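As a rough illustration of how the cluster, centers, size, and energy fields relate to one another, the following base-R sketch runs textbook kmeans++ seeding (each new center is drawn with probability proportional to the squared Euclidean distance to the nearest center chosen so far) and keeps the start with the lowest energy. It is single-threaded, handles only an in-memory matrix, and is not the package's implementation; the function name `kmeanspp_sketch` is hypothetical, and best.start is deliberately left out.

```r
# Hypothetical single-threaded sketch of kmeans++ seeding in base R.
# Assumes Euclidean distance and at least `centers` distinct rows in `data`.
kmeanspp_sketch <- function(data, centers, nstart = 1) {
  data <- as.matrix(data)
  n <- nrow(data)
  best <- NULL
  for (s in seq_len(nstart)) {
    # First center: one sample chosen uniformly at random.
    idx <- sample.int(n, 1)
    # Squared distance from every sample to its nearest chosen center.
    d2 <- rowSums(sweep(data, 2, data[idx, ])^2)
    while (length(idx) < centers) {
      # Next center: drawn with probability proportional to d2.
      nxt <- sample.int(n, 1, prob = d2)
      idx <- c(idx, nxt)
      d2 <- pmin(d2, rowSums(sweep(data, 2, data[nxt, ])^2))
    }
    ctrs <- data[idx, , drop = FALSE]
    # n x k matrix of Euclidean distances from samples to centers.
    dists <- sapply(seq_len(centers), function(j) {
      sqrt(rowSums(sweep(data, 2, ctrs[j, ])^2))
    })
    dists <- matrix(dists, nrow = n)
    # Assign each sample to its closest center.
    cluster <- max.col(-dists, ties.method = "first")
    # Energy: sum of each sample's distance to its assigned center.
    energy <- sum(dists[cbind(seq_len(n), cluster)])
    if (is.null(best) || energy < best$energy) {
      best <- list(cluster = cluster, centers = ctrs,
                   size = tabulate(cluster, nbins = centers),
                   energy = energy)
    }
  }
  best
}

# Usage on two synthetic, well-separated groups of 50 points each.
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2))
res <- kmeanspp_sketch(x, centers = 2, nstart = 3)
str(res)
```

Running several starts and keeping the lowest-energy one is a common way to reduce the variance of randomized seeding; whether the package selects its result the same way is an assumption here.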
Disa Mhembere <disa@cs.jhu.edu>