A randomized algorithm that approximates k-means by updating the cluster centers from small random sub-samples (mini-batches) of the dataset rather than from the full dataset. See: https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
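The idea behind the description above can be sketched in a few lines. This is a minimal illustration of the mini-batch update from the referenced paper, written in Python so that it does not depend on this package's API; it is not the package's implementation. The function name `mini_batch_kmeans`, the `seed` parameter, the Forgy-style initialization, and the squared Euclidean distance are assumptions made for the example. The returned fields mirror the names listed under the return value on this page.

```python
import random


def mini_batch_kmeans(points, k, batch_size, iter_max, seed=0):
    """Illustrative mini-batch k-means (hypothetical helper, not the package API)."""
    rng = random.Random(seed)
    # Assumed Forgy-style initialization: k distinct data points become centers.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far

    def nearest(p):
        # Index of the center with the smallest squared Euclidean distance to p.
        return min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
        )

    for _ in range(iter_max):
        # Draw a random mini-batch and cache its assignments before updating.
        batch = rng.sample(points, min(batch_size, len(points)))
        assigned = [nearest(p) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            # Move the center toward the point by a step of size eta.
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]

    # Final pass over the full dataset, with 1-based labels as in R.
    cluster = [nearest(p) + 1 for p in points]
    size = [cluster.count(j + 1) for j in range(k)]
    return {"cluster": cluster, "centers": centers, "size": size, "iter": iter_max}
```

Because each center has its own learning rate of one over the number of points it has absorbed, a center's early updates move it a long way and later updates only nudge it, which is what lets small batches stand in for full passes over the data.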
data |
Name of a data file on disk (NUMA optimized) or an in-memory data matrix |
centers |
Either (i) the number of centers (i.e., k), (ii) an in-memory data matrix, or (iii) a 2-element list whose first element is a filename for precomputed centers and whose second element is the number of centroids. |
nrow |
The number of samples in the dataset |
ncol |
The number of features in the dataset |
batch.size |
Size of the mini batches |
iter.max |
The maximum number of k-means iterations to perform |
nthread |
The number of parallel threads to run |
init |
The type of initialization to use, one of c("kmeanspp", "random", "forgy", "none") |
tolerance |
The convergence tolerance |
dist.type |
The dissimilarity metric to use |
max.no.improvement |
Controls early stopping based on the number of consecutive mini-batches that do not yield an improvement in the smoothed inertia |
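To make the max.no.improvement argument concrete, here is a small sketch of one way such a stopping rule can work. It is written in Python and is not taken from the package: the function name `early_stop_index`, the exponentially weighted smoothing, and the `alpha` parameter are all assumptions chosen for illustration, and the package may smooth the inertia differently.

```python
def early_stop_index(batch_inertias, max_no_improvement, alpha=0.3):
    """Return the 1-based batch index at which to stop, or None.

    Hypothetical helper: smooths per-batch inertia with an exponentially
    weighted average (an assumed scheme) and stops once the smoothed value
    has failed to improve for max_no_improvement consecutive batches.
    """
    smoothed = None
    best = float("inf")
    stale = 0  # consecutive batches without improvement
    for i, inertia in enumerate(batch_inertias, start=1):
        if smoothed is None:
            smoothed = float(inertia)
        else:
            smoothed = (1 - alpha) * smoothed + alpha * inertia
        if smoothed < best:
            best = smoothed
            stale = 0
        else:
            stale += 1
        if stale >= max_no_improvement:
            return i
    return None
```

Smoothing matters here because the inertia of a single mini-batch is noisy; comparing raw per-batch values would trigger the stop on random fluctuations rather than on a genuine plateau.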
A list containing the attributes of the output:
cluster: A vector of integers (from 1:k) indicating the cluster to which each point is allocated.
centers: A matrix of cluster centres.
size: The number of points in each cluster.
iter: The number of (outer) iterations.
Disa Mhembere <disa@cs.jhu.edu>