MiniBatchKmeans: A randomized dataset sub-sample algorithm that approximates...

View source: R/clusternor.R

Description

A randomized algorithm that approximates k-means by updating the cluster centers from small random sub-samples (mini-batches) of the dataset rather than from the full dataset. See: https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
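The update rule from the linked paper can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the clusternor implementation: the names mini_batch_kmeans_sketch and nearest_center are hypothetical, the initial centers are simply k randomly chosen rows, the distance is squared Euclidean, and there is no convergence check or threading.

```r
# Index of the nearest center (squared Euclidean) for every row of pts.
nearest_center <- function(pts, centers) {
  d <- outer(rowSums(pts^2), rowSums(centers^2), "+") - 2 * tcrossprod(pts, centers)
  max.col(-d, ties.method = "first")
}

# Hypothetical sketch of mini-batch k-means; x must be a numeric matrix.
mini_batch_kmeans_sketch <- function(x, k, batch_size = 100, iters = 100) {
  n <- nrow(x)
  centers <- x[sample.int(n, k), , drop = FALSE]  # assumed: k random rows as start
  counts <- numeric(k)                            # updates seen by each center

  for (it in seq_len(iters)) {
    idx <- sample.int(n, min(batch_size, n))      # draw one mini-batch
    batch <- x[idx, , drop = FALSE]
    nearest <- nearest_center(batch, centers)     # assign before updating
    for (i in seq_along(idx)) {
      j <- nearest[i]
      counts[j] <- counts[j] + 1
      eta <- 1 / counts[j]                        # per-center learning rate
      centers[j, ] <- (1 - eta) * centers[j, ] + eta * batch[i, ]
    }
  }

  cluster <- nearest_center(x, centers)           # final assignment of all rows
  list(cluster = cluster, centers = centers, size = tabulate(cluster, nbins = k))
}
```

Because each center moves with a step size of one over the number of points it has absorbed, early batches move the centers a lot and later batches only refine them.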

Usage

MiniBatchKmeans(data, centers, nrow = -1, ncol = -1,
  batch.size = 100, iter.max = .Machine$integer.max, nthread = -1,
  init = c("kmeanspp", "random", "forgy", "none"), tolerance = 0.01,
  dist.type = c("sqeucl", "eucl", "cos", "taxi"),
  max.no.improvement = 3)

Arguments

data

The name of a data file on disk (NUMA optimized) or an in-memory data matrix

centers

Either (i) the number of centers (i.e., k), (ii) an in-memory data matrix, or (iii) a 2-element list in which element 1 is the filename of precomputed centers and element 2 is the number of centroids.

nrow

The number of samples in the dataset

ncol

The number of features in the dataset

batch.size

Size of the mini batches

iter.max

The maximum number of k-means iterations to perform

nthread

The number of parallel threads to run

init

The type of initialization to use: one of "kmeanspp", "random", "forgy", or "none"
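For orientation, the seeding that the name "kmeanspp" refers to (k-means++) can be sketched in base R as below. This is a hedged illustration of the general technique, not the package's code: kmeanspp_init_sketch is a hypothetical name, and it assumes a numeric matrix whose rows are not all identical.

```r
# Hypothetical sketch of k-means++ seeding on a numeric matrix x.
kmeanspp_init_sketch <- function(x, k) {
  n <- nrow(x)
  chosen <- integer(k)
  chosen[1] <- sample.int(n, 1)                   # first center: uniform at random
  # Squared distance from each row to its nearest chosen center so far.
  d2 <- rowSums(sweep(x, 2, x[chosen[1], ])^2)
  for (j in seq_len(k)[-1]) {
    # Next center: sampled with probability proportional to d2.
    chosen[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[chosen[j], ])^2))
  }
  x[chosen, , drop = FALSE]
}
```

Rows that coincide with an already chosen center have zero weight, so the returned centers are always distinct points.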

tolerance

The convergence tolerance

dist.type

The dissimilarity metric to use: one of "sqeucl", "eucl", "cos", or "taxi"
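The four option names suggest squared Euclidean, Euclidean, cosine, and taxicab (Manhattan) dissimilarity. The sketch below shows those measures for two numeric vectors. The helper dissimilarity_sketch is hypothetical, and the exact definitions used inside the package, in particular cosine dissimilarity taken as one minus cosine similarity, are assumptions.

```r
# Hypothetical helper: dissimilarity between two numeric vectors a and b.
dissimilarity_sketch <- function(a, b, type = c("sqeucl", "eucl", "cos", "taxi")) {
  type <- match.arg(type)                         # defaults to "sqeucl"
  switch(type,
    sqeucl = sum((a - b)^2),                      # squared Euclidean
    eucl   = sqrt(sum((a - b)^2)),                # Euclidean
    cos    = 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2))),  # assumed form
    taxi   = sum(abs(a - b)))                     # taxicab / Manhattan
}

a <- c(0, 3)
b <- c(4, 0)
sapply(c("sqeucl", "eucl", "cos", "taxi"),
       function(t) dissimilarity_sketch(a, b, t))
```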

max.no.improvement

Controls early stopping based on the number of consecutive mini-batches that do not yield an improvement in the smoothed inertia
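The early-stopping rule described for max.no.improvement can be illustrated with the sketch below. It is an assumption-laden illustration rather than the package's logic: early_stop_sketch is a hypothetical function, the smoothing is taken to be an exponentially weighted average with a made-up weight alpha, and the input is a plain vector of per-batch inertia values.

```r
# Hypothetical sketch: return the batch index at which training would stop.
early_stop_sketch <- function(batch_inertia, max_no_improvement = 3, alpha = 0.3) {
  smoothed <- batch_inertia[1]
  best <- smoothed
  stalled <- 0                                    # consecutive non-improving batches
  for (i in seq_along(batch_inertia)[-1]) {
    # Exponentially weighted average of the per-batch inertia (assumed smoothing).
    smoothed <- (1 - alpha) * smoothed + alpha * batch_inertia[i]
    if (smoothed < best) {
      best <- smoothed
      stalled <- 0
    } else {
      stalled <- stalled + 1
    }
    if (stalled >= max_no_improvement) return(i)
  }
  length(batch_inertia)                           # never stalled long enough
}
```

Smoothing matters because the inertia of a single small batch is noisy. Comparing raw batch values would trigger the stop on ordinary sampling fluctuation.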

Value

A list containing the attributes of the output:

cluster

A vector of integers (from 1:k) indicating the cluster to which each point is allocated.

centers

A matrix of cluster centres.

size

The number of points in each cluster.

iter

The number of (outer) iterations.

Author(s)

Disa Mhembere <[email protected]>

Examples

iris.mat <- as.matrix(iris[,1:4])
k <- length(unique(iris[, dim(iris)[2]])) # Number of unique classes
kms <- MiniBatchKmeans(iris.mat, k, batch.size=5)

clusternor documentation built on May 2, 2019, 11:36 a.m.