MiniBatchKmeans: A randomized dataset sub-sample algorithm that approximates...
In clusternor: A Parallel Clustering Non-Uniform Memory Access ('NUMA') Optimized Package


View source: R/clusternor.R

A randomized algorithm that approximates k-means by updating the centroids from small random sub-samples (mini-batches) of the dataset rather than from the full dataset. See: https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
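
To illustrate the idea, here is a minimal base-R sketch of a mini-batch update in the style of the linked paper. It is not the clusternor implementation: the function name `mini_batch_kmeans_sketch`, its arguments, and the fixed iteration count are assumptions made for this example, and it ignores threading, NUMA placement, and early stopping.

```r
# Illustrative only: a base-R mini-batch k-means sketch (hypothetical helper,
# not part of clusternor).
mini_batch_kmeans_sketch <- function(x, k, batch_size = 10, iters = 100) {
  x <- as.matrix(x)
  # Forgy-style start: use k distinct rows as the initial centers.
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  counts <- integer(k)  # number of points each center has absorbed so far

  # Index of the nearest center to point p (squared Euclidean distance).
  nearest_center <- function(p) which.min(colSums((t(centers) - p)^2))

  for (i in seq_len(iters)) {
    batch <- x[sample(nrow(x), batch_size), , drop = FALSE]
    # Cache the assignments for the whole batch before moving any center.
    nearest <- apply(batch, 1, nearest_center)
    for (j in seq_len(nrow(batch))) {
      idx <- nearest[[j]]
      counts[idx] <- counts[idx] + 1L
      eta <- 1 / counts[idx]  # per-center learning rate shrinks over time
      centers[idx, ] <- (1 - eta) * centers[idx, ] + eta * batch[j, ]
    }
  }

  # One full pass at the end to label every point.
  cluster <- unname(apply(x, 1, nearest_center))
  list(cluster = cluster, centers = centers,
       size = tabulate(cluster, nbins = k), iter = iters)
}

set.seed(1)
fit <- mini_batch_kmeans_sketch(iris[, 1:4], k = 3)
fit$size
```

Because each iteration touches only `batch_size` rows, the cost per iteration does not grow with the number of samples, which is the trade-off the approximation makes against full k-means.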

MiniBatchKmeans(
  data,
  centers,
  nrow = -1,
  ncol = -1,
  batch.size = 100,
  iter.max = .Machine$integer.max,
  nthread = -1,
  init = c("kmeanspp", "random", "forgy", "none"),
  tolerance = 0.01,
  dist.type = c("sqeucl", "eucl", "cos", "taxi"),
  max.no.improvement = 3
)

`data`	Data file name on disk (NUMA optimized) or an in-memory data matrix
`centers`	Either (i) the number of centers (i.e., k), (ii) an in-memory data matrix, or (iii) a 2-element list with element 1 being a filename for precomputed centers and element 2 the number of centroids.
`nrow`	The number of samples in the dataset
`ncol`	The number of features in the dataset
`batch.size`	Size of the mini batches
`iter.max`	The maximum number of k-means iterations to perform
`nthread`	The number of parallel threads to run
`init`	The type of initialization to use: one of "kmeanspp", "random", "forgy", or "none"
`tolerance`	The convergence tolerance
`dist.type`	The dissimilarity metric to use: one of "sqeucl", "eucl", "cos", or "taxi"
`max.no.improvement`	Controls early stopping based on the number of consecutive mini-batches that do not yield an improvement in the smoothed inertia
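
The `max.no.improvement` rule can be pictured with a small base-R sketch. This shows one plausible reading of "smoothed inertia", an exponentially weighted average of per-batch inertia, and is not taken from the clusternor source. The helper `early_stop_index` and its smoothing weight `alpha` are hypothetical, and the sketch does not model `tolerance`.

```r
# Hypothetical helper (not part of clusternor): given the inertia observed on
# each mini-batch, return the batch index at which training would stop.
early_stop_index <- function(batch_inertia, max_no_improvement = 3, alpha = 0.3) {
  smoothed <- batch_inertia[1]  # running exponentially weighted average
  best <- smoothed              # lowest smoothed inertia seen so far
  stale <- 0L                   # consecutive batches without improvement
  for (i in seq_along(batch_inertia)[-1]) {
    smoothed <- alpha * batch_inertia[i] + (1 - alpha) * smoothed
    if (smoothed < best) {
      best <- smoothed
      stale <- 0L
    } else {
      stale <- stale + 1L
    }
    if (stale >= max_no_improvement) return(i)
  }
  length(batch_inertia)  # never triggered: all batches were used
}

# With alpha = 1 there is no smoothing, so three flat batches in a row stop the run.
stop_at <- early_stop_index(c(5, 4, 4, 4, 4), max_no_improvement = 3, alpha = 1)
```

Smoothing matters because the inertia of a single small batch is noisy; comparing raw per-batch values would stop too early on an unlucky sample.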

A list containing the attributes of the output:
`cluster`	A vector of integers (from 1:k) indicating the cluster to which each point is allocated.
`centers`	A matrix of cluster centers.
`size`	The number of points in each cluster.
`iter`	The number of (outer) iterations.

Disa Mhembere <disa@cs.jhu.edu>

library(clusternor)

# Use the four numeric iris measurements as the data matrix
iris.mat <- as.matrix(iris[,1:4])
k <- length(unique(iris[, dim(iris)[2]])) # Number of unique classes
kms <- MiniBatchKmeans(iris.mat, k, batch.size=5)
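
As a follow-up, the returned list can be inspected with base R. The sketch below assumes the result carries the `cluster` and `size` fields named in the Value section. It is guarded with `requireNamespace()` so that it does nothing when clusternor is not installed.

```r
# Runs only when clusternor is available; field names follow the Value section.
if (requireNamespace("clusternor", quietly = TRUE)) {
  iris.mat <- as.matrix(iris[, 1:4])
  kms <- clusternor::MiniBatchKmeans(iris.mat, 3, batch.size = 5)

  # Cross-tabulate the cluster labels against the known species.
  print(table(cluster = kms$cluster, species = iris$Species))

  # Number of points assigned to each cluster.
  print(kms$size)
}
```

Cluster labels are arbitrary, so a good result shows each species concentrated in one row of the table rather than on the diagonal.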

clusternor documentation built on March 26, 2020, 7:31 p.m.
