KmeansPP: Perform the k-means++ clustering algorithm on a data matrix.


View source: R/clusternor.R

Description

A parallel and scalable implementation of the algorithm described in Ostrovsky, Rafail, et al. "The effectiveness of Lloyd-type methods for the k-means problem." Journal of the ACM (JACM) 59.6 (2012): 28.
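The seeding idea usually associated with k-means++ can be sketched in a few lines of base R: the first center is drawn uniformly at random, and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far. This is only an illustrative sketch. The function name `kmeanspp_seed` is hypothetical, and whether clusternor's parallel implementation follows exactly this procedure is an assumption.

```r
# Illustrative D^2-weighted seeding in base R; NOT clusternor's implementation.
# kmeanspp_seed is a hypothetical helper name.
kmeanspp_seed <- function(x, k) {
  n <- nrow(x)
  idx <- integer(k)
  # First center: chosen uniformly at random among the rows of x.
  idx[1] <- sample.int(n, 1)
  # Squared Euclidean distance from every row to its nearest chosen center.
  d2 <- rowSums(sweep(x, 2, x[idx[1], ])^2)
  for (j in seq_len(k)[-1]) {
    # Next center: sampled with probability proportional to d2.
    idx[j] <- sample.int(n, 1, prob = d2)
    # Update the nearest-center distances with the newly chosen center.
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[idx[j], ])^2))
  }
  x[idx, , drop = FALSE]
}

set.seed(1)
x <- as.matrix(iris[, 1:4])
seeds <- kmeanspp_seed(x, 3)  # a 3 x 4 matrix of initial centers
```

The seeds would then typically be refined with Lloyd-type iterations, which is the part `KmeansPP` runs in parallel.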

Usage

KmeansPP(
  data,
  centers,
  nrow = -1,
  ncol = -1,
  nstart = 1,
  nthread = -1,
  dist.type = c("sqeucl", "eucl", "cos", "taxi")
)

Arguments

data

The name of a data file on disk (NUMA optimized) or an in-memory data matrix

centers

The number of centers (i.e., k)

nrow

The number of samples in the dataset

ncol

The number of features in the dataset

nstart

The number of iterations of kmeans++ to run

nthread

The number of parallel threads to run

dist.type

The dissimilarity metric to use: one of "sqeucl", "eucl", "cos", or "taxi"
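As a rough guide to the `dist.type` labels, the sketch below computes one plausible definition of each metric for a pair of vectors in base R. The helper `pair_dist` is hypothetical, and the readings of the labels (squared Euclidean, Euclidean, cosine distance, and taxicab or Manhattan distance) are assumptions based on the names, not taken from the clusternor source. It also uses the common `match.arg` idiom, under which the first element of the vector in the usage signature acts as the default; that clusternor resolves `dist.type` the same way is likewise an assumption.

```r
# Hypothetical helper: dissimilarity between vectors a and b for each label.
pair_dist <- function(a, b, dist.type = c("sqeucl", "eucl", "cos", "taxi")) {
  dist.type <- match.arg(dist.type)  # the first element is used when omitted
  switch(dist.type,
    sqeucl = sum((a - b)^2),                                       # squared Euclidean
    eucl   = sqrt(sum((a - b)^2)),                                 # Euclidean
    cos    = 1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2))),   # cosine distance
    taxi   = sum(abs(a - b))                                       # taxicab (Manhattan)
  )
}

a <- c(0, 3)
b <- c(4, 0)
pair_dist(a, b)            # 25 (defaults to "sqeucl")
pair_dist(a, b, "eucl")    # 5
pair_dist(a, b, "taxi")    # 7
pair_dist(a, b, "cos")     # 1 (the vectors are orthogonal)
```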

Value

A list containing the attributes of the output:

cluster: A vector of integers (from 1:k) indicating the cluster to which each point is allocated.

centers: A matrix of cluster centres.

size: The number of points in each cluster.

energy: The sum of the distances from each sample to its closest cluster.

best.start: The sum of the distances from each sample to its closest cluster.
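To make the relationship between these fields concrete, the following base R sketch derives `cluster`, `size`, and `energy` for a toy data set with fixed centers. The data values are made up for illustration, and the squared Euclidean distance is assumed; the quantity `KmeansPP` actually reports depends on `dist.type`.

```r
# Toy data: four 2-D points and two centers (values made up for illustration).
x <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2))
centers <- rbind(c(0, 1), c(10, 1))

# Squared Euclidean distance from every point to every center (n x k matrix).
d2 <- sapply(seq_len(nrow(centers)),
             function(j) rowSums(sweep(x, 2, centers[j, ])^2))

# cluster: index (in 1:k) of the nearest center for each point.
cluster <- max.col(-d2, ties.method = "first")
# size: number of points assigned to each cluster.
size <- tabulate(cluster, nbins = nrow(centers))
# energy: total distance from each point to its nearest center.
energy <- sum(d2[cbind(seq_len(nrow(x)), cluster)])
```

Here `cluster` is `1 1 2 2`, `size` is `2 2`, and `energy` is `4`, since every point lies at squared distance 1 from its center.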

Author(s)

Disa Mhembere <disa@cs.jhu.edu>

Examples

library(clusternor)

iris.mat <- as.matrix(iris[, 1:4])        # The four numeric feature columns
k <- length(unique(iris[, dim(iris)[2]])) # Number of unique classes
nstart <- 3
km <- KmeansPP(iris.mat, k, nstart = nstart)

clusternor documentation built on March 26, 2020, 7:31 p.m.