kmeans: K-Means Clustering

Description

This function performs k-means clustering of the data points in a data set.

Usage

kmeans(x, ...)

## Default S3 method:
kmeans(x, centers, iter.max = 10L, nstart = 1L,
  algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"),
  trace = FALSE)

## S3 method for class 'sgd'
kmeans(x, k = 2, iter.max = 50L)

## S3 method for class 'gmd'
kmeans(x, k = 2, iter.max = 50L, d2 = NULL,
  method = "ward.D", rule = 2, shift = FALSE,
  avoid_mean_computation = FALSE)

Arguments

x

A numeric matrix where each row is a data point, an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns), an sgd object, or a gmd object.

...

not used.

centers

either the number of clusters, say k, or a set of initial (distinct) cluster centres. If a number, a random set of (distinct) rows in x is chosen as the initial centres.

iter.max

the maximum number of iterations allowed.

nstart

if centers is a number, the number of random initial sets to try.

algorithm

character: may be abbreviated. Note that "Lloyd" and "Forgy" are alternative names for one algorithm.

trace

logical or integer number, currently only used in the default method ("Hartigan-Wong"): if positive (or true), tracing information on the progress of the algorithm is produced. Higher values may produce more tracing information.

k

The number of clusters to look for (default: 2).

method

character: may be abbreviated. "centers" causes fitted to return cluster centers (one for each input point) and "classes" causes fitted to return a vector of class assignments.
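The sketch below illustrates how centers, nstart and algorithm interact. It calls stats::kmeans directly, on the assumption that the default method documented here behaves the same way (its signature is identical); the data set pts and the object names are made up for illustration.

```r
set.seed(42)  # make the random initialisation reproducible

# Two well-separated groups of 2-D points, 25 rows each (illustrative data)
pts <- rbind(
  matrix(rnorm(50, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(50, mean = 3, sd = 0.3), ncol = 2)
)

# centers given as a number: k = 2, tried from 5 random starts;
# the run with the smallest total within-cluster sum of squares is kept
fit_k <- stats::kmeans(pts, centers = 2, nstart = 5, iter.max = 10L)

# centers given as a matrix: explicit, distinct initial centres (one per row)
init <- rbind(c(0, 0), c(3, 3))
fit_init <- stats::kmeans(pts, centers = init, algorithm = "Lloyd")

# algorithm names may be abbreviated, e.g. "Mac" for "MacQueen"
fit_mac <- stats::kmeans(pts, centers = 2, algorithm = "Mac")
```

When centers is a matrix, the number of clusters is taken from its number of rows, so nstart plays no useful role in that call.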

Value

An object of class "kmeans", which has a print and a fitted method. It is a list with at least the following components:

cluster

A vector of integers (among 1:k) indicating the cluster to which each point is allocated.

centers

A matrix of cluster centres.

totss

The total sum of squares.

withinss

Vector of within-cluster sum of squares, one component per cluster.

tot.withinss

Total within-cluster sum of squares.

betweenss

The between-cluster sum of squares.

size

The number of points in each cluster.

iter

The number of (outer) iterations.

ifault

integer: indicator of a possible algorithm problem – for experts.
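As a rough illustration of how these components relate to one another, the following sketch fits a model with stats::kmeans (assumed here to return the same structure as the default method on this page) and checks the sum-of-squares decomposition. The data and the variable names pts and fit are invented for the example.

```r
set.seed(1)
pts <- matrix(rnorm(200), ncol = 2)  # 100 illustrative points in 2-D
fit <- stats::kmeans(pts, centers = 3, nstart = 10)

# cluster holds one label per row of pts; size counts the labels
table(fit$cluster)
fit$size

# centers has one row per cluster and one column per variable
dim(fit$centers)

# The total sum of squares splits into within- and between-cluster parts
all.equal(fit$totss, fit$tot.withinss + fit$betweenss)
all.equal(fit$tot.withinss, sum(fit$withinss))

# fitted() returns the centre assigned to each point by default,
# or the class labels when method = "classes"
head(fitted(fit))
head(fitted(fit, method = "classes"))
```

A small ratio of tot.withinss to totss suggests compact clusters, although it always shrinks as the number of clusters grows.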

Methods (by class)

Examples

x <- sgd(
  c(mean =  0, precision = 1  ),
  c(mean =  3, precision = 0.5),
  c(mean = -1, precision = 2  )
)
kmeans(x)

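library(magrittr)  # provides the %>% pipe used below

N <- 100  # number of observations
M <- 4    # number of mixture components
# Random mixing weights, normalised so that each row sums to 1
w <- matrix(runif(N * M), N, M)
w <- w / rowSums(w)
# One row per (observation, component) pair, with its mixing weight
samp <- tidyr::crossing(
  observation = paste0("O", 1:N),
  component = paste0("C", 1:M)
) %>%
  dplyr::mutate(mixing = as.numeric(t(w)))
# Mean and precision of each component
dict <- tibble::tibble(
  component = paste0("C", 1:M),
  mean = numeric(M),
  precision = 1:M
)
x <- gmd(samp, dict)
kx <- kmeans(x)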

astamm/game documentation built on June 5, 2019, 8:53 a.m.