nroKmeans: K-means clustering

View source: R/nroKmeans.R

nroKmeansR Documentation

K-means clustering

Description

K-means clustering for multi-dimensional data.

Usage

nroKmeans(data, k = 3, subsample = NULL, balance = 0, message = NULL)

Arguments

data

A data frame or a matrix.

k

Number of centroids.

subsample

Number of randomly selected rows used during a single training cycle.

balance

Penalty parameter for size difference between clusters.

message

If positive, progress information is printed at the specified interval in seconds.

Details

The K centroids are determined by Lloyd's algorithm with Euclidean distances or by using 1 - Pearson correlation as the distance measure.

If subsample is less than the number of data rows, a random subset of the specified size is used for each training cycle. By default, subsample is set automatically depending on the size of the dataset.

If balance = 0.0, the algorithm is applied with no balancing, if balance = 1.0 all the clusters will be forced to be of equal size. Intermediate values are permitted. Note that if subsampling is applied, balancing may become less accurate.

Value

A list with named elements: centroids is a matrix of the main results, layout contains the best-matching centroid labels and model residuals for each usable data point and history is the chronological record of training errors. The subsampling parameter that was used during training is stored in the element subsample.

Examples

# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)

# Prepare training data.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- scale.default(dataset[,trvars]) 

# Unbalanced K-means clustering.
km0 <- nroKmeans(data = trdata, k = 5, balance = 0.0)
print(table(km0$layout$BMC))
print(km0$centroids)

# Balanced K-means clustering.
km1 <- nroKmeans(data = trdata, k = 5, balance = 1.0)
print(table(km1$layout$BMC))
print(km1$centroids)

Numero documentation built on Jan. 9, 2023, 9:08 a.m.

Related to nroKmeans in Numero...