nroKmeans: K-means clustering
In Numero: Statistical Framework to Define Subgroups in Complex Datasets

nroKmeans

R Documentation

Description

K-means clustering for multi-dimensional data.

Usage

nroKmeans(data, k = 3, subsample = NULL, balance = 0, message = NULL)

Arguments

`data`	A data frame or a matrix.
`k`	Number of centroids.
`subsample`	Number of randomly selected rows used during a single training cycle.
`balance`	Penalty parameter for size difference between clusters.
`message`	If positive, progress information is printed at the specified interval in seconds.

Details

The K centroids are determined by Lloyd's algorithm with Euclidean distances or by using 1 - Pearson correlation as the distance measure.
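As a rough illustration of the Euclidean variant, Lloyd's algorithm alternates between assigning each row to its nearest centroid and moving each centroid to the mean of its assigned rows. The sketch below uses base R only; the function name `lloyd_sketch`, the fixed iteration count and the random initialization are assumptions made for illustration and do not describe how nroKmeans is implemented internally.

```r
# Minimal Lloyd's algorithm with Euclidean distances.
# Illustrative only: lloyd_sketch is a hypothetical helper, not part of Numero.
lloyd_sketch <- function(x, k, iters = 20) {
  x <- as.matrix(x)
  # Start from k distinct, randomly chosen rows.
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  labels <- integer(nrow(x))
  for (i in seq_len(iters)) {
    # Squared Euclidean distance from every row to every centroid
    # (one column per centroid).
    d <- sapply(seq_len(k), function(j) {
      rowSums(sweep(x, 2, centroids[j, ])^2)
    })
    # Assignment step: index of the nearest centroid for each row.
    labels <- max.col(-d, ties.method = "first")
    # Update step: move each non-empty centroid to the mean of its rows.
    for (j in seq_len(k)) {
      members <- x[labels == j, , drop = FALSE]
      if (nrow(members) > 0) centroids[j, ] <- colMeans(members)
    }
  }
  list(centroids = centroids, labels = labels)
}

# Toy data: two simulated groups in two dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
fit <- lloyd_sketch(x, k = 2)
print(table(fit$labels))
print(fit$centroids)
```

For the correlation-based variant, the distance between a row `a` and a centroid `b` would be `1 - cor(a, b)` in place of the squared Euclidean distance.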

If subsample is less than the number of data rows, a random subset of the specified size is used for each training cycle. By default, subsample is set automatically depending on the size of the dataset.
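The idea of per-cycle subsampling can be sketched in base R as follows. The values of `subsample` and the number of cycles are arbitrary choices for illustration, and the loop body only draws the subset; how nroKmeans uses each subset internally is not shown here.

```r
# Draw a fresh random subset of rows for each training cycle.
# Illustrative only: the sizes below are arbitrary assumptions.
set.seed(1)
x <- matrix(rnorm(2000), ncol = 4)   # toy data with 500 rows
subsample <- 100                     # rows used per cycle
batch_sizes <- integer(0)
for (cycle in 1:3) {
  rows <- sample(nrow(x), subsample)  # sampling without replacement
  batch <- x[rows, , drop = FALSE]
  batch_sizes <- c(batch_sizes, nrow(batch))
  cat("cycle", cycle, "uses", nrow(batch), "of", nrow(x), "rows\n")
}
```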

If balance = 0.0, the algorithm is applied with no balancing; if balance = 1.0, all clusters are forced to be of equal size. Intermediate values are permitted. Note that if subsampling is applied, balancing may become less accurate.
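One simple way to check how evenly a clustering has split the data is to compare the largest and smallest cluster sizes. The helper below, `size_ratio`, is a hypothetical convenience function and not part of Numero; the label vectors are made-up toy inputs. It could be applied to any vector of cluster labels, such as the best-matching centroid labels returned by nroKmeans.

```r
# Ratio of the largest to the smallest cluster size.
# A value of 1 means all clusters have exactly the same size.
# Illustrative only: size_ratio is a hypothetical helper.
size_ratio <- function(labels) {
  sizes <- table(labels)
  max(sizes) / min(sizes)
}

# Toy label vectors (not real results).
uneven <- c(rep(1, 70), rep(2, 20), rep(3, 10))
even <- rep(1:3, each = 30)

print(size_ratio(uneven))
print(size_ratio(even))
```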

Value

A list with named elements: centroids is a matrix of the main results; layout contains the best-matching centroid labels and model residuals for each usable data point; and history is the chronological record of training errors. The subsampling parameter that was used during training is stored in the element subsample.

Examples

# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)

# Prepare training data.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- scale.default(dataset[,trvars]) 

# Unbalanced K-means clustering.
km0 <- nroKmeans(data = trdata, k = 5, balance = 0.0)
print(table(km0$layout$BMC))
print(km0$centroids)

# Balanced K-means clustering.
km1 <- nroKmeans(data = trdata, k = 5, balance = 1.0)
print(table(km1$layout$BMC))
print(km1$centroids)

Numero documentation built on Sept. 17, 2024, 5:09 p.m.