kmeans.weight: Weighted K-Means Clustering with Weights on Observations
In cuhklinlab/SWKM: Sparse Weighted K-Means

kmeans.weight

R Documentation

Description

Performs the K-means algorithm on observations with given weights.
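
To make the role of the observation weights concrete, here is a minimal base-R sketch of one Lloyd-style weighted K-means run. It is not the SWKM implementation (which offers the "Hartigan-Wong" and "Forgy" algorithms); the helper name `weighted_kmeans_sketch` and its `max_iter` argument are hypothetical, and the sketch assumes that the weights enter only through the weighted-mean update of the centers.

```r
# Hypothetical helper, not part of SWKM: a Lloyd-style weighted K-means run.
# Assumption: weights affect only the center update (weighted means).
weighted_kmeans_sketch <- function(x, centers, weight, max_iter = 100) {
  K <- nrow(centers)
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every observation to every center (n x K).
    d2 <- sapply(seq_len(K), function(k) rowSums(sweep(x, 2, centers[k, ])^2))
    # Assign each observation to its nearest center.
    new_cluster <- max.col(-d2, ties.method = "first")
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Move each non-empty cluster's center to the weighted mean of its members.
    for (k in seq_len(K)) {
      idx <- cluster == k
      if (any(idx)) {
        centers[k, ] <- colSums(x[idx, , drop = FALSE] * weight[idx]) / sum(weight[idx])
      }
    }
  }
  list(centers = centers, cluster = cluster)
}

# Toy data: two well-separated groups of 10 observations with unequal weights.
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.1), ncol = 2))
weight <- rep(c(1, 3), times = 10)
fit <- weighted_kmeans_sketch(x, centers = x[c(1, 11), ], weight = weight)
```

In this sketch, observations with larger weights pull their cluster center more strongly toward themselves, while the assignment step is the same as in unweighted K-means.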

Usage

kmeans.weight(x, K = NULL, weight = NULL, centers = NULL,
  nstart = 20, algorithm = "Hartigan-Wong")

Arguments

`x`	An n by p numeric data matrix, where n is the number of observations and p is the number of features.
`K`	The number of clusters. Not required if `centers` is provided.
`weight`	A vector of n positive elements representing weights on observations.
`centers`	A K by p matrix indicating initial (distinct) cluster centers.
`nstart`	The number of random initial sets of centers, each chosen from (distinct) rows of `x`. Not used if `centers` is provided. Default is 20.
`algorithm`	Character; either "`Hartigan-Wong`" or "`Forgy`". Default is "`Hartigan-Wong`".

Value

The function returns a list of the following components:

`centers`	the centers of the clustering result.
`cluster`	a vector of integers (from `1:K`) indicating the cluster to which each observation is allocated.
`weight`	a vector containing the non-zero weights from the input vector `weight`.
`wcss`	normalized within-cluster sum of squares, i.e. the objective divided by `sum(weight)`.
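
As an illustration of the `wcss` normalization, the sketch below computes a weighted within-cluster sum of squares and divides it by `sum(weight)`. The helper `normalized_wcss_sketch` is hypothetical, and the form of the objective (weights times squared Euclidean distances to the assigned centers) is an assumption; this page only states that the objective is divided by `sum(weight)`.

```r
# Hypothetical helper, not part of SWKM.
# Assumption: the objective is sum_i weight[i] * ||x[i, ] - centers[cluster[i], ]||^2.
normalized_wcss_sketch <- function(x, cluster, centers, weight) {
  # Squared distance from each observation to its assigned center.
  d2 <- rowSums((x - centers[cluster, , drop = FALSE])^2)
  sum(weight * d2) / sum(weight)
}

# Toy example with one feature and two clusters.
x <- matrix(c(0, 2, 10, 14), ncol = 1)
cluster <- c(1L, 1L, 2L, 2L)
weight <- c(1, 1, 1, 3)
centers <- matrix(c(1, 13), ncol = 1)  # weighted means of the two clusters
wcss_value <- normalized_wcss_sketch(x, cluster, centers, weight)
# Weighted squared distances are 1, 1, 9 and 3, so wcss_value is 14 / 6.
```

Dividing by `sum(weight)` puts the value on the scale of an average squared distance per unit of weight, so it stays comparable when all weights are multiplied by a constant.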

Author(s)

Wenyu Zhang

Examples

## Not run: 
library(SWKM)
set.seed(1)
data("DMdata")
# data preprocessing: rank-transform each column, then center and scale it
data <- t(DMdata$data)
data_rank <- apply(data, 2, rank)
data_rank_center <- t(t(data_rank) - colMeans(data_rank))
data_rank_center_scale <- t(t(data_rank_center) / apply(data_rank_center, 2, sd))
data_processed <- t(data_rank_center_scale)
# tune the number of clusters K
# nperms and nstart are kept small to save computation time
cK <- ChooseK(data_processed[-DMdata$noisy.label, ], nClusters = 1:6, nperms = 10, nstart = 5)
plot(cK)
K <- cK$OptimalK
# tune the weights
res.tuneU <- kmeans.weight.tune(x = data_processed, K = K,
  noisy.lab = DMdata$noisy.label, nperms = 10, nstart = 5)
plot(res.tuneU)
# perform weighted K-means with the tuned weights
res <- kmeans.weight(x = data_processed, K = K, weight = res.tuneU$bestweight)
# compare the cluster assignments with the true labels
table(res$cluster, DMdata$true.label)

## End(Not run)

cuhklinlab/SWKM documentation built on Aug. 5, 2022, 2:27 a.m.