kmeans.weight.tune: Choose Tuning Parameter for Weighted K-Means Clustering
In cuhklinlab/SWKM: Sparse Weighted K-Means

View source: R/kmeans_weight_tune.R

Description

The tuning parameter controls the weight given to noisy observations in weighted K-means clustering. It is selected with a permutation approach based on a gap statistic.
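The Value section defines the gap through an objective O(U) and its counterpart O*(U) under permuted weights. The sketch below shows one way such a permutation gap could be computed for a single candidate weight vector. It rests on assumptions that this page does not confirm: the objective is taken to be the observation-weighted within-cluster sum of squares from a simple weighted Lloyd's algorithm, and "permuted weights" is read as shuffling the weight vector across observations. `weighted_kmeans_obj` and `perm_gap` are hypothetical helpers written for illustration, not SWKM functions.

```r
# Hypothetical helper (not part of SWKM): weighted Lloyd's K-means.
# Assumed objective: sum_i w[i] * ||x[i, ] - center of i's cluster||^2,
# where each center is the weighted mean of its members.
weighted_kmeans_obj <- function(x, w, K, nstart = 5, iter.max = 50) {
  sq_dist <- function(centers) {
    # n x K matrix of squared distances from each row of x to each center
    sapply(seq_len(K), function(k) rowSums(sweep(x, 2, centers[k, ])^2))
  }
  best <- Inf
  for (s in seq_len(nstart)) {
    # random start: K distinct rows of x as initial centers
    centers <- x[sample(nrow(x), K), , drop = FALSE]
    for (it in seq_len(iter.max)) {
      cl <- max.col(-sq_dist(centers), ties.method = "first")  # nearest center
      new_centers <- centers
      for (k in seq_len(K)) {
        idx <- cl == k
        if (sum(w[idx]) > 0) {
          new_centers[k, ] <- colSums(x[idx, , drop = FALSE] * w[idx]) / sum(w[idx])
        }
      }
      converged <- max(abs(new_centers - centers)) < 1e-10
      centers <- new_centers
      if (converged) break
    }
    # weighted objective at the final centers with nearest-center assignment
    d2 <- sq_dist(centers)
    cl <- max.col(-d2, ties.method = "first")
    best <- min(best, sum(w * d2[cbind(seq_len(nrow(x)), cl)]))
  }
  best
}

# Hypothetical helper: Gap(U) = mean(log(O*(U))) - log(O(U)) for one weight
# vector w, where O*(U) uses w shuffled across observations (assumption).
perm_gap <- function(x, w, K, nperms = 20, nstart = 5) {
  O <- weighted_kmeans_obj(x, w, K, nstart)
  O_star <- replicate(nperms, weighted_kmeans_obj(x, sample(w), K, nstart))
  c(gap = mean(log(O_star)) - log(O), sdgap = sd(log(O_star)))
}

# Usage on simulated data: two clusters plus 5 widely scattered rows
set.seed(1)
x <- rbind(matrix(rnorm(40), 20), matrix(rnorm(40, 5), 20),
           matrix(rnorm(10, 0, 10), 5))
perm_gap(x, w = c(rep(1, 40), rep(0.1, 5)), K = 2, nperms = 10)
```

Base R's `kmeans()` has no observation-weight argument, which is why the sketch uses its own Lloyd-style loop instead.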

Usage

kmeans.weight.tune(x, K = NULL, noisy.lab = NULL, weight.seq = NULL,
  nperms = 20, centers = NULL, nstart = 20,
  algorithm = "Hartigan-Wong")

## S3 method for class 'kmeans.weight.tune'
plot(x, ...)

Arguments

`x`	An n-by-p numeric data matrix, where n is the number of observations and p is the number of features.
`K`	The number of clusters. Can be omitted if `centers` is provided.
`noisy.lab`	A vector giving the row positions (in `x`) of the noisy observations. Can be omitted if `weight.seq` is provided.
`weight.seq`	A matrix of candidate weights, in which each row is one candidate weight vector. If `NULL` (the default), the noisy observations are assigned, in turn, each of the candidate weights `c(1, 0.8, 0.5, 0.2, 0.1, 0.08, 0.05, 0.02, 0.01, 0.005, 0.001)`.
`nperms`	Number of permutations. Default is `20`.
`centers`	A K-by-p matrix of initial (distinct) cluster centers.
`nstart`	The number of random sets of initial centers, each chosen from (distinct) rows of `x`. Ignored if `centers` is provided. Default is `20`.
`algorithm`	Character; either `"Hartigan-Wong"` or `"Forgy"`. Default is `"Hartigan-Wong"`.
`...`	Unused.
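When `weight.seq` is `NULL`, the default candidate weights are assigned to the noisy observations. The sketch below builds such a candidate matrix explicitly, one row per candidate, so it can be inspected or supplied as `weight.seq`. It assumes that each row holds one weight per observation (length n) and that non-noisy observations keep weight 1; neither point is confirmed by this page, and `make_weight_seq` is a hypothetical helper, not part of SWKM.

```r
# Hypothetical helper (not part of SWKM): build a candidate weight matrix.
# Assumptions: one column per observation; non-noisy observations keep
# weight 1; the noisy rows (noisy.lab) receive the j-th candidate in row j.
make_weight_seq <- function(n, noisy.lab,
                            candidates = c(1, 0.8, 0.5, 0.2, 0.1, 0.08, 0.05,
                                           0.02, 0.01, 0.005, 0.001)) {
  weight.seq <- matrix(1, nrow = length(candidates), ncol = n)
  # Fill the noisy columns column by column; each column receives the full
  # candidate vector, so row j holds candidate j in every noisy position
  weight.seq[, noisy.lab] <- candidates
  weight.seq
}

# Example: 6 observations, of which rows 2 and 5 are noisy
W <- make_weight_seq(n = 6, noisy.lab = c(2, 5))
dim(W)    # 11 candidates by 6 observations
W[3, ]    # candidate 0.5 on observations 2 and 5, weight 1 elsewhere
```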

Value

The function returns a list of the following components:

`gaps`	The gap statistics obtained, one for each of the candidate weights tried. If O(U) is the objective function evaluated at the tuning parameter U, and O*(U) is the same quantity computed with the permuted weights, then Gap(U) = mean(log(O*(U))) - log(O(U)), where the mean is taken over the `nperms` permutations.
`sdgaps`	The standard deviation of log(O*(U)) over the permutations.
`bestweight`	The best weight, selected by the gap statistic from among the candidate weights.
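To make the gap formula concrete, the sketch below computes `gaps` and `sdgaps` from objective values assumed to be already available: a vector `O` holding O(U) for each candidate weight, and a matrix `O_star` with one row per candidate and one column per permutation. The numbers are made up for illustration. Choosing the candidate with the largest gap is an assumption; this page does not spell out the exact rule behind `bestweight`.

```r
# Illustrative objective values only (not real results).
# O[j]        : objective O(U) for candidate weight j
# O_star[j, b]: objective O*(U) for candidate j under permutation b
candidate_weights <- c(1, 0.5, 0.1)
O <- c(100, 60, 40)
O_star <- rbind(c(110, 105, 108),
                c(90, 95, 85),
                c(50, 55, 45))

# Gap(U) = mean(log(O*(U))) - log(O(U)), one value per candidate
gaps <- rowMeans(log(O_star)) - log(O)
# Standard deviation of log(O*(U)) across the permutations
sdgaps <- apply(log(O_star), 1, sd)

# Assumed selection rule: the candidate with the largest gap
bestweight <- candidate_weights[which.max(gaps)]
```

With these toy numbers the second candidate (weight 0.5) has the largest gap, so `bestweight` is 0.5.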

Methods (by generic)

`plot`: plots the gap statistic for each candidate weight vector.
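A plot similar in spirit to `plot(res.tuneU)` can be drawn with base graphics from the `gaps` and `sdgaps` components. The sketch below shows each candidate's gap with error bars of plus or minus one `sdgaps`; `plot_gaps` is a hypothetical helper, not the package's plot method, and the error-bar layout is an assumption about what is useful to show.

```r
# Hypothetical plotting helper (not the SWKM plot method).
# weights: candidate weights (used as x-axis labels)
# gaps, sdgaps: one value per candidate, as in the Value section
plot_gaps <- function(weights, gaps, sdgaps) {
  idx <- seq_along(gaps)
  plot(idx, gaps, type = "b", pch = 19, xaxt = "n",
       ylim = range(gaps - sdgaps, gaps + sdgaps),
       xlab = "Candidate weight", ylab = "Gap statistic")
  axis(1, at = idx, labels = weights)
  # vertical error bars: gap plus or minus one standard deviation
  arrows(idx, gaps - sdgaps, idx, gaps + sdgaps,
         angle = 90, code = 3, length = 0.05)
  # return the index of the largest gap invisibly, for convenience
  invisible(which.max(gaps))
}

# Usage with illustrative values
plot_gaps(weights = c(1, 0.5, 0.1),
          gaps = c(0.07, 0.40, 0.22), sdgaps = c(0.02, 0.03, 0.04))
```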

Author(s)

Wenyu Zhang

Examples

## Not run: 
set.seed(1)
data("DMdata")
# data preprocessing
data <- t(DMdata$data)
data_rank <- apply(data, 2, rank)
data_rank_center <- t(t(data_rank) - colMeans(data_rank))
data_rank_center_scale <- t(t(data_rank_center) / apply(data_rank_center, 2, sd))
data_processed <- t(data_rank_center_scale)
# tune the number of clusters K
# nperms and nstart are set small to save computation time
cK <- ChooseK(data_processed[-DMdata$noisy.label, ], nClusters = 1:6,
  nperms = 10, nstart = 5)
plot(cK)
K <- cK$OptimalK
# tune the weight
res.tuneU <- kmeans.weight.tune(x = data_processed, K = K,
  noisy.lab = DMdata$noisy.label, nperms = 10, nstart = 5)
plot(res.tuneU)
# perform weighted K-means
res <- kmeans.weight(x = data_processed, K = K, weight = res.tuneU$bestweight)
# check the result
table(res$cluster, DMdata$true.label)

## End(Not run)

cuhklinlab/SWKM documentation built on Aug. 5, 2022, 2:27 a.m.