kmeans.weight.tune: Choose Tuning Parameter for Weighted K-Means Clustering

View source: R/kmeans_weight_tune.R

Description

The tuning parameter controls the weight of noisy observations. A permutation approach is used to select the tuning parameter.

Usage

kmeans.weight.tune(x, K = NULL, noisy.lab = NULL, weight.seq = NULL,
  nperms = 20, centers = NULL, nstart = 20,
  algorithm = "Hartigan-Wong")

## S3 method for class 'kmeans.weight.tune'
plot(x, ...)

Arguments

x

An n by p numeric data matrix, where n is the number of observations and p is the number of features.

K

The number of clusters. Omitted if centers is provided.

noisy.lab

A vector indicating the row positions of noisy observations. Omitted if weight.seq is provided.

weight.seq

A candidate weight matrix, with each row giving one candidate weight vector. If NULL, the function assigns the default sequence of candidate weights c(1,0.8,0.5,0.2,0.1,0.08,0.05,0.02,0.01,0.005,0.001) to the noisy observations.

nperms

Number of permutations. Default is 20.

centers

A K by p matrix indicating initial (distinct) cluster centers.

nstart

The number of initial random sets chosen from (distinct) rows in x. Omitted if centers is provided. Default is 20.

algorithm

Character; either "Hartigan-Wong" or "Forgy". Default is "Hartigan-Wong".

...

Unused.
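
As an illustration of the weight.seq layout described above, the following minimal sketch builds a candidate weight matrix from noisy.lab. The helper make_weight_seq is hypothetical (it is not part of SWKM), and it assumes that each row holds one weight per observation, with weight 1 for non-noisy observations and the candidate weight for the noisy ones. The package may construct its default differently.

```r
# Hypothetical helper, not part of the SWKM package: builds a candidate
# weight matrix with one row per candidate weight.
make_weight_seq <- function(n, noisy.lab,
                            candidates = c(1, 0.8, 0.5, 0.2, 0.1, 0.08,
                                           0.05, 0.02, 0.01, 0.005, 0.001)) {
  # Start with weight 1 for every observation in every candidate row.
  w <- matrix(1, nrow = length(candidates), ncol = n)
  # Assumption: only the noisy observations receive the candidate weight.
  # The vector is recycled down each selected column, so row i gets
  # candidates[i] in every noisy position.
  w[, noisy.lab] <- candidates
  w
}

# Toy setting: 10 observations, observations 2 and 7 flagged as noisy.
weight.seq <- make_weight_seq(n = 10, noisy.lab = c(2, 7))
dim(weight.seq)   # one row per candidate weight, one column per observation
weight.seq[3, ]   # third candidate: 0.5 at the noisy positions, 1 elsewhere
```

A matrix built this way could be passed as weight.seq instead of supplying noisy.lab, provided the layout assumption above matches what the function expects.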

Value

The function returns a list of the following components:

gaps

The gap statistics obtained (one for each of the candidate weights tried). If O(U) is the objective function evaluated at the tuning parameter U, and O*(U) is the same quantity but for the permuted weights, then Gap(U) = mean(log(O*(U))) - log(O(U)).

sdgaps

The standard deviation of log(O*(U)) across permutations, for each candidate weight.

bestweight

The best weight chosen by this method among all the candidate weights.
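
The gap statistic defined under gaps can be sketched in a few lines of R. This is only an illustration of the formula Gap(U) = mean(log(O*(U))) - log(O(U)): the helper gap_stat and the toy objective values are hypothetical, and the sketch does not reproduce how the package computes the objective or how it picks bestweight from the gaps.

```r
# Hypothetical helper, not part of the SWKM package.
# obj:      numeric vector, O(U) for each candidate weight U
# perm_obj: nperms x length(obj) matrix, O*(U) for each permutation (rows)
gap_stat <- function(obj, perm_obj) {
  log_perm <- log(perm_obj)
  list(
    # Gap(U) = mean over permutations of log(O*(U)), minus log(O(U))
    gaps   = colMeans(log_perm) - log(obj),
    # Standard deviation of log(O*(U)) over permutations
    sdgaps = apply(log_perm, 2, sd)
  )
}

# Toy values invented for illustration only: 3 candidates, 3 permutations.
obj <- c(10, 12, 15)
perm_obj <- rbind(c(8, 9, 10),
                  c(9, 10, 11),
                  c(10, 11, 12))
g <- gap_stat(obj, perm_obj)
g$gaps
g$sdgaps
```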

Methods (by generic)

Author(s)

Wenyu Zhang

See Also

Other sparse weighted K-Means functions: ChooseK, KMeansSparseCluster.permute.weight, KMeansSparseCluster.weight, kmeans.weight

Examples

## Not run: 
set.seed(1)
data("DMdata")
# data preprocessing: rank-transform, then center and scale each column
data <- t(DMdata$data)
data_rank <- apply(data, 2, rank)
data_rank_center <- t(t(data_rank) - colMeans(data_rank))
data_rank_center_scale <- t(t(data_rank_center) / apply(data_rank_center, 2, sd))
data_processed <- t(data_rank_center_scale)
# tune the number of clusters K on the non-noisy observations
# nperms and nstart are set to be small in order to save computation time
cK <- ChooseK(data_processed[-DMdata$noisy.label, ], nClusters = 1:6, nperms = 10, nstart = 5)
plot(cK)
K <- cK$OptimalK
# tune the weight
res.tuneU <- kmeans.weight.tune(x = data_processed, K = K,
  noisy.lab = DMdata$noisy.label, nperms = 10, nstart = 5)
plot(res.tuneU)
# perform weighted K-means
res <- kmeans.weight(x = data_processed, K = K, weight = res.tuneU$bestweight)
# check the result
table(res$cluster, DMdata$true.label)

## End(Not run)

Van1yu3/SWKM documentation built on Sept. 3, 2019, 7:50 a.m.