wrk: wrk


View source: R/wrk.R

Description

Performs robust (weighted) k-means clustering, which is useful when the data are contaminated. The method aims to detect both clusters and outliers.

Usage

wrk(data, D, k, max.iter = 30, cutoff = 0.5)

Arguments

data

A data matrix with n observations and p variables.

D

A distance matrix.

k

The number of clusters.

max.iter

The maximum number of iterations allowed to reach a local optimum. The default is 30.

cutoff

A cutoff value used to determine outliers. An observation is declared an outlier if its weight is smaller than or equal to this cutoff. The default is 0.5.
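The data and D arguments can be prepared with base R, as the Examples section does with scale() and dist(). The following is a minimal sketch, assuming Euclidean distances on standardized data; the toy matrix x is made up for illustration and is not part of the package.

```r
set.seed(1)

# Toy data (hypothetical): 6 observations in rows, 3 variables in columns.
x <- matrix(rnorm(18), nrow = 6, ncol = 3)

# Standardize each variable to mean 0 and unit variance.
dat <- scale(x)

# Pairwise Euclidean distances between observations (an object of class "dist").
D <- dist(dat)

dim(dat)         # n observations by p variables
attr(D, "Size")  # number of observations covered by D, should equal nrow(dat)
```

Both objects can then be passed on as data = dat and D = D.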

Details

wrk initializes the clustering procedure with the ROBIN approach and applies a weighting function to each detected cluster during k-means clustering. The weighting function uses the local outlier factor (LOF) to assign a weight to each observation. The resulting observation weights reflect the degree of outlyingness and range between 0 and 1. Observations with a weight <= cutoff are identified as outliers.
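The cutoff rule described above can be sketched in a few lines of base R. The weights and cluster labels below are hypothetical stand-ins for res$obsweights and res$clusters, and the way the two are combined is an assumption about how the 0-coded outlier vector relates to them, not code taken from the package.

```r
# Hypothetical observation weights in [0, 1] and cluster labels.
obsweights <- c(0.95, 0.80, 0.50, 0.12, 0.99, 0.49)
clusters   <- c(1L, 1L, 2L, 2L, 3L, 3L)
cutoff     <- 0.5  # the documented default

# An observation is flagged as an outlier when its weight is <= cutoff.
is_outlier <- obsweights <= cutoff

# Assumed encoding: 0 marks an outlier, other entries keep their cluster label.
outclusters <- ifelse(is_outlier, 0L, clusters)

which(is_outlier)
outclusters
```

Note that the comparison is inclusive, so an observation whose weight equals the cutoff exactly is also flagged.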

Value

clusters

An integer vector with values from 1 to k, giving the cluster membership of each observation.

obsweights

A numeric vector of observation weights ranging between 0 and 1.

outclusters

An integer vector with values from 0 to k, containing both the cluster membership and the identified outliers. A value of 0 marks an outlier.

WCSS

The within-cluster sum of squares of the local optimum.

centers

The set of cluster centers.
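To make the WCSS and centers components concrete, the sketch below computes the classical, unweighted versions of both quantities for a given partition in base R. The toy data and labels are hypothetical, and wrk may compute its WCSS differently (for example, using the observation weights), so this illustrates the definitions only.

```r
set.seed(1)

# Toy data (hypothetical): two groups of 10 observations in 2 variables.
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
clusters <- rep(1:2, each = 10)

# Cluster centers: per-cluster column means, one row per cluster.
centers <- apply(x, 2, function(col) tapply(col, clusters, mean))

# Classical WCSS: squared Euclidean distance of each observation
# to the center of its own cluster, summed over all observations.
wcss <- sum((x - centers[clusters, ])^2)

centers
wcss
```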

Author(s)

Sarka Brodinova <sarka.brodinova@tuwien.ac.at>

References

S. Brodinova, P. Filzmoser, T. Ortner, C. Breiteneder, M. Zaharieva. Robust and sparse k-means clustering for high-dimensional data. Submitted for publication, 2017. Available at http://arxiv.org/abs/1709.10012

See Also

ROBIN

Examples

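# load the package that provides SimData() and wrk()
library(wrsk)

# generate data
d <- SimData(size_grp = c(40, 40, 40), p_inf = 50,
             p_noise = 750, p_out_noise = 75)
dat <- scale(d$x)

# run wrk with k = 3 on the Euclidean distances of the scaled data
res <- wrk(data = dat, D = dist(dat), k = 3)

# cross-tabulate the simulated labels against the results
table(d$y, res$clusters)
table(d$lb, res$outclusters)

# plot the observation weights, colored by d$lb
plot(res$obsweights, col = d$lb + 1)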

brodsa/wrsk documentation built on April 7, 2020, 6:12 a.m.