improvedktaucenters: improvedktaucenters

Description

Robust clustering algorithm for non-spherical data. This function estimates clusters, taking into account that they may differ in size, volume or orientation.

Usage

improvedktaucenters(X, K, cutoff = 0.999, nstart = 5,
  INITcenters = NULL)

Arguments

X

numeric matrix of size n x p.

K

the number of clusters.

cutoff

optional argument for getOutliers: the chi-square quantile used as the threshold for outlier detection. Defaults to 0.999.

nstart

optional; the number of trials for which the base algorithm ktaucenters_aux is run at the first stage. If it is greater than 1 and center is not set as NULL, a random set of (distinct) rows in X is chosen as the initial centers for each try.

INITcenters

numeric matrix of size K x p giving the initial centers from which the clusters and robust covariance matrices will be computed. If it is set to NULL, the algorithm computes INITcenters with the ktaucenters routine. Defaults to NULL.
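
To illustrate what a chi-square quantile threshold such as cutoff means in practice, here is a minimal base-R sketch. It is not the package's internal code: the coordinate-wise median and MAD used as robust location and scatter are stand-ins chosen for simplicity, and the toy data are made up for illustration.

```r
# Sketch only: flag points whose squared Mahalanobis distance exceeds
# a chi-square quantile. The package's own estimators may differ.
set.seed(42)
p <- 2
X <- matrix(rnorm(200 * p), ncol = p)  # 200 standard normal points
X[1:5, ] <- X[1:5, ] + 10              # move 5 points far from the bulk

cutoff <- 0.999
threshold <- qchisq(cutoff, df = p)    # chi-square quantile, p degrees of freedom

# Assumption: simple robust stand-ins (median and MAD per coordinate)
center  <- apply(X, 2, median)
scatter <- diag(apply(X, 2, mad)^2)

d2 <- mahalanobis(X, center, scatter)  # squared distances to the center
flagged <- which(d2 > threshold)       # indices of suspected outliers

print(threshold)
print(flagged)
```

A higher cutoff raises the threshold, so fewer observations are flagged; a lower cutoff flags more.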

Value

A list including the estimated K centers and the cluster labels for the observations.
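
A hedged sketch of calling the function with user-supplied initial centers and inspecting the returned list. The toy data and the choice of random rows as INITcenters are assumptions made for illustration; the call uses only the arguments listed under Usage, and the cluster and outliers elements are the ones used in the Examples section. The call is skipped when the ktaucenters package is not installed.

```r
# Sketch only: toy data with three well-separated Gaussian groups
set.seed(1)
K <- 3
p <- 2
X <- rbind(
  matrix(rnorm(100 * p, mean = -5), ncol = p),
  matrix(rnorm(100 * p, mean =  0), ncol = p),
  matrix(rnorm(100 * p, mean =  5), ncol = p)
)

# K x p matrix of initial centers: K distinct rows of X picked at random
init <- X[sample(nrow(X), K), , drop = FALSE]

if (requireNamespace("ktaucenters", quietly = TRUE)) {
  ret <- ktaucenters::improvedktaucenters(X, K = K, cutoff = 0.999,
                                          nstart = 5, INITcenters = init)
  print(names(ret))            # components of the returned list
  print(table(ret$cluster))    # number of observations per cluster label
  print(length(ret$outliers))  # number of observations flagged as outliers
}
```

Using names(ret) first avoids guessing at component names that are not documented on this page.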

References

Gonzalez, J. D., Yohai, V. J., & Zamar, R. H. (2019). Robust Clustering Using Tau-Scales. arXiv preprint arXiv:1906.08198.

Examples

# Generate synthetic data (three normal clusters in two dimensions).
# The clusters have different shapes and orientations.
# The data is contaminated uniformly (level 20%).

################################################
#### Start data generating process ############
##############################################

# generates base clusters
set.seed(1)
Z1 <- c(rnorm(100,0),rnorm(100,0),rnorm(100,0))
Z2 <- rnorm(300);
X <-  matrix(0, ncol=2,nrow=300);
X[,1]=Z1;X[,2]=Z2
true.cluster= c(rep(1,100),rep(2,100),rep(3,100))

# rotate, expand and translate base clusters
theta=pi/3;
aux1=matrix(c(cos(theta),-sin(theta),sin(theta),cos(theta)),nrow=2)
aux2=sqrt(4)*diag(c(1,1/4))
B=aux1%*%aux2%*%t(aux1)
X[true.cluster==3,]=X[true.cluster==3,]%*%aux2%*%aux1 + matrix(c(5,2),byrow = TRUE,nrow=100,ncol=2)
X[true.cluster==2,2]=X[true.cluster==2,2]*5
X[true.cluster==1,2]=X[true.cluster==1,2]*0.1
X[true.cluster==1, ]=X[true.cluster==1,]+ matrix(c(-5,-1),byrow = TRUE,nrow=100,ncol=2)
### Generate 60 synthetic outliers (contamination level 20%)

outliers=sample(1:300,60)
# draw 60 * 2 = 120 values so the 60 x 2 matrix is filled without recycling
X[outliers, ] <- matrix(runif( 120, 2 * min(X), 2 * max(X) ),
                                ncol = 2, nrow = 60)
###############################################
#### END data generating process ############
#############################################

#############################################
### Applying the algorithm ##################
#############################################
ret=improvedktaucenters(X,K=3,cutoff=0.999)

#############################################
### plotting results ########################
#############################################
par(mfrow=c(2,1))
plot(X,main="actual clusters")
for (j in 1:3){
 points(X[true.cluster==j,],pch=19, col=j+1)
}
points(X[outliers,],pch=19,col=1)
plot(X,main="clusters estimation")
for (j in 1:3){
 points(X[ret$cluster==j,],pch=19, col=j+1)
}
points(X[ret$outliers,],pch=19)

ktaucenters documentation built on Aug. 3, 2019, 9:03 a.m.