View source: R/improvedktaucenters.R
Robust clustering algorithm for non-spherical data. This function estimates clusters, taking into account that clusters may differ in size, volume, or orientation.
improvedktaucenters(X, K, cutoff = 0.999, nstart = 5,
  INITcenters = NULL)
X
: numeric matrix of size n x p.
K
: the number of clusters.
cutoff
: optional argument for getOutliers: quantile of the chi-square distribution used as the threshold for outlier detection. Defaults to 0.999.
nstart
: optional; the number of trials for which the base algorithm ktaucenters_aux is run at the first stage. If it is greater than 1 and center is not set as NULL, a random set of (distinct) rows of X is chosen as the initial centers for each trial.
INITcenters
: numeric matrix of size K x p giving the initial centers from which the clusters and robust covariance matrices will be computed. If it is NULL, the algorithm computes INITcenters using the ktaucenters routine. Defaults to NULL.
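The role of cutoff can be illustrated with a minimal sketch in base R. It assumes that an observation is flagged when its squared Mahalanobis distance exceeds the chi-square quantile with p degrees of freedom. The package's getOutliers is not reproduced here: the sketch uses the classical mean and covariance of a single cluster in place of the robust per-cluster estimates, and the names X_demo, d2, and flagged are illustrative only.

```r
# Sketch only: classical mean/covariance stand in for the robust
# per-cluster estimates that the package computes (assumption).
set.seed(42)
p <- 2
X_demo <- matrix(rnorm(200 * p), ncol = p)   # one clean Gaussian cluster
X_demo[1:5, ] <- X_demo[1:5, ] + 20          # plant five gross outliers

cutoff <- 0.999
threshold <- qchisq(cutoff, df = p)          # chi-square quantile, p degrees of freedom

# Squared Mahalanobis distance of each row to the estimated center
d2 <- mahalanobis(X_demo, center = colMeans(X_demo), cov = cov(X_demo))

flagged <- which(d2 > threshold)             # indices treated as outliers
```

Raising cutoff towards 1 raises the threshold, so fewer observations are flagged; lowering it flags more.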
A list including the estimated K centers and cluster labels for the observations:
centers
: matrix of size K x p, with the estimated K centers.
cluster
: array of size n x 1 of integer labels between 1 and K.
tauPath
: sequence of tau scale values at each iteration.
Wni
: numeric array of size n x 1 indicating the weight associated with each observation.
emptyClusterFlag
: a boolean value. TRUE means that at some iteration at least one cluster was completely empty.
niter
: number of iterations until convergence is achieved or the maximum number of iterations is reached.
sigmas
: a list containing the K covariance matrices found by the procedure at its second step.
outliers
: indices of the observations that can be considered outliers.
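As a rough illustration of how the cluster and outliers components might be combined, the sketch below relabels flagged observations as 0 before counting cluster sizes. The list ret_demo is a hypothetical, hand-built stand-in with made-up values; it is not output from improvedktaucenters and only mimics three of the components described above.

```r
# Hypothetical stand-in for the list returned by improvedktaucenters();
# the values below are made up purely to show the structure.
ret_demo <- list(
  centers  = matrix(c(-5, 0, 5, -1, 0, 2), nrow = 3),  # K x p with K = 3, p = 2
  cluster  = c(1, 1, 2, 2, 3, 3, 1, 2),                 # one label per observation
  outliers = c(7, 8)                                    # indices flagged as outliers
)

# Relabel flagged observations as 0 so they are excluded from cluster sizes
labels <- ret_demo$cluster
labels[ret_demo$outliers] <- 0

# Count observations per label; level 0 collects the flagged observations
cluster_sizes <- table(factor(labels, levels = 0:nrow(ret_demo$centers)))
print(cluster_sizes)
```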
Gonzalez, J. D., Yohai, V. J., & Zamar, R. H. (2019). Robust Clustering Using Tau-Scales. arXiv preprint arXiv:1906.08198.
# Generate synthetic data (three normal clusters in two dimensions).
# Clusters have different shapes and orientations.
# The data is contaminated uniformly (level 20%).
################################################
#### Start data generating process ############
##############################################
# generates base clusters
set.seed(1)
Z1 <- c(rnorm(100,0),rnorm(100,0),rnorm(100,0))
Z2 <- rnorm(300);
X <- matrix(0, ncol=2,nrow=300);
X[,1]=Z1;X[,2]=Z2
true.cluster= c(rep(1,100),rep(2,100),rep(3,100))
# rotate, expand and translate base clusters
theta=pi/3;
aux1=matrix(c(cos(theta),-sin(theta),sin(theta),cos(theta)),nrow=2)
aux2=sqrt(4)*diag(c(1,1/4))
B=aux1%*%aux2%*%t(aux1)
X[true.cluster==3,]=X[true.cluster==3,]%*%aux2%*%aux1 + matrix(c(5,2),byrow = TRUE,nrow=100,ncol=2)
X[true.cluster==2,2]=X[true.cluster==2,2]*5
X[true.cluster==1,2]=X[true.cluster==1,2]*0.1
X[true.cluster==1, ]=X[true.cluster==1,]+ matrix(c(-5,-1),byrow = TRUE,nrow=100,ncol=2)
### Generate 60 synthetic outliers (contamination level 20%)
outliers=sample(1:300,60)
# draw 120 values so that each of the 60 x 2 cells gets its own uniform draw
X[outliers, ] <- matrix(runif( 120, 2 * min(X), 2 * max(X) ),
ncol = 2, nrow = 60)
###############################################
#### END data generating process ############
#############################################
#############################################
### Applying the algorithm ##################
#############################################
library(ktaucenters)  # package that provides improvedktaucenters
ret=improvedktaucenters(X,K=3,cutoff=0.999)
#############################################
### plotting results ########################
#############################################
par(mfrow=c(2,1))
plot(X,main="actual clusters")
for (j in 1:3){
points(X[true.cluster==j,],pch=19, col=j+1)
}
points(X[outliers,],pch=19,col=1)
plot(X,main="clusters estimation")
for (j in 1:3){
points(X[ret$cluster==j,],pch=19, col=j+1)
}
points(X[ret$outliers,],pch=19)