Description Usage Arguments Value Examples
Robust Model Base Clustering a robust and efficient version of EM algorithm.
1 | RMBC(Y, K, max_iter = 80, tolerance = 1e-04)
|
Y |
A matrix of size n x p. |
K |
The number of clusters. |
max_iter |
a maximum number of iterations used for the algorithm stopping rule |
tolerance |
tolerance parameter used for the algorithm stopping rule |
A list including the estimated mixture distribution parameters and cluster-label for the observations
alpha
: K numeric values representing the convex combination coefficients.
mu
: a list of length K with the location initial estimators.
sigma
: a list of length K with the location scatter matrix estimators.
nonoutliers
: an array of indices that contains the estimated nonoutliers observations
outliers
: an array of indices that contains the estimated outliers observations
cluster
: A vector of integers (from 0:k) indicating the cluster to which each point is allocated.
0 label corresponds to outliers
cluster_without_outliers
: A vector of integers (from 1:k) indicating the cluster to which each point is allocated.
Note: Outliers are assigned to the cluster that maximizes
the belonging probability. To recognize the outliers the field
outliers should be used
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 | # Generate Sintetic data (three normal cluster in two dimension)
# clusters have different shapes and orentation.
# The data is contaminated uniformly (level 20%).
################################################
#### Start data generating process ############
##############################################
# generates base clusters
Z1 <- c(rnorm(50,0),rnorm(50,0),rnorm(50,0))
Z2 <- rnorm(150);
X <- matrix(0, ncol=2,nrow=150);
X[,1]=Z1;X[,2]=Z2
true.cluster= c(rep(1,50),rep(2,50),rep(3,50))
# rotate, expand and translate base clusters
theta=pi/3;
aux1=matrix(c(cos(theta),-sin(theta),sin(theta),cos(theta)),nrow=2)
aux2=sqrt(4)*diag(c(1,1/4))
B=aux1%*%aux2%*%t(aux1)
X[true.cluster==3,]=X[true.cluster==3,]%*%aux2%*%aux1 +
matrix(c(15,2), byrow = TRUE,nrow=50,ncol=2)
X[true.cluster==2,2] = X[true.cluster==2,2]*4
X[true.cluster==1,2] = X[true.cluster==1,2]*0.1
X[true.cluster==1, ] = X[true.cluster==1,]+
matrix(c(-15,-1),byrow = TRUE,nrow=50,ncol=2)
### Generate 30 sintetic outliers (contamination level 20%)
outliers=sample(1:150,30)
X[outliers, ] <- matrix(runif( 60, 2 * min(X), 2 * max(X) ),
ncol = 2, nrow = 30)
###############################################
#### END data generating process ############
#############################################
### APLYING RMBC ALGORITHM
ret = RMBC(Y=X, K=3,max_iter = 82)
cluster = ret$cluster
#############################################
### plotting results ########################
#############################################
oldpar=par(mfrow=c(1,2))
plot(X, main="actual clusters" )
for (j in 1:3){
points(X[true.cluster==j,],pch=19, col=j+1)
}
points(X[outliers,],pch=19,col=1)
plot(X,main="clusters estimation")
for (j in 1:3){
points(X[cluster==j,],pch=19, col=j+1)
}
points(X[ret$outliers,],pch=19,col=1)
par(oldpar)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.