tclust_H: tclust_H

Description Usage Arguments Details Value References Examples

Description

A wrapper for the internal fucntion tclust_. Performs robust non spherical clustering, tclust, where initial solutions are allowed.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
tclust_H(
  x,
  k = 3,
  alpha = 0.05,
  nstart = 50,
  iter.max = 20,
  restr = "eigen",
  restr.fact = 12,
  sol_ini_p = FALSE,
  sol_ini = NA,
  equal.weights = FALSE,
  trace = 0,
  zero.tol = 1e-16
)

Arguments

x

A matrix or data.frame of dimension n x p, containing the observations (row-wise).

k

The number of clusters initially searched for.

alpha

The proportion of observations to be trimmed.

nstart

The number of random initializations to be performed. Only when sol_ini_p = FALSE.

iter.max

The maximum number of concentration steps to be performed. The concentration steps are stopped, whenever two consecutive steps lead to the same data partition.

restr

The type of restriction to be applied on the cluster scatter matrices. Valid values are "eigen" (default).

restr.fact

The constant restr.fact >= 1 constrains the allowed differences among group scatters. Larger values imply larger differences of group scatters, a value of 1 specifies the strongest restriction.

sol_ini_p

Initial solution for parameters provided by the user TRUE/FALSE, if TRUE is stored in sol_ini.

sol_ini

Initial solution for parameters provided by the user.

equal.weights

A logical value, specifying whether equal cluster weights (TRUE) or not (FALSE) shall be considered in the concentration and assignment steps.

trace

Defines the tracing level, which is set to 0 by default. Tracing level 2 gives additional information on the iteratively decreasing objective function's value.

zero.tol

The zero tolerance used. By default set to 1e-16.

Details

This iterative algorithm initializes k clusters randomly and performs "concentration steps" in order to improve the current cluster assignment. The number of maximum concentration steps to be performed is given by iter.max. For approximately obtaining the global optimum, the system is initialized nstart times and concentration steps are performed until convergence or iter.max is reached. When processing more complex data sets higher values of nstart and iter.max have to be specified (obviously implying extra computation time). However, if more then half of the iterations would not converge, a warning message is issued, indicating that nstart has to be increased.

The parameter restr defines the cluster's shape restrictions, which are applied on all clusters during each iteration. Options "eigen"/"deter" restrict the ratio between the maximum and minimum eigenvalue/determinant of all cluster's covariance structures to parameter restr.fact. Setting restr.fact to 1, yields the strongest restriction, forcing all eigenvalues/determinants to be equal and so the method looks for similarly scattered (respectively spherical) clusters. Option "sigma" is a simpler restriction, which averages the covariance structures during each iteration (weighted by cluster sizes) in order to get similar (equal) cluster scatters.

Value

A list with values:

centers

A matrix of size p x k containing the centers (column-wise) of each cluster.

cov

An array of size p x p x k containing the covariance matrices of each cluster.

cluster

A numerical vector of size n containing the cluster assignment for each observation. Cluster names are integer numbers from 1 to k, 0 indicates trimmed observations.

par

A list, containing the parameters the algorithm has been called with (x, if not suppressed by store.x = FALSE, k, alpha, restr.fact, nstart, KStep, and equal.weights).

weights

A numerical vector of length k, containing the weights of each cluster.

obj

he value of the objective function of the best (returned) solution.

References

Fritz, H., Garcia-Escudero, L. A., & Mayo-Iscar, A. (2012). tclust: An r package for a trimming approach to cluster analysis. Journal of Statistical Software, 47(12), 1-26.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
x <- rbind(matrix(rnorm(100), ncol = 2), matrix(rnorm(100) + 2, ncol = 2),
        matrix(rnorm(100) + 4, ncol = 2))
## robust cluster obtention from a sample x asking for 3 clusters,
## trimming level 0.05 and constrain level 12
k <- 3; alpha <- 0.05; restr.fact <- 12
output <- tclust_H(x = x, k = k, alpha = alpha, nstart = 50, iter.max = 20,
                 restr = "eigen", restr.fact = restr.fact, sol_ini_p = FALSE, sol_ini = NA,
                 equal.weights = FALSE, trace = 0, zero.tol = 1e-16)
## cluster assigment
output$cluster
plot(x, col = output$cluster)

optimalFlow documentation built on Nov. 8, 2020, 6:59 p.m.