tclust_H: tclust_H

Description

A wrapper for the internal function tclust_. Performs robust non-spherical clustering with tclust, optionally starting from a user-supplied initial solution.

Usage

tclust_H(
  x,
  k = 3,
  alpha = 0.05,
  nstart = 50,
  iter.max = 20,
  restr = "eigen",
  restr.fact = 12,
  sol_ini_p = FALSE,
  sol_ini = NA,
  equal.weights = FALSE,
  trace = 0,
  zero.tol = 1e-16
)

Arguments

`x`	A matrix or data.frame of dimension n x p, containing the observations (row-wise).
`k`	The number of clusters initially searched for.
`alpha`	The proportion of observations to be trimmed.
`nstart`	The number of random initializations to be performed. Only used when sol_ini_p = FALSE.
`iter.max`	The maximum number of concentration steps to be performed. The concentration steps are stopped whenever two consecutive steps lead to the same data partition.
`restr`	The type of restriction to be applied on the cluster scatter matrices. Valid values are "eigen" (default).
`restr.fact`	The constant restr.fact >= 1 constrains the allowed differences among group scatters. Larger values allow larger differences among group scatters; a value of 1 specifies the strongest restriction.
`sol_ini_p`	A logical value (TRUE/FALSE) indicating whether an initial solution for the parameters is provided by the user. If TRUE, the initial solution must be stored in sol_ini.
`sol_ini`	Initial solution for parameters provided by the user.
`equal.weights`	A logical value, specifying whether equal cluster weights (TRUE) or not (FALSE) shall be considered in the concentration and assignment steps.
`trace`	Defines the tracing level, which is set to 0 by default. Tracing level 2 gives additional information on the iteratively decreasing objective function's value.
`zero.tol`	The zero tolerance used. By default set to 1e-16.

Details

This iterative algorithm initializes k clusters randomly (or, when sol_ini_p = TRUE, from the solution stored in sol_ini) and performs "concentration steps" in order to improve the current cluster assignment. The maximum number of concentration steps to be performed is given by iter.max. To approximate the global optimum, the system is initialized nstart times and concentration steps are performed until convergence or until iter.max is reached. More complex data sets may require higher values of nstart and iter.max, at the cost of extra computation time. If more than half of the iterations do not converge, a warning message is issued, indicating that nstart has to be increased.
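
The concentration steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the internal tclust_ code: the helpers log_dmvnorm() and concentration_step() are hypothetical, the number of trimmed points is assumed to be floor(n * alpha), equal.weights = FALSE is assumed (the log-weight term is kept), and no scatter restriction is applied in the update. Each step assigns every point to the cluster maximizing log(weight) plus its Gaussian log-density, trims the worst-fitting points, and re-estimates the parameters from the retained points. The loop stops when two consecutive partitions coincide or after iter.max steps.

```r
# Hypothetical sketch of tclust-style concentration steps (base R only);
# NOT the internal tclust_ implementation used by optimalFlow.

# Multivariate normal log-density of each row of x, via the Cholesky factor.
log_dmvnorm <- function(x, mu, sigma) {
  R <- chol(sigma)                                  # sigma = t(R) %*% R
  z <- backsolve(R, t(x) - mu, transpose = TRUE)    # solves t(R) %*% z = x - mu
  -0.5 * colSums(z^2) - sum(log(diag(R))) - 0.5 * ncol(x) * log(2 * pi)
}

# One concentration step: assign, trim, then re-estimate (no restriction applied).
concentration_step <- function(x, centers, covs, weights, alpha) {
  n <- nrow(x); p <- ncol(x); k <- ncol(centers)
  # ll[i, j] = log(weights[j]) + log phi(x_i; centers[, j], covs[, , j])
  ll <- sapply(seq_len(k), function(j)
    log(weights[j]) + log_dmvnorm(x, centers[, j], covs[, , j]))
  best <- max.col(ll, ties.method = "first")
  best_ll <- ll[cbind(seq_len(n), best)]
  # Assumption: trim the floor(n * alpha) points with the lowest best log-likelihood.
  cluster <- best
  n_trim <- floor(n * alpha)
  if (n_trim > 0) cluster[order(best_ll)[seq_len(n_trim)]] <- 0L
  kept <- cluster > 0
  for (j in seq_len(k)) {
    xj <- x[cluster == j, , drop = FALSE]
    if (nrow(xj) <= p) next                         # too few points: keep old parameters
    centers[, j] <- colMeans(xj)
    covs[, , j] <- crossprod(sweep(xj, 2, centers[, j])) / nrow(xj)
    weights[j] <- nrow(xj) / sum(kept)
  }
  # Objective of the assignment made in this step (trimmed points excluded).
  list(cluster = cluster, centers = centers, covs = covs, weights = weights,
       obj = sum(best_ll[kept]))
}

# Usage: one random start, iterated until the partition stops changing.
set.seed(1)
x <- rbind(matrix(rnorm(100), ncol = 2), matrix(rnorm(100) + 4, ncol = 2))
k <- 2; p <- ncol(x); iter.max <- 20
centers <- t(x[sample(nrow(x), k), ])               # p x k random initial centers
covs <- array(diag(p), dim = c(p, p, k))            # identity initial scatters
weights <- rep(1 / k, k)
prev <- NULL
for (it in seq_len(iter.max)) {
  step <- concentration_step(x, centers, covs, weights, alpha = 0.05)
  if (identical(step$cluster, prev)) break          # same partition twice: converged
  prev <- step$cluster
  centers <- step$centers; covs <- step$covs; weights <- step$weights
}
table(step$cluster)
```

Repeating this loop from nstart random starts and keeping the run with the largest obj would mirror the multi-start strategy described above; the sketch shows a single start only to stay short.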

The parameter restr defines the clusters' shape restrictions, which are applied to all clusters during each iteration. Options "eigen"/"deter" restrict the ratio between the maximum and minimum eigenvalue/determinant of all clusters' covariance structures to at most restr.fact. Setting restr.fact to 1 yields the strongest restriction: all eigenvalues (for "eigen") or all determinants (for "deter") are forced to be equal, so the method looks for spherical clusters of equal scatter or for similarly scattered clusters, respectively. Option "sigma" is a simpler restriction, which averages the covariance structures during each iteration (weighted by cluster sizes) in order to obtain equal cluster scatters.
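
The restriction options above can be illustrated with a few base-R helpers. These are hypothetical sketches: eigen_ratio(), truncate_eigen() and average_scatter() are not optimalFlow functions. truncate_eigen() clamps every eigenvalue into [m, restr.fact * m] for a lower bound m that the caller supplies, which is enough to guarantee that the ratio of the largest to the smallest eigenvalue is at most restr.fact. In tclust the threshold is determined from the data rather than supplied by the user; that optimization is not reproduced here.

```r
# Hypothetical helpers illustrating the "eigen" and "sigma" restrictions (base R only).

# Ratio of the largest to the smallest eigenvalue over all k scatter matrices.
eigen_ratio <- function(covs) {
  ev <- unlist(lapply(seq_len(dim(covs)[3]), function(j)
    eigen(covs[, , j], symmetric = TRUE, only.values = TRUE)$values))
  max(ev) / min(ev)
}

# "eigen"-style restriction for a GIVEN lower bound m: clamp every eigenvalue
# into [m, restr_fact * m] and rebuild each matrix from its eigenvectors.
truncate_eigen <- function(covs, m, restr_fact) {
  for (j in seq_len(dim(covs)[3])) {
    e <- eigen(covs[, , j], symmetric = TRUE)
    d <- pmin(pmax(e$values, m), restr_fact * m)
    covs[, , j] <- e$vectors %*% diag(d, nrow = length(d)) %*% t(e$vectors)
  }
  covs
}

# "sigma"-style restriction: replace every scatter by the size-weighted average.
average_scatter <- function(covs, sizes) {
  avg <- Reduce(`+`, lapply(seq_along(sizes), function(j) sizes[j] * covs[, , j])) /
    sum(sizes)
  array(avg, dim = dim(covs))
}

# Usage: two 2 x 2 scatters with eigenvalues {1, 1} and {50, 0.5}.
covs <- array(c(diag(c(1, 1)), diag(c(50, 0.5))), dim = c(2, 2, 2))
eigen_ratio(covs)                                  # 50 / 0.5 = 100
restricted <- truncate_eigen(covs, m = 1, restr_fact = 12)
eigen_ratio(restricted)                            # 12 / 1 = 12
average_scatter(covs, sizes = c(30, 10))[, , 1]
```

Here the value 50 is clamped down to 12 and 0.5 is raised to 1, so the restricted scatters satisfy a ratio of exactly restr.fact = 12; with restr_fact = 1 every eigenvalue would be set to m, giving identical spherical scatters.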

Value

A list with values:

centers: A matrix of size p x k containing the centers (column-wise) of each cluster.
cov: An array of size p x p x k containing the covariance matrices of each cluster.
cluster: A numerical vector of size n containing the cluster assignment for each observation. Cluster names are integer numbers from 1 to k, 0 indicates trimmed observations.
par: A list containing the parameters the algorithm has been called with: x (unless suppressed by store.x = FALSE), k, alpha, restr.fact, nstart, KStep and equal.weights.
weights: A numerical vector of length k, containing the weights of each cluster.
obj: The value of the objective function of the best (returned) solution.
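
The list structure documented above can be inspected with base R alone. A minimal sketch, assuming only the documented shapes (centers is p x k, and cluster uses 0 for trimmed observations); summarise_tclust() is a hypothetical helper and res is a hand-made mock object, not real tclust_H output.

```r
# Hypothetical helper (not part of optimalFlow) for a tclust_H-style result list.
summarise_tclust <- function(res) {
  k <- ncol(res$centers)                     # centers are stored column-wise (p x k)
  list(
    trimmed_share = mean(res$cluster == 0),  # 0 marks trimmed observations
    sizes = tabulate(res$cluster, nbins = k) # tabulate() ignores the 0 labels
  )
}

# Mock object with the documented shapes (p = 2, k = 2, n = 6), not real output.
res <- list(centers = matrix(c(0, 0, 4, 4), nrow = 2),
            cluster = c(1, 1, 0, 2, 2, 2))
summarise_tclust(res)   # trimmed_share = 1/6, sizes = 2 3
```

Applied to a real result, trimmed_share is expected to be close to the requested alpha.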

References

Fritz, H., Garcia-Escudero, L. A., & Mayo-Iscar, A. (2012). tclust: An R package for a trimming approach to cluster analysis. Journal of Statistical Software, 47(12), 1-26.

Examples

library(optimalFlow)
set.seed(1)  # for reproducibility
x <- rbind(matrix(rnorm(100), ncol = 2), matrix(rnorm(100) + 2, ncol = 2),
        matrix(rnorm(100) + 4, ncol = 2))
## robust clustering of the sample x, asking for 3 clusters,
## trimming level 0.05 and restriction factor 12
k <- 3; alpha <- 0.05; restr.fact <- 12
output <- tclust_H(x = x, k = k, alpha = alpha, nstart = 50, iter.max = 20,
                 restr = "eigen", restr.fact = restr.fact, sol_ini_p = FALSE, sol_ini = NA,
                 equal.weights = FALSE, trace = 0, zero.tol = 1e-16)
## cluster assignment (0 = trimmed observation)
output$cluster
## shift colours by 1 so that trimmed points (cluster 0) are drawn in black
## instead of the background colour 0, which would hide them
plot(x, col = output$cluster + 1)

HristoInouzhe/optimalFlow documentation built on April 23, 2023, 5:45 p.m.