Classification Trimmed Likelihood Curves

Description

The function applies tclust several times on a given dataset while parameters alpha and k are altered. The resulting object gives an idea of the optimal trimming level and number of clusters considering a particular dataset.

Usage

ctlcurves (x, k = 1:4, alpha = seq (0, 0.2, len = 6), 
           restr.fact = 50, trace = 1, ...)

Arguments

x

A matrix or data frame of dimension n x p, containing the observations (row-wise).

k

A vector of cluster numbers to be checked. By default, cluster numbers from 1 to 4 are examined.

alpha

A vector containing the alpha levels to be checked. By default, six equally spaced alpha levels from 0 to 0.2 (in steps of 0.04) are checked.

restr.fact

The restriction factor passed to tclust.

...

Further arguments (e.g. restr) passed to tclust.

trace

Defines the tracing level, which is set to 1 by default. Tracing level 2 gives additional information on the current iteration.
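
The default grids for k and alpha can be checked in base R before starting a potentially long run. The sketch below only evaluates the default expressions from the Usage section; the variable names are illustrative and not part of the package.

```r
# Default grids, copied from the Usage section
k_default     <- 1:4
alpha_default <- seq(0, 0.2, len = 6)   # 'len' partially matches 'length.out'

print(k_default)
print(alpha_default)

# Number of (k, alpha) combinations on the default grid
n_combos <- length(k_default) * length(alpha_default)
print(n_combos)
```

A finer alpha grid or a wider range of k increases the number of combinations, and with it the run time, multiplicatively.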

Details

These curves show the values of the trimmed classification (log-)likelihoods when altering the trimming proportion alpha and the number of clusters k. Careful examination of these curves provides valuable information for choosing these parameters in a clustering problem. For instance, an appropriate k is one for which moving from the curve for k to the curve for k+1 yields no clear increase in the trimmed classification likelihood over almost the whole range of alpha values. Moreover, once k has been chosen, an appropriate alpha can be read off at the point where the initial fast increase of the trimmed classification likelihood curve for that k stops. A more detailed explanation can be found in García-Escudero et al. (2010).
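
The rule for choosing k can be mimicked numerically as a rough aid to reading the plot. The sketch below is a base R illustration on made-up numbers, not tclust output; the matrix layout (one row per k, one column per alpha), the threshold tol and the 75% cut-off are all assumptions chosen for the example, and they are no substitute for inspecting the curves visually.

```r
# Toy objective values, made up for illustration (NOT tclust output).
# Assumed layout: one row per k, one column per alpha.
alpha <- c(0, 0.05, 0.10, 0.15)
obj <- rbind(
  "1" = c(-900, -820, -780, -750),
  "2" = c(-700, -600, -580, -565),
  "3" = c(-690, -595, -577, -563)
)
colnames(obj) <- alpha

# Gain in the objective when moving from k to k + 1, for each alpha
gain <- diff(obj)
rownames(gain) <- paste0(rownames(obj)[-nrow(obj)], "->", rownames(obj)[-1])
print(gain)

# Hypothetical threshold for what counts as a "clear increase"
tol <- 20

# Fraction of alpha levels at which the gain from k to k + 1 is small
frac_small <- rowMeans(gain < tol)

# Smallest k whose gain is small for "almost all" alpha (here: at least 75%)
k_candidate <- as.integer(rownames(obj))[which(frac_small >= 0.75)[1]]
print(k_candidate)

# Increments along alpha for the chosen k: look for where they level off
steps <- diff(obj[as.character(k_candidate), ])
print(steps)
```

In this toy example the step from k = 1 to k = 2 raises the objective substantially at every alpha, while the step from k = 2 to k = 3 barely changes it, so k = 2 is the candidate.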

Value

The function returns an S3 object of type ctlcurves with components:

par

A list containing all the parameters passed to this function.

obj

An array containing the objective function values of each computed cluster-solution.

min.weights

An array containing the minimum cluster weight of each computed cluster-solution.

So far there is no output available for print.ctlcurves. Use plot on a ctlcurves object for a graphical interpretation of it.
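
Because printing the object gives no output, its components can be inspected directly. The following is a minimal sketch, assuming the tclust package is installed and that the returned object has the components listed above; the small grid is chosen only to keep the run short, and the exact dimensions of the arrays may differ between package versions.

```r
library(tclust)

data(geyser2)

# Small grid to keep the run short (values chosen for illustration only)
ctl <- ctlcurves(geyser2, k = 1:3, alpha = c(0, 0.05, 0.1))

names(ctl)        # component names of the returned object
str(ctl$par)      # parameters passed to ctlcurves
ctl$obj           # objective function values per computed solution
ctl$min.weights   # minimum cluster weight per computed solution

plot(ctl)         # graphical summary of the curves
```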

Author(s)

Agustin Mayo Iscar, Luis Angel Garcia Escudero, Heinrich Fritz

References

García-Escudero, L.A.; Gordaliza, A.; Matrán, C. and Mayo-Iscar, A. (2010), "Exploring the number of groups in robust model-based clustering." Statistics and Computing, (Forthcoming).
Preprint available at www.eio.uva.es/infor/personas/langel.html.

See Also

plot.ctlcurves

Examples

## Not run: 
#--- EXAMPLE 1 ------------------------------------------

library (tclust)
library (mvtnorm)  ##  provides rmvnorm

sig <- diag (2)
cen <- rep (1, 2)
x <- rbind (
	rmvnorm (108, cen * 0,   sig),
	rmvnorm (162, cen * 5,   sig * 6 - 2),
	rmvnorm (30, cen * 2.5, sig * 50)
)

ctl <- ctlcurves (x, k = 1:4)

  ##  ctl-curves 
plot (ctl)  ##  --> selecting k = 2, alpha = 0.08

  ##  the selected model 
plot (tclust (x, k = 2, alpha = 0.08, restr.fact = 7))

#--- EXAMPLE 2 ------------------------------------------

data (geyser2)
ctl <- ctlcurves (geyser2, k = 1:5)

  ##  ctl-curves 
plot (ctl)  ##  --> selecting k = 3, alpha = 0.08

  ##  the selected model
plot (tclust (geyser2, k = 3, alpha = 0.08, restr.fact = 5))


#--- EXAMPLE 3 ------------------------------------------

data (swissbank)
ctl <- ctlcurves (swissbank, k = 1:5, alpha = seq (0, 0.3, by = 0.025))

  ##  ctl-curves 
plot (ctl)  ##  --> selecting k = 2, alpha = 0.1

  ##  the selected model
plot (tclust (swissbank, k = 2, alpha = 0.1, restr.fact = 50))

## End(Not run)