# wcls_bcls_matrices: Matrix Cluster Scatter Measures In clv: Cluster Validation Techniques

## Description

These functions compute the two base cluster scatter matrices: the within-cluster scatter matrix (W) and the between-cluster scatter matrix (B).

## Usage

```
wcls.matrix(data, clust, cluster.center)
bcls.matrix(cluster.center, cluster.size, mean)
```

## Arguments

| Argument | Description |
| --- | --- |
| `data` | `numeric matrix` or `data.frame` where columns correspond to variables and rows to observations. |
| `clust` | integer `vector` giving the id of the cluster each object is assigned to. If the vector is not of integer type, it is coerced with a warning. |
| `cluster.center` | `matrix` or `data.frame` where columns correspond to variables and rows to the cluster centers defined by the `data` and `clust` parameters. |
| `cluster.size` | integer `vector` with the size of each cluster, computed from the `clust` vector. |
| `mean` | mean of all data objects. |

## Details

There are two base matrix scatter measures.

1. The within-cluster scatter measure, defined as:

$$W = \sum_{k=1}^{K} W_k, \qquad W_k = \sum_{x \in C_k} (x - m_k)(x - m_k)^{\top}$$

where $K$ is the number of clusters (`cluster.num`), the inner sum runs over every object $x$ that belongs to cluster $C_k$, and $m_k$ is the center of cluster $k$.

2. The between-cluster scatter measure, defined as:

$$B = \sum_{k=1}^{K} |C_k| \, (m_k - m)(m_k - m)^{\top}$$

where $|C_k|$ is the size of cluster $k$, $m_k$ is the center of cluster $k$, and $m$ is the center of all data objects.
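To make the two definitions above concrete, here is a minimal base-R sketch that builds W and B directly from the formulas. It is not the clv implementation: the helper name `scatter_sketch` is hypothetical, and `kmeans` from the stats package is used only as a convenient way to get cluster ids. It assumes cluster centers are the per-cluster column means.

```r
# Hypothetical helper (not part of clv): computes W and B from the definitions.
scatter_sketch <- function(data, clust) {
  X     <- as.matrix(data)
  clust <- as.integer(clust)
  p     <- ncol(X)
  m     <- colMeans(X)                      # center of all data objects
  W     <- matrix(0, p, p, dimnames = list(colnames(X), colnames(X)))
  B     <- W
  for (k in sort(unique(clust))) {
    Xk <- X[clust == k, , drop = FALSE]     # objects belonging to cluster k
    mk <- colMeans(Xk)                      # center of cluster k (assumed: mean)
    D  <- sweep(Xk, 2, mk)                  # each row is x - m(k)
    W  <- W + crossprod(D)                  # adds sum of (x - m(k))(x - m(k))'
    B  <- B + nrow(Xk) * tcrossprod(mk - m) # adds |C(k)| (m(k) - m)(m(k) - m)'
  }
  list(W = W, B = B)
}

# Usage on the iris measurements with three k-means clusters
set.seed(1)
X   <- as.matrix(iris[, 1:4])
km  <- kmeans(X, centers = 3)
res <- scatter_sketch(X, km$cluster)

# Total scatter around the overall mean; it should equal W + B
T.total <- crossprod(sweep(X, 2, colMeans(X)))
all.equal(res$W + res$B, T.total, check.attributes = FALSE)
```

With mean-based centers, W + B reproduces the total scatter matrix, which is why a T matrix can be formed as the sum of the two. The traces of W and B should also agree with `km$tot.withinss` and `km$betweenss` reported by `kmeans`.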

## Value

 `wcls.matrix` returns W matrix (within-cluster scatter measure), `bcls.matrix` returns B matrix (between-cluster scatter measure).

## Author(s)

Lukasz Nieweglowski

## References

T. Hastie, R. Tibshirani, G. Walther. Estimating the number of data clusters via the Gap statistic. http://citeseer.ist.psu.edu/tibshirani00estimating.html

## Examples

```
# load and prepare data
library(clv)
data(iris)
iris.data <- iris[, 1:4]

# cluster data
pam.mod <- pam(iris.data, 5) # create five clusters
v.pred <- as.integer(pam.mod$clustering) # get cluster ids associated with the data objects

# compute cluster sizes, center of each cluster
# and mean of the data objects
cls.attr <- cls.attrib(iris.data, v.pred)
center <- cls.attr$cluster.center
size <- cls.attr$cluster.size
iris.mean <- cls.attr$mean

# compute matrix scatter measures
W.matrix <- wcls.matrix(iris.data, v.pred, center)
B.matrix <- bcls.matrix(center, size, iris.mean)
T.matrix <- W.matrix + B.matrix

# examples of indices based on the W, B and T matrices
mx.scatt.crit1 <- sum(diag(W.matrix))
mx.scatt.crit2 <- sum(diag(B.matrix)) / sum(diag(W.matrix))
mx.scatt.crit3 <- det(W.matrix) / det(T.matrix)
```

clv documentation built on May 29, 2017, 9:50 a.m.