# Rock Clustering

### Description

Cluster a data matrix using the Rock algorithm.

### Usage

`rockCluster(x, n, beta, theta, fun, funArgs, debug)`

`rockLink(x, beta)`

### Arguments

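| Argument | Description |
|---|---|
| `x` | a data matrix; for `rockLink`, an object of class `dist`. |
| `n` | the number of desired clusters. |
| `beta` | optional distance threshold. |
| `theta` | neighborhood parameter in the range [0,1). |
| `fun` | distance function to use. |
| `funArgs` | a list of additional arguments to `fun`. |
| `debug` | turn debugging output on or off. |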

### Details

The intended area of application is the clustering of binary (logical)
data, for instance as a preprocessing step in data mining. However,
arbitrary distance metrics can be used (see `dist`).
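For binary data, a common choice is the binary (Jaccard-type) distance. Base R's `dist()` provides it with `method = "binary"`. The three data points below are made up for illustration. How exactly such a distance is passed to `rockCluster` (for instance via `fun` and `funArgs`) is assumed here rather than documented.

```r
# Three made-up binary data points (rows); 1 marks the presence of an attribute.
x <- rbind(a = c(1, 1, 0, 0),
           b = c(1, 1, 1, 0),
           c = c(0, 0, 1, 1))

# method = "binary": among the attributes present in at least one of the two
# rows, the proportion that is present in only one of them
d <- dist(x, method = "binary")
print(d)
```

Rows `a` and `b` share two of their three attributes, which gives a distance of 1/3. Rows `a` and `c` share nothing, which gives a distance of 1.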

According to the reference (see below), the distance threshold and the
neighborhood parameter are coupled. Thus, higher values of the neighborhood
parameter `theta` impose a tighter constraint on the neighborhood. Two data
points are neighbors only if their distance is less than or equal to `beta`,
and for any two data points the number of links between them is defined as
the number of other data points that are neighbors of both.
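Written out, the neighbor and link definitions from this section read as follows. The paper's neighbor condition is $\mathrm{sim}(p,q) \ge \theta$. If distances are taken as one minus a similarity, this becomes $d(p,q) \le 1-\theta$. That is presumably the coupling of `beta` and `theta` mentioned above, and it explains why a larger `theta` tightens the neighborhood. The goodness measure $g$ used to rank candidate merges comes from the reference. That `rockCluster` uses exactly this form is an assumption.

$$
\mathrm{link}(p,q) = \bigl|\{\, r \ne p,q \;:\; d(p,r) \le \beta \ \text{and}\ d(q,r) \le \beta \,\}\bigr|,
\qquad
\mathrm{link}[C_i,C_j] = \sum_{p \in C_i} \sum_{q \in C_j} \mathrm{link}(p,q)
$$

$$
g(C_i,C_j) = \frac{\mathrm{link}[C_i,C_j]}{(n_i+n_j)^{1+2f(\theta)} - n_i^{1+2f(\theta)} - n_j^{1+2f(\theta)}},
\qquad
f(\theta) = \frac{1-\theta}{1+\theta}
$$

Here $n_i$ and $n_j$ are the sizes of clusters $C_i$ and $C_j$.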

Note that with a tight neighborhood specification the algorithm may run out of clusters to merge, i.e., it may terminate with more than the desired number of clusters.
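The following deliberately simplified, hypothetical agglomeration loop in base R shows how the algorithm can run out of merges. `rock_merge_sketch` is not part of the package. It assumes a precomputed symmetric matrix of point-to-point link counts. It scores candidate merges with the goodness measure from the reference, $\mathrm{link}[C_i,C_j] / ((n_i+n_j)^{1+2f(\theta)} - n_i^{1+2f(\theta)} - n_j^{1+2f(\theta)})$ with $f(\theta) = (1-\theta)/(1+\theta)$. It stops early when no pair of clusters shares a link. This is a sketch under these assumptions, not the package's implementation.

```r
# Hypothetical ROCK-style agglomeration loop (illustration only).
# links: symmetric matrix of point-to-point link counts
# n:     desired number of clusters
# theta: neighborhood parameter in [0, 1)
rock_merge_sketch <- function(links, n, theta) {
  stopifnot(is.matrix(links), theta >= 0, theta < 1)
  e <- 1 + 2 * (1 - theta) / (1 + theta)   # exponent 1 + 2 f(theta)
  clusters <- as.list(seq_len(nrow(links)))  # start with singletons
  while (length(clusters) > n) {
    best <- 0; bi <- NA; bj <- NA
    k <- length(clusters)
    for (i in seq_len(k - 1)) {
      for (j in (i + 1):k) {
        l <- sum(links[clusters[[i]], clusters[[j]]])
        if (l == 0) next                     # no cross-links: cannot merge
        ni <- length(clusters[[i]]); nj <- length(clusters[[j]])
        g <- l / ((ni + nj)^e - ni^e - nj^e) # goodness of merging i and j
        if (g > best) { best <- g; bi <- i; bj <- j }  # strict '>' keeps the first max
      }
    }
    if (is.na(bi)) break                     # ran out of clusters to merge
    clusters[[bi]] <- c(clusters[[bi]], clusters[[bj]])
    clusters[[bj]] <- NULL
  }
  clusters
}

# Usage: points 1 and 2 are linked, points 3 and 4 are linked, nothing else.
links <- matrix(0, 4, 4)
links[1, 2] <- links[2, 1] <- 2
links[3, 4] <- links[4, 3] <- 1
res <- rock_merge_sketch(links, n = 1, theta = 0.5)
print(res)
```

One cluster is requested, but the two groups share no links, so the loop stops with two clusters.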

The `debug` option can help in determining proper settings: in its output,
lines suffixed with a plus indicate that non-singleton clusters were merged.

Note that ties are not broken explicitly: the first maximum encountered is used. Permuting the order of the data can therefore help in assessing how much a solution depends on ties.
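A toy base-R example with made-up goodness values illustrates first-max tie-breaking. `which.max()` returns the first maximum, so reordering the candidates changes which of two tied merges is chosen. This illustrates the principle only. It is not the package's internal code.

```r
# Made-up goodness values for three candidate merges; two are tied.
g <- c(0.2, 0.5, 0.5)
first <- which.max(g)              # the first maximum wins: candidate 2

# Presenting the candidates in a different order selects the other tied one.
perm <- c(3, 1, 2)
permuted <- perm[which.max(g[perm])]  # original index of the winner: 3

print(c(first = first, permuted = permuted))
```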

Function `rockLink` is provided for applications that need to compute
link count distances efficiently. Note that `NA` and `NaN` distances are
ignored, but supplying such values for the threshold `beta` results in an
error.
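Below is a minimal base-R sketch of link counts, following the definitions in this section. `link_counts` is a hypothetical helper, not the package's `rockLink`. Its conventions are assumptions that may differ from the package: a point is not counted as its own neighbor, and `NA`/`NaN` distances mean "not neighbors". Like `rockLink`, it rejects a missing `beta` and returns a `dist` object.

```r
# Hypothetical link-count helper (not the package's rockLink).
# d: a "dist" object; beta: distance threshold for being neighbors.
link_counts <- function(d, beta) {
  stopifnot(inherits(d, "dist"))
  if (length(beta) != 1 || is.na(beta))
    stop("'beta' must be a single non-missing value")
  a <- as.matrix(d) <= beta  # neighbor indicator matrix
  a[is.na(a)] <- FALSE       # NA/NaN distances: treated as not neighbors
  diag(a) <- FALSE           # a point is not its own neighbor here
  a <- a * 1                 # logical -> numeric for the matrix product
  # (a %*% a)[i, j] = sum over k of a[i, k] * a[k, j]; with a zero diagonal,
  # only points k other than i and j contribute
  as.dist(a %*% a)
}

# Usage: three close points and one far-away point on a line.
pts <- c(0, 0.1, 0.2, 5)
lk <- link_counts(dist(pts), beta = 0.25)
print(lk)
```

Each pair among the three close points has exactly one common neighbor. The distant point has no links.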

### Value

`rockCluster` returns an object of class `rock`, a list with the
following components:

| Component | Description |
|---|---|
| `x` | the data matrix or a subset of it. |
| `cl` | a factor of cluster labels. |
| `size` | a vector of cluster sizes. |
| `beta` | see Arguments. |
| `theta` | see Arguments. |

`rockLink` returns an object of class `dist`.

### Author(s)

Christian Buchta

### References

S. Guha, R. Rastogi, and K. Shim. ROCK: A Robust Clustering Algorithm for
Categorical Attributes. *Information Systems*, Vol. 25, No. 5, 2000.

### See Also

`dist` for common distance functions, `predict` for classifying new data
samples, and `fitted` for classifying the clustered data samples.

### Examples

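```
library(cba)   # attach the package providing rockCluster()
set.seed(1)    # make the random sampling below reproducible

### example from paper
data(Votes)
# drop column 17 (the class label) and dummy-code the remaining attributes
x <- as.dummy(Votes[-17])
rc <- rockCluster(x, n=2, theta=0.73, debug=TRUE)
print(rc)
# cluster labels of the clustered data
rf <- fitted(rc)
# cross-tabulate the known classes against the clusters
table(Votes$Class, rf$cl)
## Not run:
### large example from paper
data("Mushroom")
x <- as.dummy(Mushroom[-1])
# cluster a random sample of 1000 records
rc <- rockCluster(x[sample(dim(x)[1],1000),], n=10, theta=0.8)
print(rc)
# classify all records with the fitted model
rp <- predict(rc, x)
table(Mushroom$class, rp$cl)
## End(Not run)
### real valued example
# Gaussian-type transform of Euclidean distances into [0, 1);
# needs a dist() with a second argument y (stats::dist() has none)
gdist <- function(x, y=NULL) 1-exp(-dist(x, y)^2)
# two groups of 50 points each, centered at (1, 1) and (-1, -1)
xr <- matrix(rnorm(200, sd=0.6)+rep(rep(c(1,-1),each=50),2), ncol=2)
rcr <- rockCluster(xr, n=2, theta=0.75, fun=gdist, funArgs=NULL)
print(rcr)
```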