rockCluster: Rock Clustering
In cba: Clustering for Business Analytics

rockCluster

R Documentation

Rock Clustering

Description

Cluster a data matrix using the Rock algorithm.

Usage

rockCluster(x, n, beta = 1-theta, theta = 0.5, fun = "dist",
            funArgs = list(method="binary"), debug = FALSE)

rockLink(x, beta = 0.5)

Arguments

`x`	a data matrix; for `rockLink` an object of class `dist`.
`n`	the number of desired clusters.
`beta`	optional distance threshold.
`theta`	neighborhood parameter in the range [0,1).
`fun`	distance function to use.
`funArgs`	a `list` of named parameter arguments to `fun`.
`debug`	turn on/off debugging output.

Details

The intended area of application is the clustering of binary (logical) data, for instance as a preprocessing step in data mining. However, arbitrary distance metrics can be used (see dist).

According to the reference (see below), the distance threshold and the neighborhood parameter are coupled (by default, beta = 1 - theta). Thus, higher values of the neighborhood parameter theta pose a tighter constraint on the neighborhood. For any two data points, the neighborhood is measured by the number of other data points that are neighbors of both (their link count). Further, two points are neighbors (or linked) only if their distance is less than or equal to beta.
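The neighbor and link-count definitions above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy matrix x and the names dm, nb, and links are hypothetical, and stats::dist with method = "binary" stands in for the default distance.

```r
# Toy binary data: rows are observations, columns are logical attributes.
x <- rbind(
  a = c(1, 1, 1, 0, 0),
  b = c(1, 1, 0, 0, 0),
  c = c(1, 1, 1, 1, 0),
  d = c(0, 0, 0, 1, 1)
)

theta <- 0.5
beta  <- 1 - theta   # default coupling of threshold and neighborhood parameter

# Pairwise binary (Jaccard-type) distances as a full symmetric matrix.
dm <- as.matrix(dist(x, method = "binary"))

# Two distinct points are neighbors if their distance is at most beta.
nb <- (dm <= beta) * 1   # 0/1 indicator matrix, keeps the row names
diag(nb) <- 0            # a point is not counted as its own neighbor

# Entry [i, j] counts the other points that are neighbors of both i and j.
links <- nb %*% nb
diag(links) <- 0

print(links)
```

Here a, b, and c are mutual neighbors, so each pair among them shares one common neighbor, while d has no neighbors and no links at this threshold.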

Note that for a tight neighborhood specification the algorithm may run out of clusters to merge, i.e., it may terminate with more than the desired number of clusters.
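One way to check for this is to compare the number of clusters returned with the number requested. A minimal sketch, assuming the cba package and its Votes data are installed; the theta values are arbitrary choices for illustration and k is a hypothetical helper variable:

```r
library(cba)

data(Votes)
x <- as.dummy(Votes[-17])   # binary dummy coding, as in the Examples section

n <- 2
# theta values chosen arbitrarily for illustration
for (theta in c(0.5, 0.73, 0.9)) {
  rc <- rockCluster(x, n = n, theta = theta)
  # 'size' holds one entry per cluster in the returned solution
  k <- length(rc$size)
  cat("theta =", theta, ": requested", n, "clusters, got", k, "\n")
}
```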

The debug option can help in determining proper settings: lines of the debugging output that are suffixed with a plus sign indicate that non-singleton clusters were merged.

Note that tie-breaking is not implemented, i.e., the first maximum encountered is used. However, permuting the order of the data can help in determining how strongly a solution depends on ties.
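The permutation check might look as follows. This is a sketch, assuming the cba package and its Votes data are installed; the seed and the names perm, rc1, rc2, cl1, and cl2 are illustrative choices, not part of the package.

```r
library(cba)

data(Votes)
x <- as.dummy(Votes[-17])

set.seed(1)               # arbitrary seed, for reproducibility only
perm <- sample(nrow(x))   # random reordering of the rows

rc1 <- rockCluster(x,         n = 2, theta = 0.73)
rc2 <- rockCluster(x[perm, ], n = 2, theta = 0.73)

# Assign every original row to a cluster under each solution so that
# the two labelings are aligned row by row.
cl1 <- predict(rc1, x)$cl
cl2 <- predict(rc2, x)$cl

# Cluster labels may be numbered differently between runs, so compare the
# cross-tabulation rather than the labels themselves.
print(table(original = cl1, permuted = cl2, useNA = "ifany"))
```

If each row of the table has a single dominant cell, the partition is largely unaffected by the order of the data; mass spread across several cells suggests that ties influence the result.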

Function rockLink is provided for applications that need to compute link count distances efficiently. Note that NA and NaN distances are ignored, but supplying such values for the threshold beta results in an error.

Value

rockCluster returns an object of class rock, a list with the following components:

`x`	the data matrix or a subset of it.
`cl`	a factor of cluster labels.
`size`	a vector of cluster sizes.
`beta`	see above.
`theta`	see above.

rockLink returns an object of class dist.

Author(s)

Christian Buchta

References

S. Guha, R. Rastogi, and K. Shim. ROCK: A Robust Clustering Algorithm for Categorical Attributes. Information Systems, Vol. 25, No. 5, 2000.

Examples

library(cba)
### example from paper
data(Votes)
x <- as.dummy(Votes[-17])
rc <- rockCluster(x, n=2, theta=0.73, debug=TRUE)
print(rc)
rf <- fitted(rc)
table(Votes$Class, rf$cl)
## Not run: 
### large example from paper
data("Mushroom")
x <- as.dummy(Mushroom[-1])
rc <- rockCluster(x[sample(dim(x)[1],1000),], n=10, theta=0.8)
print(rc)
rp <- predict(rc, x)
table(Mushroom$class, rp$cl)

## End(Not run)
### real valued example
gdist <- function(x, y=NULL) 1-exp(-dist(x, y)^2)
xr <- matrix(rnorm(200, sd=0.6)+rep(rep(c(1,-1),each=50),2), ncol=2)
rcr <- rockCluster(xr, n=2, theta=0.75, fun=gdist, funArgs=NULL)
print(rcr)

cba documentation built on Sept. 11, 2024, 9:32 p.m.
