qtclust: Stochastic QT Clustering

Description

Perform stochastic QT clustering on a data matrix.

Usage

qtclust(x, radius, family = kccaFamily("kmeans"), control = NULL,
        save.data=FALSE, kcca=FALSE)

Arguments

x

A numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

radius

Maximum radius of clusters.

family

Object of class "kccaFamily" specifying the distance measure to be used.

control

An object of class "flexclustControl" specifying the minimum number of observations per cluster (min.size) and the number of trials per iteration (ntry; see Details below).
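As a minimal sketch of how such a control object might be built: this assumes that a named list can be coerced to class "flexclustControl", as described in the flexclust documentation of that class. The values 5 and 10 are arbitrary illustrations, not recommendations.

```r
library(flexclust)

# Assumption: a named list can be coerced to a "flexclustControl" object.
# The values below are arbitrary and only serve as an illustration.
ctrl <- as(list(min.size = 5, ntry = 10), "flexclustControl")

ctrl@min.size  # minimum number of observations per cluster
ctrl@ntry      # number of candidate start points tried per iteration

x <- matrix(10 * runif(1000), ncol = 2)
cl <- qtclust(x, radius = 3, control = ctrl)
```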

save.data

Save a copy of x in the return object?

kcca

Run kcca after the QT cluster algorithm has converged?

Details

This function implements a variation of the QT clustering algorithm by Heyer et al. (1999); see Scharl and Leisch (2006). The main difference is that each iteration considers only a random sample of size control@ntry of the possible cluster start points, rather than all of them. In addition, a point is considered as an initial center only if at least one other point lies within a circle of radius radius around it. In most cases the resulting solutions are almost the same as those of the original algorithm, at a considerable gain in speed; in some cases the solutions are even better. If control@ntry is set to the size of the data set, an algorithm similar to the original one proposed by Heyer et al. (1999) is obtained.
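The sampling idea can be illustrated with a small base-R sketch. This is not the flexclust implementation: stochastic_qt_sketch() is a hypothetical helper, it uses Euclidean distance only, it takes a candidate cluster to be all unclustered points within radius of the sampled start point, and it stops as soon as no sampled candidate reaches min.size. All of these are simplifying assumptions.

```r
# Illustrative sketch only; stochastic_qt_sketch() is a hypothetical helper
# and not part of flexclust.
stochastic_qt_sketch <- function(x, radius, ntry = 5, min.size = 2) {
  x <- as.matrix(x)
  cluster <- rep(NA_integer_, nrow(x))  # NA marks points not (yet) clustered
  k <- 0L
  repeat {
    free <- which(is.na(cluster))
    if (length(free) < min.size) break
    # Sample at most ntry candidate start points among the unclustered points
    cand <- free[sample.int(length(free), min(ntry, length(free)))]
    best <- integer(0)
    for (i in cand) {
      # Euclidean distance from candidate i to every unclustered point
      d <- sqrt(rowSums(sweep(x[free, , drop = FALSE], 2, x[i, ])^2))
      members <- free[d <= radius]      # includes the candidate itself
      if (length(members) > length(best)) best <- members
    }
    # Simplification: give up once no sampled candidate is large enough
    if (length(best) < min.size) break
    k <- k + 1L
    cluster[best] <- k
  }
  cluster
}

set.seed(1)
# Two well-separated groups of 20 points each
x <- rbind(matrix(rnorm(40, mean = 0,  sd = 0.1), ncol = 2),
           matrix(rnorm(40, mean = 10, sd = 0.1), ncol = 2))
cl <- stochastic_qt_sketch(x, radius = 2, ntry = 5)
table(cl, useNA = "ifany")
```

Setting ntry to nrow(x) makes the sketch examine every unclustered point in each iteration, which mirrors the remark above about recovering something close to the exhaustive search.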

Value

Function qtclust by default returns objects of class "kccasimple". If argument kcca is TRUE, function kcca() is run afterwards (initialized on the QT cluster solution). Data points not clustered by the QT cluster algorithm are omitted from the kcca() iterations, but filled back into the return object. All plot methods defined for objects of class "kcca" can be used.
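The effect of the kcca argument could be inspected roughly as follows. This is a sketch under the assumption that the clusters() accessor applies to the returned objects and that unclustered points show up as NA; the object names cl_qt and cl_kcca are made up for illustration.

```r
library(flexclust)

x <- matrix(10 * runif(1000), ncol = 2)

cl_qt   <- qtclust(x, radius = 3)               # plain QT cluster solution
cl_kcca <- qtclust(x, radius = 3, kcca = TRUE)  # kcca() run afterwards

class(cl_qt)
class(cl_kcca)

# One cluster membership per row of x; points left unclustered are
# assumed here to be reported as NA.
table(clusters(cl_kcca), useNA = "ifany")
```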

Author(s)

Friedrich Leisch

References

Heyer, L. J., Kruglyak, S., Yooseph, S. (1999). Exploring expression data: Identification and analysis of coexpressed genes. Genome Research 9, 1106–1115.

Scharl, T., Leisch, F. (2006). The stochastic QT-clust algorithm: evaluation of stability and variance on time-course microarray data. In A. Rizzi and M. Vichi (eds.), Compstat 2006 – Proceedings in Computational Statistics, 1015–1022. Physica Verlag, Heidelberg, Germany.

Examples

library(flexclust)

x <- matrix(10*runif(1000), ncol=2)

## maximum distance of point to cluster center is 3
cl1 <- qtclust(x, radius=3)

## maximum distance of point to cluster center is 1
## -> more clusters, longer runtime
cl2 <- qtclust(x, radius=1)

opar <- par(c("mfrow","mar"))
par(mfrow=c(2,1), mar=c(2.1,2.1,1,1))
plot(x, col=predict(cl1), xlab="", ylab="")
plot(x, col=predict(cl2), xlab="", ylab="")
par(opar)

Example output

Loading required package: grid
Loading required package: lattice
Loading required package: modeltools
Loading required package: stats4

flexclust documentation built on May 2, 2019, 10:59 a.m.