# Supervised Clustering of Predictor Variables

### Description

Performs supervised clustering of predictor variables for large datasets such as microarray gene expression data. It uses a greedy forward strategy and optimizes a combination of the Wilcoxon and margin statistics to find the clusters.
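To make the idea concrete, here is a minimal base-R sketch of a Wilcoxon-type score, a margin, and a single greedy forward step. It is an illustration under stated assumptions, not the code that `wilma` runs: the helper names `wilcoxon_score`, `margin_stat`, and `greedy_step` are hypothetical, the class labels are assumed to be coded 0 and 1, and the rule "lowest score first, ties broken by largest margin" is one plausible way to combine the two statistics.

```r
# Illustrative sketch only; these helpers are NOT part of supclust.
# x: numeric vector (one variable or a cluster mean); y: assumed 0/1 labels.

# Wilcoxon-type score: number of (class 0, class 1) pairs in which the
# class-0 value is not smaller than the class-1 value (0 = perfect separation).
wilcoxon_score <- function(x, y) {
  sum(outer(x[y == 0], x[y == 1], ">="))
}

# Margin: gap between the smallest class-1 value and the largest class-0 value.
margin_stat <- function(x, y) {
  min(x[y == 1]) - max(x[y == 0])
}

# One greedy forward step: try adding each remaining variable to the cluster,
# average the cluster's columns, and keep the variable giving the lowest
# score, with ties broken by the largest margin (an assumed combination rule).
greedy_step <- function(X, y, cluster) {
  candidates <- setdiff(seq_len(ncol(X)), cluster)
  stats <- t(sapply(candidates, function(j) {
    rep_j <- rowMeans(X[, c(cluster, j), drop = FALSE])
    c(score = wilcoxon_score(rep_j, y), margin = margin_stat(rep_j, y))
  }))
  candidates[order(stats[, "score"], -stats[, "margin"])[1]]
}

# Small deterministic toy data: 6 cases in rows, 3 variables in columns.
y <- c(0, 0, 0, 1, 1, 1)
X <- cbind(c(1, 5, 2, 4, 3, 6),
           c(1, 2, 3, 7, 8, 9),
           c(3, 1, 2, 4, 5, 6))

first  <- greedy_step(X, y, cluster = integer(0))  # start from an empty cluster
second <- greedy_step(X, y, cluster = first)       # extend the cluster by one
```

Repeating `greedy_step` until the statistics stop improving would give one cluster; how `wilma` actually terminates a cluster and starts the next one is not covered by this sketch.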

### Usage

`wilma(x, y, noc, genes, flip, once.per.clust, trace)`

### Arguments

| Argument | Description |
| --- | --- |
| `x` | Numeric matrix of explanatory variables (… |
| `y` | Numeric vector of length … |
| `noc` | Integer, the number of clusters that should be searched for on the data. |
| `genes` | Defaults to … |
| `flip` | Logical, defaults to … |
| `once.per.clust` | Logical, defaults to … |
| `trace` | Integer >= 0; when positive, the output of the internal loops is provided; … |

### Value

`wilma` returns an object of class "wilma". The functions `print` and `summary` give an overview of the clusters that have been found. The function `plot` yields a two-dimensional projection into the space of the first two clusters that `wilma` found. The generic function `fitted` returns the fitted values, which are the cluster representatives. Finally, `predict` classifies test data on the basis of Wilma's clusters with either the nearest-neighbor rule, diagonal linear discriminant analysis, logistic regression, or aggregated trees.
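The relationship between clusters, cluster representatives, and the nearest-neighbor rule can be sketched in a few lines of base R. This is a hedged illustration, not the `predict` method for "wilma" objects: the helpers `cluster_representatives` and `nn_classify` are hypothetical, a representative is assumed to be the mean of a cluster's variables, and `signs` is assumed to hold one +1 or -1 entry per variable.

```r
# Illustrative sketch only; these helpers are NOT part of supclust.

# One representative per cluster: the mean of the cluster's (sign-adjusted)
# columns. `clusters` is a list of column-index vectors; `signs` is assumed
# to contain +1 / -1 for each column of X.
cluster_representatives <- function(X, clusters, signs = rep(1, ncol(X))) {
  Xs <- sweep(X, 2, signs, `*`)
  do.call(cbind, lapply(clusters, function(idx) {
    rowMeans(Xs[, idx, drop = FALSE])
  }))
}

# 1-nearest-neighbor rule in the space of cluster representatives:
# each test case receives the label of the closest training case.
nn_classify <- function(train_rep, y, test_rep) {
  apply(test_rep, 1, function(z) {
    d <- colSums((t(train_rep) - z)^2)  # squared Euclidean distances
    y[which.min(d)]
  })
}

# Toy data: 4 training cases, 4 variables, 2 clusters of 2 variables each.
X_train  <- rbind(c(1, 2, 9, 8),
                  c(2, 1, 8, 9),
                  c(8, 9, 1, 2),
                  c(9, 8, 2, 1))
y_train  <- c(0, 0, 1, 1)
clusters <- list(c(1, 2), c(3, 4))
X_test   <- rbind(c(1, 1, 9, 9),
                  c(9, 9, 1, 1))

train_rep <- cluster_representatives(X_train, clusters)
test_rep  <- cluster_representatives(X_test, clusters)
pred      <- nn_classify(train_rep, y_train, test_rep)
```

Plotting the two columns of `train_rep` against each other would give a picture in the spirit of the two-dimensional projection that `plot` is described as producing.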

An object of class "wilma" is a list containing:

| Component | Description |
| --- | --- |
| `clist` | A list of length … |
| `steps` | Numerical vector of length … |
| `y` | Numeric vector of length … |
| `x.means` | A list of length … |
| `noc` | Integer, the number of clusters that has been searched for on the data. |
| `signs` | Numerical vector of length … |

### Author(s)

Marcel Dettling, dettling@stat.math.ethz.ch

### References

Marcel Dettling (2002)
*Supervised Clustering of Genes*, see
http://stat.ethz.ch/~dettling/supercluster.html

Marcel Dettling and Peter Bühlmann (2002).
Supervised Clustering of Genes.
*Genome Biology*, **3**(12): research0069.1-0069.15.

Marcel Dettling and Peter Bühlmann (2004).
Finding Predictive Gene Groups from Microarray Data.
To appear in the *Journal of Multivariate Analysis*.

### See Also

`score`, `margin`, and for a newer methodology, `pelora`.

### Examples

```
## Working with a "real" microarray dataset
data(leukemia, package="supclust")
## Generating random test data: 3 observations and 250 variables (genes)
set.seed(724)
xN <- matrix(rnorm(750), nrow = 3, ncol = 250)
## Fitting Wilma
fit <- wilma(leukemia.x, leukemia.y, noc = 3, trace = 1)
## Working with the output
fit
summary(fit)
plot(fit)
fitted(fit)
## Fitted values and class predictions for the training data
predict(fit, type = "cla")
predict(fit, type = "fitt")
## Predicting fitted values and class labels for test data
predict(fit, newdata = xN)
predict(fit, newdata = xN, type = "cla", classifier = "nnr", noc = c(1,2,3))
predict(fit, newdata = xN, type = "cla", classifier = "dlda", noc = c(1,3))
predict(fit, newdata = xN, type = "cla", classifier = "logreg")
predict(fit, newdata = xN, type = "cla", classifier = "aggtrees")
```