dlda: Classification with Wilma's Clusters

View source: R/wilma-utils.R

Description

The four functions nnr (nearest neighbor rule), dlda (diagonal linear discriminant analysis), logreg (logistic regression) and aggtrees (aggregated trees) are used for binary classification with the cluster representatives of Wilma's output.

Usage

dlda    (xlearn, xtest, ylearn)
nnr     (xlearn, xtest, ylearn)
logreg  (xlearn, xtest, ylearn)
aggtrees(xlearn, xtest, ylearn)

Arguments

xlearn

Numeric matrix of explanatory variables (q variables in columns, n cases in rows), containing the learning or training data. Typically, these are the (gene) cluster representatives of Wilma's output.

xtest

Numeric matrix of explanatory variables (q variables in columns, m cases in rows), containing the test or validation data. Typically, these are the fitted (gene) cluster representatives for the test data, obtained by applying predict.wilma to Wilma's output from the training data.

ylearn

Numeric vector of length n containing the class labels for the training observations. These labels have to be coded by 0 and 1.
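Class labels often arrive as a factor or character vector rather than as 0/1 values. The following is a minimal sketch of one way to recode them before calling any of the four classifiers; the label names "tumor" and "normal" and the variable raw are purely illustrative assumptions, not part of the package.

```r
## Hypothetical raw labels (the names are assumptions for illustration)
raw <- factor(c("tumor", "normal", "tumor", "normal", "normal"))

## Code the class of interest as 1 and the other class as 0
ylearn <- as.numeric(raw == "tumor")

## Sanity check: only 0 and 1 may occur
stopifnot(all(ylearn %in% c(0, 1)))
ylearn
```

Which class is mapped to 1 is a free choice, but it should be the same for the learning labels and for any test labels used to evaluate the predictions.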

Details

nnr implements the 1-nearest-neighbor rule with Euclidean distance. dlda is linear discriminant analysis under the restriction that the covariance matrix is diagonal with equal variance for all predictors. logreg is default logistic regression. aggtrees uses rpart to fit a default stump (a classification tree with two terminal nodes) for every predictor variable and determines the final classifier by majority voting.
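To make the nearest-neighbor rule concrete, here is a minimal base R sketch of a 1-nearest-neighbor classifier with Euclidean distance. It is not the package's implementation: the function name nn1_sketch and the toy data are assumptions for illustration, and ties are broken by taking the first closest training case, which may differ from what nnr does.

```r
## Illustrative 1-NN classifier (hypothetical helper, not supclust code)
nn1_sketch <- function(xlearn, xtest, ylearn) {
  xlearn <- as.matrix(xlearn)
  xtest  <- as.matrix(xtest)
  stopifnot(ncol(xlearn) == ncol(xtest),
            nrow(xlearn) == length(ylearn))

  vapply(seq_len(nrow(xtest)), function(i) {
    ## Squared Euclidean distance from test case i to every training case
    d2 <- rowSums(sweep(xlearn, 2, xtest[i, ], "-")^2)
    ## Label of the closest training case (first one if there is a tie)
    ylearn[which.min(d2)]
  }, numeric(1))
}

## Toy data: two well-separated groups in two dimensions
xlearn <- rbind(c(0, 0), c(0, 1), c(5, 5), c(6, 5))
ylearn <- c(0, 0, 1, 1)
xtest  <- rbind(c(0.2, 0.4), c(5.2, 4.8))

pred <- nn1_sketch(xlearn, xtest, ylearn)
pred
```

The squared distance is used because it has the same minimizer as the Euclidean distance and avoids an unnecessary square root.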

Value

Numeric vector of length m, containing the predicted class labels for the test observations. The class labels are coded by 0 and 1.
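Because the result is a plain 0/1 vector, it can be compared directly with known test labels. The sketch below assumes a hypothetical vector ytest of true labels; pred stands for the output of any of the four functions, and both vectors here are made up for illustration.

```r
## Predicted labels (standing in for the output of one of the classifiers)
pred  <- c(0, 0, 0, 0, 0, 1, 0, 0)

## Hypothetical true labels for the same m = 8 test cases
ytest <- c(0, 1, 0, 0, 0, 1, 1, 0)

## Proportion of misclassified test cases
err <- mean(pred != ytest)

## Cross-tabulation of predicted versus true labels
confusion <- table(predicted = pred, true = ytest)
err
confusion
```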

Author(s)

Marcel Dettling

References

Marcel Dettling (2002) Supervised Clustering of Genes, see http://stat.ethz.ch/~dettling/supercluster.html

Marcel Dettling and Peter Bühlmann (2002). Supervised Clustering of Genes. Genome Biology, 3(12): research0069.1-0069.15.

See Also

wilma

Examples

## Generating random learning data: 20 observations and 10 variables (clusters)
set.seed(342)
xlearn <- matrix(rnorm(200), nrow = 20, ncol = 10)

## Generating random test data: 8 observations and 10 variables (clusters)
xtest  <- matrix(rnorm(80),  nrow = 8,  ncol = 10)

## Generating random class labels for the learning data
ylearn <- as.numeric(runif(20) > 0.5)

## Predicting the class labels for the test data
nnr(xlearn, xtest, ylearn)
dlda(xlearn, xtest, ylearn)
logreg(xlearn, xtest, ylearn)
aggtrees(xlearn, xtest, ylearn)

Example output

[1] 0 0 0 0 0 1 0 0
[1] 0 0 0 0 1 1 1 0
[1] 1 0 0 0 1 1 1 0
[1] 1 0 0 0 1 1 1 1

supclust documentation built on May 29, 2017, 9:19 a.m.