evaluation: Evaluation of classifiers
In CMA: Synthesis of microarray-based classification


The performance of classifiers can be evaluated by eight different measures and three different schemes, which are described in more detail below.
For S4 method information, see evaluation-methods.


evaluation(clresult, cltrain = NULL, cost = NULL, y = NULL, measure = c("misclassification", "sensitivity", "specificity", "average probability", "brier score", "auc", "0.632", "0.632+"),
                     scheme = c("iterationwise", "observationwise", "classwise"))

`clresult`	A list of objects of class `cloutput` or `clvarseloutput`
`cltrain`	An object of class `cloutput` in which the whole dataset was used as learning set. Only used if `measure = "0.632"` or `measure = "0.632+"`, in order to obtain an estimate of the resubstitution error rate.
`cost`	An optional cost matrix used if `measure = "misclassification"`. If it is not specified (default), the cost is the usual indicator loss. Otherwise, entry `i,j` of `cost` quantifies the loss when the true class is class `i-1` and the predicted class is `j-1`, provided the conventional coding `0,...,K-1` in the case of `K` classes is used. Usually, the matrix contains only non-negative entries with zeros on the diagonal, but this is not obligatory. Make sure that the dimension of the matrix matches the number of classes.
`y`	A vector containing the true class labels. Only needed if `scheme = "classwise"`.
`measure`	Performance measure to be used: `"misclassification"` The misclassification rate. `"sensitivity"` The sensitivity, or 1 - false negative rate. Can only be computed for binary classification. `"specificity"` The specificity, or 1 - false positive rate. Can only be computed for binary classification. `"average probability"` The average probability assigned to the correct class. Requires that the classifier provides probability estimates. The optimum performance is 1. `"brier score"` The Brier score is generally defined as `<sum over all observations i> <sum over all classes k> (I(y_i=k)-P(k))^2`, with `I()` denoting the indicator function and `P(k)` the estimated probability for class `k`. The optimum performance is 0. `"auc"` The area under the curve (AUC) of the empirical ROC curve computed from the estimated probabilities and the true class labels. Can only be computed for binary classification and if `scheme = "iterationwise"`, see below. See also `roc,cloutput-method`. `"0.632"` The 0.632 estimator (see references) for the misclassification rate, applied iterationwise or observationwise, if bootstrap learning sets have been used. Note that `cltrain` must be provided. `"0.632+"` The 0.632+ estimator (see references) for the misclassification rate, applied iterationwise or observationwise, if bootstrap learning sets have been used. Note that `cltrain` must be provided.
`scheme`	`"iterationwise"` The performance measures listed above are computed for each iteration, i.e. for each `learningset`. `"observationwise"` The performance measures listed above (except for `"auc"`) are computed separately for each observation, which is classified once or several times depending on the `learningset` scheme. `"classwise"` The performance measures (except for `"auc"`, `"0.632"` and `"0.632+"`) are computed separately for each class, averaged over both iterations and observations.
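
To make the `cost` indexing and the probability-based measures concrete, the following base-R sketch computes them by hand on toy data. All values and variable names are hypothetical, and this is not CMA's internal code. In particular, the Brier score is summed exactly as in the formula above; whether and how `evaluation` normalises it is not assumed here.

```r
# Hypothetical toy data: 3 classes with the conventional coding 0, 1, 2
y_true <- c(0, 1, 2, 1, 0)
y_pred <- c(0, 2, 2, 1, 1)

# Hypothetical class-probability estimates: one row per observation,
# one column per class (columns correspond to classes 0, 1, 2)
prob <- rbind(
  c(0.7, 0.2, 0.1),
  c(0.1, 0.3, 0.6),
  c(0.2, 0.2, 0.6),
  c(0.1, 0.8, 0.1),
  c(0.4, 0.5, 0.1)
)

# Indicator loss (the default when no cost matrix is given)
misclassification <- mean(y_true != y_pred)

# Cost matrix: entry [i, j] is the loss when the true class is i-1
# and the predicted class is j-1; zeros on the diagonal
cost <- matrix(c(0, 1, 2,
                 1, 0, 3,
                 4, 1, 0), nrow = 3, byrow = TRUE)
cost_loss <- cost[cbind(y_true + 1, y_pred + 1)]
weighted_misclassification <- mean(cost_loss)

# Average probability assigned to the correct class (optimum: 1)
p_correct <- prob[cbind(seq_along(y_true), y_true + 1)]
average_probability <- mean(p_correct)

# Brier score: squared distance between the class indicator and the
# estimated probabilities, summed over classes and observations (optimum: 0)
indicator <- outer(y_true, 0:2, "==") * 1
brier_per_obs <- rowSums((indicator - prob)^2)
brier_sum <- sum(brier_per_obs)
```

With this cost matrix, mistaking class 1 for class 2 is penalised three times as heavily as mistaking class 0 for class 1, so the cost-weighted loss exceeds the plain misclassification rate even though the number of errors is unchanged.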

An object of class evaloutput.
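
What the returned scores represent depends on `scheme`. As a rough illustration only, the base-R sketch below aggregates a hypothetical matrix of 0/1 losses in the three ways described under `scheme`. The matrix layout, the use of `NA` for observations not classified in an iteration, and all variable names are assumptions made for this example and do not reflect the internal structure of `evaloutput`.

```r
# Hypothetical 0/1 losses: rows = iterations (learning sets), columns = observations.
# NA marks an observation that was part of the learning set in that iteration
# and was therefore not classified.
loss <- rbind(
  c(NA, 0,  1, NA,  0,  1),
  c( 0, NA, 1,  0, NA,  0),
  c( 1, 0, NA,  0,  0, NA)
)
y <- c(0, 0, 0, 1, 1, 1)  # hypothetical true class labels of the six observations

# Iterationwise: one score per learning set
iterationwise <- rowMeans(loss, na.rm = TRUE)

# Observationwise: one score per observation, averaged over the
# iterations in which that observation was classified
observationwise <- colMeans(loss, na.rm = TRUE)

# Classwise: one score per class, pooling all iterations and all
# observations that belong to the class (this is why `y` is needed)
classwise <- tapply(seq_along(y), y,
                    function(idx) mean(loss[, idx], na.rm = TRUE))
```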

Martin Slawski ms@cs.uni-sb.de

Anne-Laure Boulesteix boulesteix@ibe.med.uni-muenchen.de

Christoph Bernau bernau@ibe.med.uni-muenchen.de

Efron, B. and Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method.
Journal of the American Statistical Association, 92, 548-560.

Slawski, M., Daumer, M. and Boulesteix, A.-L. (2008). CMA - A comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinformatics, 9, 439.

evaloutput, classification, compare

### simple linear discriminant analysis example using bootstrap datasets
library(CMA)
### load the Golub data; the first column holds the class labels
data(golub)
golubY <- golub[,1]
### extract gene expression from first 10 genes
golubX <- as.matrix(golub[,2:11])
### generate 10 bootstrap datasets
set.seed(333)
bootds <- GenerateLearningsets(y = golubY, method = "bootstrap", ntrain = 30, niter = 10, strat = TRUE)
### run classification()
ldalist <- classification(X=golubX, y=golubY, learningsets = bootds, classifier=ldaCMA)
### Evaluation:
eval_iter <- evaluation(ldalist, scheme = "iter")
eval_obs <- evaluation(ldalist, scheme = "obs")
show(eval_iter)
show(eval_obs)
summary(eval_iter)
summary(eval_obs)
### auc with boxplot
eval_auc <- evaluation(ldalist, scheme = "iter", measure = "auc")
boxplot(eval_auc)
### which observations have often been misclassified ?
obsinfo(eval_obs, threshold = 0.75)
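
The `"0.632"` and `"0.632+"` measures need `cltrain` because both combine the resubstitution error with the error on observations left out of the bootstrap learning sets. The base-R sketch below follows the formulas of Efron and Tibshirani (1997) on hypothetical numbers. It is an illustration of the estimators, not CMA's implementation, and all input values and variable names are made up.

```r
# Hypothetical inputs (not taken from CMA output)
err_resub <- 0.05  # resubstitution error: trained and tested on the whole dataset
err_oob   <- 0.20  # error on observations left out of the bootstrap learning sets

# 0.632 estimator: fixed weights 0.368 and 0.632
err_632 <- 0.368 * err_resub + 0.632 * err_oob

# 0.632+ estimator: the weight adapts to the amount of overfitting.
# gamma is the no-information error rate, estimated from the proportions
# of true classes (p) and of predicted classes (q).
p <- c(0.6, 0.4)  # hypothetical proportions of the true classes
q <- c(0.7, 0.3)  # hypothetical proportions of the predicted classes
gamma <- sum(p * (1 - q))

# Cap the out-of-bag error at the no-information rate
err_oob_c <- min(err_oob, gamma)

# Relative overfitting rate, restricted to [0, 1]
R <- if (err_oob_c > err_resub && gamma > err_resub) {
  (err_oob_c - err_resub) / (gamma - err_resub)
} else 0

# Weight grows from 0.632 (no overfitting) towards 1 (maximal overfitting)
w <- 0.632 / (1 - 0.368 * R)
err_632plus <- (1 - w) * err_resub + w * err_oob_c
```

When `R` is 0 the two estimators coincide; as overfitting increases, the 0.632+ estimate moves towards the out-of-bag error.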