Estimation of misclassification errors (generalisation errors) based on statistical and various machine learning methods

Description

Estimates misclassification errors (generalisation errors), sensitivity and specificity using the cross-validation, bootstrap and .632+ bias-corrected bootstrap estimators, with Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour as the classification methods.

Usage

## S3 method for class 'data.frame'
classificationError(
          formula,
          data,
          method = c("RF", "SVM", "LDA", "KNN"),
          errorType = c("cv", "boot", "six32plus"),
          senSpec = TRUE,
          negLevLowest = TRUE,
          na.action = na.omit,
          control = control.errorest(k = NROW(na.action(data)), nboot = 100),
          ...)

Arguments

formula

A formula of the form lhs ~ rhs relating the response (class) variable to the explanatory variables. See lm for more detail.

data

A data frame containing the response (class membership) variable and the explanatory variables in the formula.

method

A character vector of length 1 to 4 representing the classification methods to be used. Can be one or more of "RF" (Random Forest), "SVM" (Support Vector Machines), "LDA" (Linear Discriminant Analysis) and "KNN" (k-Nearest Neighbour). Defaults to all four methods.

errorType

A character vector of length 1 to 3 representing the type of estimators to be used for computing misclassification errors. Can be one or more of "cv" (cross-validation), "boot" (bootstrap) and "six32plus" (.632+ bias-corrected bootstrap). Defaults to all three estimators.

senSpec

Logical. Should sensitivity and specificity (for cross-validation estimator only) be computed? Defaults to TRUE.

negLevLowest

Logical. Does the lowest of the ordered levels of the class variable represent the negative control? Defaults to TRUE.

na.action

A function indicating what should happen when the data contain NAs. Defaults to na.omit.

control

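Control parameters for the function errorest. By default, k is set to the number of observations remaining after na.action is applied (that is, leave-one-out cross-validation) and nboot to 100 bootstrap replicates.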

...

Additional parameters passed on to the classification method.
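
To make the role of negLevLowest concrete, the sketch below computes sensitivity and specificity by hand for a two-level class variable, treating the lowest factor level as the negative control when negLevLowest is TRUE. It illustrates the convention only and is not the package's internal code; the helper name sens_spec and the label vectors are made up for this example.

```r
# Made-up true and predicted labels (illustration only).
# The factor levels are "0" < "1", so "0" is the lowest ordered level.
truth <- factor(c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1))
pred  <- factor(c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0), levels = levels(truth))

# Hypothetical helper: sensitivity and specificity for a two-class problem.
sens_spec <- function(truth, pred, negLevLowest = TRUE) {
  lev <- levels(truth)
  stopifnot(length(lev) == 2)
  # Choose which level plays the role of the negative control
  neg <- if (negLevLowest) lev[1] else lev[2]
  pos <- setdiff(lev, neg)
  c(sensitivity = mean(pred[truth == pos] == pos),  # true positive rate
    specificity = mean(pred[truth == neg] == neg))  # true negative rate
}

res <- sens_spec(truth, pred)
res
```

Calling sens_spec(truth, pred, negLevLowest = FALSE) swaps the two values, which is why the flag should match how the class variable is coded.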

Details

In the current version of the package, estimation of sensitivity and specificity is limited to the cross-validation estimator. For LDA, the sample size must be greater than the number of explanatory variables to avoid singularity. The function classificationError does not check whether this condition is satisfied, but the underlying function lda produces warnings if it is violated.
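
Because classificationError does not perform the sample-size check itself, a caller may want to run one before requesting method = "LDA". The following is a minimal sketch in base R; the helper name check_lda_dims and the toy data frames are assumptions introduced for illustration and are not part of the package.

```r
# Hypothetical pre-check: is the number of complete observations
# greater than the number of explanatory variables in the formula?
check_lda_dims <- function(formula, data, na.action = na.omit) {
  mf <- model.frame(formula, data = data, na.action = na.action)
  X  <- model.matrix(attr(mf, "terms"), mf)
  p  <- sum(attr(X, "assign") != 0)  # design columns, excluding any intercept
  n  <- nrow(mf)                     # observations left after NA handling
  ok <- n > p
  if (!ok) {
    warning("n = ", n, " is not greater than p = ", p,
            "; LDA may be singular")
  }
  invisible(ok)
}

set.seed(1)
# Toy data: 10 observations and 3 explanatory variables, so the check passes
toy <- data.frame(class = factor(rep(c("a", "b"), each = 5)),
                  matrix(rnorm(10 * 3), nrow = 10))
ok_toy <- check_lda_dims(class ~ ., toy)

# Toy data: 4 observations and 6 explanatory variables, so the check fails
wide <- data.frame(class = factor(rep(c("a", "b"), each = 2)),
                   matrix(rnorm(4 * 6), nrow = 4))
ok_wide <- suppressWarnings(check_lda_dims(class ~ ., wide))
```

Each cross-validation or bootstrap training set contains fewer distinct observations than the full data, so in practice it is prudent to leave some margin rather than only just satisfying n > p.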

Value

Returns an object of class classificationError with components

call

The call of the classificationError function.

errorRate

A length(errorType) by length(method) matrix of classification errors.

rocData

A 2 by length(method) matrix of sensitivities (first row) and specificities (second row).

Author(s)

Mizanur Khondoker, Till Bachmann, Peter Ghazal
Maintainer: Mizanur Khondoker mizanur.khondoker@gmail.com.

References

Khondoker, M. R., Bachmann, T. T., Mewissen, M., Dickinson, P. et al. (2010). Multi-factorial analysis of class prediction error: estimating optimal number of biomarkers for various classification rules. Journal of Bioinformatics and Computational Biology, 8, 945–965.

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32.

Chang, C.-C. and Lin, C.-J. LIBSVM: a library for Support Vector Machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.

Efron, B. and Tibshirani, R. (1997). Improvements on Cross-Validation: The .632+ Bootstrap Estimator. Journal of the American Statistical Association, 92(438), 548–560.

See Also

simData

Examples

library(optBiomarker)  # package providing simData and classificationError
mydata <- simData(nTrain = 30, nBiom = 3)$data
classificationError(formula = class ~ ., data = mydata)