Classification tree evaluation by CV

Description

Evaluates classification trees by cross-validation over a range of complexity parameter values.

Usage

treeEval(X, grp, train, kfold = 10, cp = seq(0.01, 0.1, by = 0.01), plotit = TRUE, 
   legend = TRUE, legpos = "bottomright", ...)

Arguments

X

standardized complete X data matrix (training and test data)

grp

factor with groups for complete data (training and test data)

train

row indices of X indicating training data objects

kfold

number of folds for cross-validation

cp

vector of values for the tree complexity parameter, see rpart

plotit

if TRUE a plot will be generated

legend

if TRUE a legend will be added to the plot

legpos

positioning of the legend in the plot

...

additional plot arguments
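To give a feeling for what the "cp" argument controls, the following minimal sketch fits rpart trees directly (not through treeEval) for a few complexity values and counts the terminal nodes. The cp values, the use of the raw fgl data, and the variable names are illustrative assumptions.

```r
library(rpart)
data(fgl, package = "MASS")

# Illustrative cp values (assumption, not the treeEval default)
cps <- c(0.01, 0.05, 0.2)

# Fit one classification tree per cp value and count its terminal nodes
leaves <- sapply(cps, function(cpval) {
  fit <- rpart(type ~ ., data = fgl, method = "class",
               control = rpart.control(cp = cpval))
  sum(fit$frame$var == "<leaf>")
})
names(leaves) <- cps
print(leaves)
```

Larger cp values prune more aggressively, so the number of terminal nodes should not increase as cp grows.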

Details

The data are split into a calibration (training) set, given by the row indices in "train", and a test set consisting of the remaining rows. Within the calibration set, "kfold"-fold CV is performed: the classification tree is fitted to "kfold"-1 parts and evaluated on the remaining part, for each part in turn. The misclassification error is then computed for the training data, for the CV test data (CV error), and for the test data, for each value of "cp".
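The procedure described above can be sketched for a single cp value as follows. This is a simplified illustration using rpart directly, not the actual code of treeEval; the fold assignment, the helper names (misclass, fit_tree), and the values kfold = 5 and cp = 0.05 are assumptions made for the example.

```r
library(rpart)
data(fgl, package = "MASS")

set.seed(1)
n <- nrow(fgl)
train <- sample(seq_len(n), round(n * 2 / 3))  # row indices of calibration data
calib <- fgl[train, ]                          # calibration (training) set
test  <- fgl[-train, ]                         # test set: all remaining rows

kfold <- 5      # illustrative number of folds
cpval <- 0.05   # illustrative complexity parameter

# Randomly assign each calibration object to one of kfold parts
fold <- sample(rep(seq_len(kfold), length.out = nrow(calib)))

# Hypothetical helper: fit a classification tree with the chosen cp
fit_tree <- function(d) {
  rpart(type ~ ., data = d, method = "class",
        control = rpart.control(cp = cpval))
}

# Hypothetical helper: misclassification rate of a fitted tree on newdata
misclass <- function(fit, newdata) {
  pred <- predict(fit, newdata = newdata, type = "class")
  mean(as.character(pred) != as.character(newdata$type))
}

# CV error: fit on kfold-1 parts, evaluate on the part left out
cverr <- sapply(seq_len(kfold), function(i) {
  misclass(fit_tree(calib[fold != i, ]), calib[fold == i, ])
})

# Training and test error of the tree fitted to the whole calibration set
full     <- fit_tree(calib)
trainerr <- misclass(full, calib)
testerr  <- misclass(full, test)

print(c(train = trainerr, cv = mean(cverr), test = testerr))
```

Repeating these steps for every value in "cp" would give one training, CV, and test error per complexity value.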

Value

trainerr

training error rate

testerr

test error rate

cvMean

mean of CV errors

cvSe

standard error of CV errors

cverr

all errors from CV

cp

range for tree complexity parameter, taken from input
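As a rough illustration of how the CV summaries relate to each other, the sketch below computes a mean and a standard error per cp value from a matrix of fold-wise errors and then picks a cp. Several things here are assumptions: the layout of the error matrix (folds in rows, cp values in columns), the standard error formula sd/sqrt(kfold), and the one-standard-error selection rule, which is a common heuristic and not something this page prescribes. The error values are random placeholders, not real results.

```r
set.seed(42)
cp    <- c(0.01, 0.05, 0.1, 0.2)   # illustrative cp values
kfold <- 10

# Placeholder numbers standing in for fold-wise CV errors (NOT real results);
# assumed layout: one row per fold, one column per cp value
cverr <- matrix(runif(kfold * length(cp), min = 0.2, max = 0.5),
                nrow = kfold,
                dimnames = list(NULL, paste0("cp=", cp)))

cvMean <- colMeans(cverr)                   # mean CV error per cp
cvSe   <- apply(cverr, 2, sd) / sqrt(kfold) # standard error per cp

# cp with the smallest mean CV error
best <- which.min(cvMean)

# One-standard-error rule (assumed heuristic): largest cp, i.e. the simplest
# tree, whose mean CV error is within one SE of the minimum
threshold <- cvMean[best] + cvSe[best]
chosen <- max(cp[cvMean <= threshold])

print(round(rbind(cvMean, cvSe), 3))
print(chosen)
```

With the output of treeEval, the same selection could be applied to its cvMean, cvSe, and cp components, provided they have one entry per cp value.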

Author(s)

Peter Filzmoser <P.Filzmoser@tuwien.ac.at>

References

K. Varmuza and P. Filzmoser: Introduction to Multivariate Statistical Analysis in Chemometrics. CRC Press, Boca Raton, FL, 2009.

See Also

rpart

Examples

data(fgl,package="MASS")
grp=fgl$type
X=scale(fgl[,1:9])
k=length(unique(grp))
dat=data.frame(grp,X)
n=nrow(X)
ntrain=round(n*2/3)
require(rpart)
set.seed(123)
train=sample(1:n,ntrain)
par(mar=c(4,4,3,1))
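# Note: 0.02:0.05 and 0.2:0.5 each evaluate to a single value (0.02 and 0.2),
# because ":" steps by 1; the cp values used are 0.01, 0.02, 0.1, 0.15, 0.2, 1.
restree=treeEval(X,grp,train,cp=c(0.01,0.02:0.05,0.1,0.15,0.2:0.5,1))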
title("Classification trees")
