Evaluation for classification trees by cross-validation


| Argument | Description |
|---|---|
| `X` | standardized complete X data matrix (training and test data) |
| `grp` | factor with groups for the complete data (training and test data) |
| `train` | row indices of `X` indicating the training data objects |
| `kfold` | number of folds for cross-validation |
| `cp` | range for the tree complexity parameter, see `rpart.control` in package rpart |
| `plotit` | if `TRUE`, a plot is generated |
| `legend` | if `TRUE`, a legend is added to the plot |
| `legpos` | position of the legend in the plot |
| `...` | additional plot arguments |

The data are split into a training (calibration) set and a test set, as defined by `train`. Within the training set, `kfold`-fold CV is performed: the classification tree is fitted to `kfold`-1 parts and evaluated on the remaining part. For each value of `cp`, the misclassification error is then computed for the training data, for the CV test data (CV error), and for the test data.
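The cross-validation step described above can be sketched in R for a single value of the complexity parameter by calling `rpart` directly. This is a minimal illustration under stated assumptions, not the code used inside `treeEval`: the helper `cv_tree_error` is hypothetical, folds are assigned at random, and the CV standard error is assumed to be the standard deviation of the fold errors divided by the square root of `kfold`.

```r
# Hypothetical helper, NOT the chemometrics implementation: estimates the
# training, CV and test misclassification errors of an rpart
# classification tree for ONE value of the complexity parameter cp.
library(rpart)

cv_tree_error <- function(X, grp, train, cp, kfold = 10) {
  dat <- data.frame(grp = grp, X)
  dtrain <- dat[train, ]
  dtest <- dat[-train, ]

  # misclassification rate; compare as character to avoid factor-level mismatches
  miscl <- function(fit, newdata) {
    pred <- predict(fit, newdata = newdata, type = "class")
    mean(as.character(pred) != as.character(newdata$grp))
  }
  # classification tree with the given cp (other rpart settings at defaults)
  fit_tree <- function(d) {
    rpart(grp ~ ., data = d, method = "class",
          control = rpart.control(cp = cp))
  }

  # randomly assign each training object to one of kfold parts
  folds <- sample(rep(seq_len(kfold), length.out = nrow(dtrain)))
  cverr <- numeric(kfold)
  for (k in seq_len(kfold)) {
    fit <- fit_tree(dtrain[folds != k, ])          # fit on kfold - 1 parts
    cverr[k] <- miscl(fit, dtrain[folds == k, ])   # evaluate on the left-out part
  }

  # refit on the whole training set for the training and test errors
  fit <- fit_tree(dtrain)
  list(trainerr = miscl(fit, dtrain),
       testerr  = miscl(fit, dtest),
       cvMean   = mean(cverr),
       cvSe     = sd(cverr) / sqrt(kfold),   # assumed SE definition
       cverr    = cverr)
}
```

For example, `cv_tree_error(scale(iris[, 1:4]), iris$Species, sample(150, 100), cp = 0.01, kfold = 5)` returns the three error types for one `cp` value; repeating the call over a grid of `cp` values roughly corresponds to supplying a range of `cp`.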

| Component | Description |
|---|---|
| `trainerr` | training error rate |
| `testerr` | test error rate |
| `cvMean` | mean of CV errors |
| `cvSe` | standard error of CV errors |
| `cverr` | all errors from CV |
| `cp` | range for the tree complexity parameter, taken from the input |
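One way to write these error quantities for a given value of `cp` is shown below. It assumes that the misclassification error of a set of objects $S$ is the fraction of wrongly classified objects, that $\mathrm{err}_k$ is the error on the $k$-th left-out part, that $K$ equals `kfold`, and that `cvSe` is the usual standard error of the mean over the folds; the page itself does not state the exact formulas.

```latex
\mathrm{err}(S) = \frac{1}{|S|}\sum_{i \in S} I\left(\hat{y}_i \neq y_i\right),
\qquad
\mathrm{cvMean} = \frac{1}{K}\sum_{k=1}^{K} \mathrm{err}_k,
\qquad
\mathrm{cvSe} = \frac{1}{\sqrt{K}}
  \sqrt{\frac{1}{K-1}\sum_{k=1}^{K}\left(\mathrm{err}_k - \mathrm{cvMean}\right)^2}
```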

Peter Filzmoser <P.Filzmoser@tuwien.ac.at>

K. Varmuza and P. Filzmoser: Introduction to Multivariate Statistical Analysis in Chemometrics. CRC Press, Boca Raton, FL, 2009.

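```
data(fgl, package = "MASS")          # forensic glass data
grp <- fgl$type                      # group labels (factor)
X <- scale(fgl[, 1:9])               # standardized predictors
n <- nrow(X)
ntrain <- round(n * 2/3)             # two thirds of the objects for training
library(rpart)
library(chemometrics)                # provides treeEval()
set.seed(123)
train <- sample(1:n, ntrain)
par(mar = c(4, 4, 3, 1))
# in R, 0.02:0.05 and 0.2:0.5 evaluate to just 0.02 and 0.2,
# so these are the cp values the original call actually used
restree <- treeEval(X, grp, train, cp = c(0.01, 0.02, 0.1, 0.15, 0.2, 1))
title("Classification trees")
```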
