cv.prune | R Documentation
The logic decision tree of a fitted logicDT model can be optimally (post-)pruned utilizing k-fold cross-validation.
cv.prune(model, nfolds = 10, scoring_rule = "deviance",
         choose = "1se", simplify = TRUE)
model: A fitted logicDT model.

nfolds: Number of cross-validation folds.

scoring_rule: The scoring rule for evaluating the cross-validation error and its standard error. For classification tasks, the deviance or the Brier score should be used.

choose: Model selection scheme. If the model that minimizes the cross-validation error should be chosen, "min" should be supplied; the default "1se" chooses the most parsimonious model whose cross-validation error lies within one standard error of the minimum.

simplify: Should the pruned model be simplified with regard to the input terms, i.e., should terms that are no longer contained in the tree be removed from the model?
Similar to Breiman et al. (1984), we implement post-pruning by first computing the optimal pruning path and then using cross-validation for identifying the best generalizing model.
In order to handle continuous covariables with fitted regression models in
each leaf, similar to the likelihood-ratio splitting criterion in
logicDT
, we propose using the log-likelihood as the impurity
criterion in this case for computing the pruning path.
In particular, for each node t, the weighted node impurity
p(t)i(t) has to be calculated and the inequality
Δ i(s,t) := i(t) - p(t_L | t)i(t_L) - p(t_R | t)i(t_R) ≥ 0
has to be fulfilled for each possible split s splitting t into two subnodes t_L and t_R. Here, i(t) describes the impurity of a node t, p(t) the proportion of data points falling into t, and p(t' | t) the proportion of data points falling from t into t'. Since the regression models are fitted using maximum likelihood, the maximum likelihood criterion fulfills this property and can also be seen as an extension of the entropy impurity criterion in the case of classification or an extension of the MSE impurity criterion in the case of regression.
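The nonnegativity of the impurity decrease can be illustrated for the entropy impurity, the classification case mentioned above. The following is a minimal Python sketch, not part of the logicDT package; the function names and the sample class counts are chosen purely for illustration.

```python
import math

def entropy(counts):
    """Entropy impurity i(t) of a node, computed from its class counts."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def impurity_decrease(parent, left, right):
    """Delta i(s,t) = i(t) - p(t_L|t) i(t_L) - p(t_R|t) i(t_R),
    where the proportions p(. | t) are estimated from the node sizes."""
    n, n_l, n_r = sum(parent), sum(left), sum(right)
    return entropy(parent) - n_l / n * entropy(left) - n_r / n * entropy(right)

# Hypothetical class counts in a node t and its subnodes t_L and t_R:
parent = [40, 60]
left, right = [30, 10], [10, 50]
delta = impurity_decrease(parent, left, right)
assert delta >= 0  # the inequality Delta i(s,t) >= 0 holds for this split
```

A split that leaves the class proportions unchanged in both subnodes yields an impurity decrease of exactly zero, which is why the inequality is stated with "≥" rather than ">".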
The default model selection is done by choosing the most parsimonious model whose cross-validation error is at most \mathrm{CV}_{\min} + \mathrm{SE}_{\min}, where \mathrm{CV}_{\min} is the minimal cross-validation error and \mathrm{SE}_{\min} its corresponding standard error. For a more robust standard error estimation, the scores are calculated per training observation, such that the AUC is no longer an appropriate choice and the deviance or the Brier score should be used in the case of classification.
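The one-standard-error selection scheme can be sketched as follows. This is an illustrative Python snippet with hypothetical scores, not logicDT code; it assumes the candidate models on the pruning path are ordered from most complex to most parsimonious (i.e., by increasing pruning penalty).

```python
def select_1se(cv_scores, cv_ses):
    """One-standard-error rule: among all models whose CV error is at most
    CV_min + SE_min, pick the most parsimonious one (largest index, since
    models are ordered from most complex to most parsimonious)."""
    i_min = min(range(len(cv_scores)), key=lambda i: cv_scores[i])
    threshold = cv_scores[i_min] + cv_ses[i_min]
    return max(i for i, s in enumerate(cv_scores) if s <= threshold)

scores = [0.30, 0.25, 0.26, 0.33]  # hypothetical CV errors along the path
ses    = [0.02, 0.02, 0.02, 0.02]  # corresponding standard errors
# Minimum is 0.25 (index 1); 0.26 <= 0.25 + 0.02, but 0.33 is not,
# so the simpler model at index 2 is preferred over the minimizer.
assert select_1se(scores, ses) == 2
```

Choosing `choose = "min"` instead would correspond to simply returning `i_min`.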
A list containing

- the new, optimally pruned logicDT model,
- a data frame containing the penalties, the cross-validation scores and the corresponding standard errors,
- the ideal penalty value.
Breiman, L., Friedman, J., Stone, C. J. & Olshen, R. A. (1984). Classification and Regression Trees. CRC Press. doi: 10.1201/9781315139470