Control for C5.0 Models

Description

Various parameters that control aspects of the C5.0 fit.

Usage

C5.0Control(subset = TRUE, 
            bands = 0, 
            winnow = FALSE, 
            noGlobalPruning = FALSE, 
            CF = 0.25, 
            minCases = 2, 
            fuzzyThreshold = FALSE, 
            sample = 0, 
            seed = sample.int(4096, size = 1) - 1L,  
            earlyStopping = TRUE,
            label = "outcome")

Arguments

subset

A logical: should the model evaluate groups of discrete predictors for splits? Note: the C5.0 command-line version defaults this parameter to FALSE, meaning no attempted groupings will be evaluated during the tree-growing stage.

bands

An integer between 2 and 1000; the default of 0 leaves banding off. If a number of bands is given, the model orders the rules by their effect on the error rate and groups the rules into the specified number of bands. This modifies the output so that the effect on the error rate can be seen for the groups of rules within a band. If this option is selected and rules = FALSE, a warning is issued and rules is changed to TRUE.

winnow

A logical: should predictor winnowing (i.e., feature selection) be used?

noGlobalPruning

A logical: should the final, global pruning step that simplifies the tree be skipped?

CF

A number in (0, 1) for the confidence factor.

minCases

An integer for the smallest number of samples that must be put in at least two of the splits.

fuzzyThreshold

A logical toggle to evaluate possible advanced splits of the data. See Quinlan (1993) for details and examples.

sample

A value between 0 and 0.999 that specifies the random proportion of the data that should be used to train the model. By default (sample = 0), all the samples are used for model training. Samples not used for training are used to evaluate the accuracy of the model in the printed output.

seed

An integer for the random number seed within the C code.

earlyStopping

A logical to toggle whether the internal method for stopping boosting should be used.

label

A character label for the outcome used in the output.
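
To illustrate how sample and seed work together, here is a minimal sketch that holds out part of the data during the fit. It assumes the C50 package is installed. The built-in iris data is used only for convenience and is not part of the original examples, and the values 0.8 and 42 are arbitrary.

```r
library(C50)

# Assumption: iris stands in for any data set with a factor outcome.
# Train on a random 80% of the rows; fix the seed used by the C code
# so that the same rows are selected on each run.
ctrl <- C5.0Control(sample = 0.8, seed = 42)

holdoutModel <- C5.0(x = iris[, -5], y = iris$Species, control = ctrl)

# The printed output should include an evaluation on the rows
# that were not used for training.
summary(holdoutModel)
```

Fixing the seed is mainly useful when the printed held-out accuracy needs to be comparable between runs.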

Value

A list of options.
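
Because the return value is an ordinary list, it can be inspected before it is passed to C5.0. The sketch below assumes the C50 package is installed; the values for CF and minCases are arbitrary illustrations.

```r
library(C50)

ctrl <- C5.0Control(CF = 0.1, minCases = 5)

# The result is a plain list, so the usual list tools apply.
is.list(ctrl)
names(ctrl)

# Individual options can be read back by name.
ctrl$CF
ctrl$minCases
```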

Author(s)

Original GPL C code by Ross Quinlan; R code and modifications to C by Max Kuhn, Steve Weston, and Nathan Coulter.

References

Quinlan R (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, http://www.rulequest.com/see5-unix.html

See Also

C5.0, predict.C5.0, summary.C5.0, C5imp

Examples

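library(C50)

# Load the churn data; this provides churnTrain
data(churn)

# Fit a single tree with predictor winnowing enabled.
# Column 20 of churnTrain is the outcome, so it is dropped from x.
treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn,
                  control = C5.0Control(winnow = TRUE))
summary(treeModel)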
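
A further sketch, not part of the original examples, compares the default settings with heavier pruning via CF and minCases. It assumes the C50 package is installed and uses the built-in iris data so that it runs on its own. The specific values (CF = 0.05, minCases = 10) are arbitrary, and the general expectation that a smaller CF and a larger minCases give a simpler tree should be checked against the printed output rather than taken for granted.

```r
library(C50)

# Assumption: iris is used purely as a small, always-available data set.
defaultTree <- C5.0(x = iris[, -5], y = iris$Species)

# Smaller confidence factor and larger minimum case count,
# both intended to encourage a simpler tree.
prunedTree <- C5.0(x = iris[, -5], y = iris$Species,
                   control = C5.0Control(CF = 0.05, minCases = 10))

# Compare the two fits through their printed summaries.
summary(defaultTree)
summary(prunedTree)
```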