decision_tree    R Documentation
This function creates a decision tree based on an example dataset, calculating the best possible classifier at each step. It only creates perfect divisions; that is, if a rule does not produce a fully classified group, it is not considered. It is specifically designed for categorical values. Continuous values are not recommended, as they will be treated as categorical ones.
decision_tree(
data,
classy,
m,
method = "entropy",
learn = FALSE,
waiting = TRUE
)
data
A data frame with already classified observations. Each column represents a parameter of the observation; each row is a different observation. The column names in data must not contain the character sequence " or ". Since this is meant to be a binary decision rules generator rather than a binary decision tree generator, no tree structures are used, except for the information gain formulas.
classy
Name of the column the data should be classified by. The set of rules obtained will be calculated according to this column.
m
Maximum number of child nodes each node can have.
method
The impurity measure used to compute the information gain. It must be one of "entropy" (default), "gini" or "error".
learn
Boolean value. If set to TRUE, multiple clarifications and explanations are printed throughout the execution.
waiting
Boolean value. If set to TRUE while learn = TRUE, the execution pauses after each explanation until the user continues, so the output can be read step by step.
If data is not perfectly classifiable, the code will not finish.
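As an illustration of these requirements, the following is a minimal sketch of a small, purely categorical data frame that is perfectly classifiable by its other columns. The column names and values are hypothetical and are not taken from the package's example datasets.

# Hypothetical, perfectly classifiable categorical dataset
toy_db <- data.frame(
  Wheels      = c("two", "two", "four", "four", "six"),
  Engine      = c("no",  "yes", "yes",  "yes",  "yes"),
  Cargo       = c("no",  "no",  "no",   "yes",  "yes"),
  VehicleType = c("bicycle", "motorbike", "car", "van", "truck"),
  stringsAsFactors = FALSE
)

# Classify by "VehicleType", allowing at most 3 child nodes per node
# and using the gini impurity
tree <- decision_tree(toy_db, "VehicleType", 3, method = "gini")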
Available information gain methods are:
The formula to calculate the entropy impurity works as follows:

I = -\sum_{i=1}^{f} p_{i} \cdot \log_{2} p_{i}

The formula to calculate the gini impurity works as follows:

I = 1 - \sum_{i=1}^{f} p_{i}^{2}

The formula to calculate the error impurity works as follows:

I = 1 - \max(p_{i})

Here p_{i} is the proportion of observations of class i in the node and f is the number of classes.
Once the impurity is calculated, the information gain is calculated as follows:
IG = I_{father} - \sum{\frac{count(son\ values)}{count(father\ values)} \cdot I_{son}}
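As a rough illustration (not the package's internal code), the following R sketch computes the three impurity measures and the information gain of one split, following the formulas above. The impurity(), info_gain() helpers and the parent/children vectors are hypothetical.

# Impurity of a vector of class labels, per the formulas above
impurity <- function(labels, method = c("entropy", "gini", "error")) {
  method <- match.arg(method)
  p <- as.numeric(table(labels)) / length(labels)   # class proportions p_i
  switch(method,
         entropy = -sum(p * log2(p)),               # -sum p_i * log2 p_i
         gini    = 1 - sum(p^2),                    # 1 - sum p_i^2
         error   = 1 - max(p))                      # 1 - max p_i
}

# IG = I_father - sum( count(son) / count(father) * I_son )
info_gain <- function(parent, children, method = "entropy") {
  i_father <- impurity(parent, method)
  weighted <- sapply(children, function(son)
    length(son) / length(parent) * impurity(son, method))
  i_father - sum(weighted)
}

parent   <- c("car", "car", "van", "truck")
children <- list(c("car", "car"), c("van", "truck"))
info_gain(parent, children, "gini")   # 0.375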
Structure of the tree: a list with one list per tree level. Each of these contains one list per node at that level, and each node list contains the node's filtered data, the node's id, the father node's id, the height the node is at, the variable it filters by, the value that variable is filtered by, and the information gain of the division.
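A minimal sketch of how this nested list could be inspected, assuming only the structure described above (element names inside each node list are not documented here); db3 is the dataset used in the examples below.

tree <- decision_tree(db3, "VehicleType", 5, "entropy", learn = FALSE, waiting = FALSE)
length(tree)                         # number of tree levels
length(tree[[1]])                    # number of nodes at the first level
str(tree[[1]][[1]], max.level = 1)   # components of one node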
Víctor Amador Padilla, victor.amador@edu.uah.es
# example code
decision_tree(db3, "VehicleType", 5, "entropy", learn = TRUE, waiting = FALSE)
decision_tree(db2, "VehicleType", 4, "gini")