ImbTreeEntropyInter: Fit an Interactive Decision Tree


View source: R/ImbTreeEntropyInter.R

Usage

ImbTreeEntropyInter( Y_name, X_names, data, depth = 5, min_obs = 5, type = "Shannon", entropy_par = 1, 
                     cp = 0, n_cores = 1, weights = NULL, cost = NULL, class_th = "equal", 
                     overfit = "leafcut", cf = 0.25, amb_prob = 1, top_split = 2, var_lev = T, 
                     amb_class = NULL, amb_class_freq = NULL, tree_path = getwd() )

Arguments

Y_name

Name of the target variable. Character vector of one element.

X_names

Attribute names used for target (Y_name) modelling. Character vector of many elements.

data

Data.frame in which to interpret the parameters Y_name and X_names.

depth

Maximum depth of any node of the final tree, with the root node counted as depth 0. Numeric vector of one element, greater than or equal to 0.

min_obs

The minimum number of observations that must exist in any terminal node (leaf). Numeric vector of one element, greater than or equal to 1.

type

Method used for learning. Character vector of one element, one of: "Shannon", "Renyi", "Tsallis", "Sharma-Mittal", "Sharma-Taneja", "Kapur".

entropy_par

Numeric vector specifying parameters for the following entropies: "Renyi", "Tsallis", "Sharma-Mittal", "Sharma-Taneja", "Kapur". For "Renyi" and "Tsallis" it is a one-element vector with the q-value. For "Sharma-Mittal" it is a two-element vector with the q-value and r-value; for "Sharma-Taneja" and "Kapur" it is a two-element vector with the alpha-value and beta-value.
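
A minimal, hedged sketch of how these parameters might be passed (the q and r values below are illustrative, not recommendations); since the function is interactive, each call prompts for decisions during learning:

library("ImbTreeEntropy")
data(iris)
# Renyi or Tsallis: a single q-value
Tree_renyi <- ImbTreeEntropyInter(Y_name = "Species",
                                  X_names = colnames(iris)[-ncol(iris)],
                                  data = iris, type = "Renyi", entropy_par = 0.5)
# Sharma-Mittal: a two-element vector with the q-value and r-value
Tree_sm <- ImbTreeEntropyInter(Y_name = "Species",
                               X_names = colnames(iris)[-ncol(iris)],
                               data = iris, type = "Sharma-Mittal", entropy_par = c(0.5, 2))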

cp

Complexity parameter, i.e. any split that does not decrease the overall lack of fit by a factor of cp is not attempted. It refers to the misclassification error. If cost or weights are specified, this measure takes these parameters into account. Numeric vector of one element, greater than or equal to 0.

n_cores

Number of cores used for parallel processing. Numeric vector of one element, greater than or equal to 1.

weights

Numeric vector of case weights. It should have as many elements as there are observations in the data.frame passed to the data parameter.
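
One common choice (an illustration, not something mandated by the package) is inverse class-frequency weighting, reusing the iris setup from the entropy_par sketch above:

# Hypothetical weighting: each case weighted by the inverse frequency of its class
w <- as.numeric(1 / table(iris$Species)[as.character(iris$Species)])
Tree_w <- ImbTreeEntropyInter(Y_name = "Species",
                              X_names = colnames(iris)[-ncol(iris)],
                              data = iris, weights = w)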

cost

Matrix of costs associated with the possible errors. The matrix should have k rows and k columns, where k is the number of class levels. Rows contain true classes while columns contain predicted classes. Row and column names should cover all possible categories (labels) of the target variable.
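
A minimal sketch of a valid cost matrix for the iris target (the penalty values are illustrative only):

lev <- levels(iris$Species)
cost_mat <- matrix(1, nrow = length(lev), ncol = length(lev),
                   dimnames = list(lev, lev))  # rows = true classes, columns = predicted classes
diag(cost_mat) <- 0                            # correct predictions incur no cost
Tree_cost <- ImbTreeEntropyInter(Y_name = "Species",
                                 X_names = colnames(iris)[-ncol(iris)],
                                 data = iris, cost = cost_mat, class_th = "theoretical")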

class_th

Method used for determining the thresholds based on which the final class for each node is derived. If cost is specified it can take one of the following: "theoretical", "tuned"; otherwise it takes "equal". Character vector of one element.

overfit

Character vector of one element with one of: "none", "leafcut", "prune", "avoid", specifying which method of overcoming overfitting should be used. The "leafcut" method is applied after the full tree is built; it reduces a subtree when both siblings choose the same class label. The "avoid" method is incorporated during the recursive partitioning; it prohibits a split when both siblings choose the same class. The "prune" method employs a pessimistic error pruning procedure; it should be specified along with the cf parameter.

cf

Numeric vector of one element with the number in (0, 1) for the optional pessimistic-error-rate-based pruning step.
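
A hedged sketch combining the pruning-based option with a smaller confidence factor (the value 0.1 is illustrative), again reusing the iris setup above:

Tree_pruned <- ImbTreeEntropyInter(Y_name = "Species",
                                   X_names = colnames(iris)[-ncol(iris)],
                                   data = iris, overfit = "prune", cf = 0.1)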

amb_prob

Ambiguity threshold for the difference between the highest and the second highest class probability per node, below which the expert has to make a decision regarding the future tree structure. Numeric vector with one element. It is used when the amb_class parameter is NULL.

top_split

Number of best splits, i.e. final tree structures, to be presented. Splits are sorted in descending order according to the information gain. Numeric vector with one element.

var_lev

Decision indicating whether possible best splits are derived on the attribute level (higher) or on the split-point level for each attribute (lower). TRUE means that the expert gets the best split for each variable. FALSE means the best splits overall, in which case the expert may receive top_split splits coming from only one variable. Logical vector with one element.
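
A sketch of how the probability-driven interaction parameters might be combined (the threshold 0.2 and top_split = 3 are illustrative):

# Prompt the expert only where the top two class probabilities differ by less than 0.2,
# offering the 3 best splits, one per attribute
Tree_amb <- ImbTreeEntropyInter(Y_name = "Species",
                                X_names = colnames(iris)[-ncol(iris)],
                                data = iris, amb_prob = 0.2, top_split = 3, var_lev = TRUE)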

amb_class

Labels of the classes for which the expert will make a decision during the learning. Character vector of many elements (from 1 up to the number of classes). Should have the same number of elements as the vector passed to the amb_class_freq parameter.

amb_class_freq

Class frequencies per node above which the expert will make a decision. Numeric vector of many elements (from 1 up to the number of classes). Should have the same number of elements as the vector passed to the amb_class parameter.
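
A sketch of pairing the two class-driven interaction parameters (the labels and thresholds are illustrative):

# Ask for expert input whenever versicolor or virginica exceeds the given frequency in a node
Tree_cls <- ImbTreeEntropyInter(Y_name = "Species",
                                X_names = colnames(iris)[-ncol(iris)],
                                data = iris,
                                amb_class = c("versicolor", "virginica"),
                                amb_class_freq = c(0.3, 0.3))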

tree_path

Path to the folder where the proposed trees created during the interactive learning will be stored. A *.txt file with the tree structure is iteratively updated. Character vector with one element.
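
For instance, the iteratively updated proposals could be written to a temporary folder rather than the working directory:

Tree_tmp <- ImbTreeEntropyInter(Y_name = "Species",
                                X_names = colnames(iris)[-ncol(iris)],
                                data = iris, tree_path = tempdir())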

Value

A fitted model/object of class Node (R6). See the data.tree package.

See Also

ImbTreeEntropy, ImbTreeEntropyInter, PredictTree, PrintTree, PrintTreeInter, ExtractRules

Examples

library("ImbTreeEntropy")
data(iris)
# Choosing sequence: 4, 3, 2, 1, 1
Tree <- ImbTreeEntropyInter(Y_name = "Species", 
                            X_names = colnames(iris)[-ncol(iris)], 
                            data = iris) 
PrintTreeInter(Tree)
