ImbTreeEntropy: Fit a Decision Tree

Usage Arguments Value See Also Examples

View source: R/ImbTreeEntropy.R

Usage

ImbTreeEntropy( Y_name, X_names, data, depth = 5, min_obs = 5, type = "Shannon", entropy_par = 1, 
                cp = 0, n_cores = 1, weights = NULL, cost = NULL, class_th = "equal", 
                overfit = "leafcut", cf = 0.25 )

Arguments

Y_name

Name of the target variable. Character vector of one element.

X_names

Names of the attributes used to model the target (Y_name). Character vector of one or more elements.

data

Data.frame in which to interpret the parameters Y_name and X_names.

depth

The maximum depth of any node in the final tree, with the root node counted as depth 0. Numeric vector of one element which is greater than or equal to 0.

min_obs

The minimum number of observations that must exist in any terminal node (leaf). Numeric vector of one element which is greater than or equal to 1.

type

Method used for learning. Character vector of one element with one of: "Shannon", "Renyi", "Tsallis", "Sharma-Mittal", "Sharma-Taneja", "Kapur".

entropy_par

Numeric vector specifying parameters for the following entropies: "Renyi", "Tsallis", "Sharma-Mittal", "Sharma-Taneja", "Kapur". For "Renyi" and "Tsallis" it is a one-element vector with the q-value. For "Sharma-Mittal", "Sharma-Taneja" and "Kapur" it is a two-element vector with either the q-value and r-value or the alpha-value and beta-value, respectively.
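The one- versus two-element shapes of entropy_par can be sketched as follows; the particular q/r values below are made-up illustrations, not recommended defaults:

```r
# One-element vector: q-value, used by "Renyi" and "Tsallis".
renyi_par <- 0.5

# Two-element vector: (q, r) for "Sharma-Mittal", or
# (alpha, beta) for "Sharma-Taneja" / "Kapur".
mittal_par <- c(0.8, 2)

# Hedged sketch of a call using a Renyi tree on iris; only run
# when the package is actually installed.
if (requireNamespace("ImbTreeEntropy", quietly = TRUE)) {
  data(iris)
  Tree <- ImbTreeEntropy::ImbTreeEntropy(
    Y_name = "Species",
    X_names = colnames(iris)[-ncol(iris)],
    data = iris,
    type = "Renyi",
    entropy_par = renyi_par
  )
}
```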

cp

Complexity parameter, i.e. any split that does not decrease the overall lack of fit by a factor of cp is not attempted. It refers to the misclassification error. If cost or weights are specified, this measure takes those parameters into account. Numeric vector of one element which is greater than or equal to 0.

n_cores

Number of cores used for parallel processing. Numeric vector of one element which is greater than or equal to 1.

weights

Numeric vector of case weights. It should have as many elements as the number of observations in the data.frame passed to the data parameter.
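One common (hypothetical) way to build such a vector for an imbalanced target is inverse class-frequency weighting, shown here on iris in base R:

```r
# One weight per row of the data passed to `data`; rows from rarer
# classes receive larger weights. This is an illustrative choice,
# not a recipe prescribed by the package.
data(iris)
class_freq <- table(iris$Species)
case_weights <- as.numeric(1 / class_freq[iris$Species])

# length(case_weights) must equal nrow(iris) before passing it
# to the weights parameter.
```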

cost

Matrix of costs associated with the possible errors. The matrix should have k rows and k columns, where k is the number of class levels. Rows contain true classes while columns contain predicted classes. Row and column names should take all possible categories (labels) of the target variable.
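A minimal sketch of such a k x k matrix for the three iris classes; the cost values themselves are invented for illustration:

```r
# Rows = true classes, columns = predicted classes, with dimnames
# covering every level of the target, as the cost parameter requires.
data(iris)
labs <- levels(iris$Species)
cost_mat <- matrix(1, nrow = length(labs), ncol = length(labs),
                   dimnames = list(labs, labs))
diag(cost_mat) <- 0                       # correct predictions cost nothing
cost_mat["virginica", "versicolor"] <- 5  # penalise this confusion more
```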

class_th

Method used for determining the thresholds from which the final class for each node is derived. If cost is specified it can take one of: "theoretical", "tuned"; otherwise it takes "equal". Character vector of one element.

overfit

Character vector of one element with one of: "none", "leafcut", "prune", "avoid", specifying which method of overcoming overfitting should be used. The "leafcut" method is applied after the full tree is built; it collapses a subtree when both siblings choose the same class label. The "avoid" method is incorporated during the recursive partitioning; it prohibits a split when both siblings would choose the same class. The "prune" method employs a pessimistic error pruning procedure and should be specified along with the cf parameter.

cf

Numeric vector of one element with a number in (0, 1) for the optional pessimistic-error-rate-based pruning step.

Value

A fitted model/object of class Node (R6). See data.tree.

See Also

ImbTreeEntropy, ImbTreeEntropyInter, PredictTree, PrintTree, PrintTreeInter, ExtractRules

Examples

library("ImbTreeEntropy")
data(iris)
Tree <- ImbTreeEntropy(Y_name = "Species", 
                       X_names = colnames(iris)[-ncol(iris)], 
                       data = iris)
PrintTree(Tree)

KrzyGajow/ImbTreeEntropy documentation built on Dec. 31, 2020, 2:13 p.m.