tree.control | R Documentation
Configure the fitting process of individual decision trees.
tree.control(
  nodesize = 10,
  split_criterion = "gini",
  alpha = 0.05,
  cp = 0.001,
  smoothing = "none",
  mtry = "none",
  covariable = "final_4pl"
)
nodesize |
Minimum number of samples contained in a terminal node. This parameter ensures that enough samples are available for making predictions, which includes fitting 4pL models in the leaves. |
split_criterion |
Splitting criterion for deciding when and how to split. The default is "gini"/"mse", which uses the Gini splitting criterion for binary risk estimation tasks and the mean squared error as impurity measure in regression tasks. Alternatively, "4pl" (or "linear") can be used if a quantitative covariable is supplied and the parameter covariable is set accordingly. |
alpha |
Significance threshold for the likelihood ratio tests when using the "4pl" or "linear" splitting criterion. |
cp |
Complexity parameter. This parameter determines by which amount the impurity has to be reduced to further split a node. Here, the total tree impurity is considered. See details for a concrete formula. Only used with the "gini"/"mse" splitting criterion. |
smoothing |
Shall the leaf predictions for risk estimation be smoothed? "laplace" yields Laplace smoothing. The default is "none", which does not employ smoothing. |
mtry |
Shall the tree fitting process be randomized as in random forests? Currently, only "sqrt" for using sqrt(p) random predictors at each node for splitting and "none" (default) for fitting conventional decision trees are supported. |
covariable |
How shall optional quantitative covariables be handled? "constant" ignores them. Alternatively, they can be considered as splitting variables ("_split"), used for fitting 4pL models in each leaf ("_4pl"), or used for fitting linear models in each leaf ("_linear"). If either splitting or model fitting is chosen, one should state if this should be handled over the whole search ("full_", computationally expensive) or just the final trees ("final_"). Thus, "final_4pl" would lead to fitting 4pL in each leaf but only for the final fitting of trees. |
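As an illustration of how the arguments above fit together, the sketch below mimics a control constructor of this shape. This is not the package's actual implementation, only a minimal stand-in following the documented defaults; the validation checks are assumptions for illustration.

```r
# Illustrative sketch only: a minimal constructor mirroring the documented
# defaults. The real tree.control() is provided by the package itself.
tree.control <- function(nodesize = 10, split_criterion = "gini",
                         alpha = 0.05, cp = 0.001, smoothing = "none",
                         mtry = "none", covariable = "final_4pl") {
  # Basic sanity checks (assumed here, not documented behavior)
  stopifnot(nodesize >= 1, alpha > 0, alpha < 1, cp >= 0)
  ctrl <- list(nodesize = nodesize, split_criterion = split_criterion,
               alpha = alpha, cp = cp, smoothing = smoothing,
               mtry = mtry, covariable = covariable)
  class(ctrl) <- "tree.control"
  ctrl
}

# Example: require at least 5 samples per leaf and use the 4pL criterion
ctrl <- tree.control(nodesize = 5, split_criterion = "4pl")
```

The returned object is simply a classed list of all tree parameters, matching the value described below.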
For the Gini or MSE splitting criterion, a node t is further split if some considered split s leads to

P(t) * ΔI(s, t) > cp

where P(t) is the empirical node probability and ΔI(s, t) the impurity reduction of split s at node t. Otherwise, the node is declared a leaf.
For continuous outcomes, cp is scaled by the empirical variance of y to ensure the right scaling, i.e., cp <- cp * var(y). Since the impurity measure for continuous outcomes is the mean squared error, this can be interpreted as controlling the minimum reduction of the normalized mean squared error (NRMSE to the power of two).
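The stopping rule above can be sketched in R as follows. The helper function is hypothetical (not part of the package); it only demonstrates the documented comparison P(t) * ΔI(s, t) > cp and the variance rescaling of cp for continuous outcomes.

```r
# Sketch of the documented stopping rule (illustrative, not package code):
# a split s at node t is accepted only if P(t) * delta_I(s, t) > cp.
accept_split <- function(p_t, delta_impurity, cp, y = NULL) {
  # For continuous outcomes, cp is rescaled by the empirical variance of y
  if (!is.null(y)) cp <- cp * var(y)
  p_t * delta_impurity > cp
}

# Binary outcome: node holding 40% of the data, Gini reduction of 0.01
accept_split(p_t = 0.4, delta_impurity = 0.01, cp = 0.001)  # TRUE
```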
If one chooses the 4pL or linear splitting criterion, likelihood ratio tests testing the alternative of better fitting individual models are employed. The corresponding test statistic asymptotically follows a χ² distribution, where the degrees of freedom are given by the difference in the number of model parameters, i.e., 2 * 4 - 4 = 4 degrees of freedom in the case of 4pL models and 2 * 2 - 2 = 2 degrees of freedom in the case of linear models.
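The likelihood ratio test can be sketched as below. The function name and interface are hypothetical; the sketch only shows the documented mechanics: twice the log-likelihood gain of two separate child models over one joint model, compared against a χ² distribution with 4 df (4pL) or 2 df (linear) at level alpha.

```r
# Illustrative likelihood ratio test for a candidate split (not package code).
# Two separate 4pL child models (4 parameters each) vs. one joint 4pL model
# give 2 * 4 - 4 = 4 degrees of freedom.
lrt_split <- function(loglik_joint, loglik_left, loglik_right,
                      df = 4, alpha = 0.05) {
  stat <- 2 * (loglik_left + loglik_right - loglik_joint)
  p_value <- pchisq(stat, df = df, lower.tail = FALSE)
  p_value < alpha  # split if individual models fit significantly better
}
```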
For binary outcomes, choosing to fit linear models for evaluating the splits or for modeling the leaves actually leads to fitting LDA (linear discriminant analysis) models.
An object of class tree.control, which is a list of all necessary tree parameters.