ctree_control  R Documentation 
Various parameters that control aspects of the ‘ctree’ fit.
ctree_control(teststat = c("quadratic", "maximum"),
    splitstat = c("quadratic", "maximum"), splittest = FALSE,
    testtype = c("Bonferroni", "MonteCarlo", "Univariate", "Teststatistic"),
    pargs = GenzBretz(), nmax = c(yx = Inf, z = Inf), alpha = 0.05,
    mincriterion = 1 - alpha, logmincriterion = log(mincriterion),
    minsplit = 20L, minbucket = 7L, minprob = 0.01, stump = FALSE,
    maxvar = Inf, lookahead = FALSE, MIA = FALSE, nresample = 9999L,
    tol = sqrt(.Machine$double.eps), maxsurrogate = 0L, numsurrogate = FALSE,
    mtry = Inf, maxdepth = Inf, multiway = FALSE, splittry = 2L,
    intersplit = FALSE, majority = FALSE, caseweights = TRUE,
    applyfun = NULL, cores = NULL, saveinfo = TRUE, update = NULL,
    splitflavour = c("ctree", "exhaustive"))
teststat 
a character specifying the type of the test statistic to be applied for variable selection. 
splitstat 
a character specifying the type of the test statistic to be applied for split-point selection. Prior to version 1.2-0, "maximum" was implemented only.
splittest 
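a logical changing linear (the default FALSE) to maximally selected statistics for variable selection. Currently needs testtype = "MonteCarlo".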
testtype 
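a character specifying how to compute the distribution of the test statistic. The first three options refer to p-values as criterion; "Teststatistic" uses the raw statistic as criterion. "Bonferroni" and "Univariate" relate to p-values from the asymptotic distribution (adjusted or unadjusted). Bonferroni-adjusted Monte-Carlo p-values are computed when both "Bonferroni" and "MonteCarlo" are given.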
pargs 
control parameters for the computation of multivariate normal probabilities, see GenzBretz.
nmax 
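an integer of length two defining the number of bins each variable (in the response yx and the partitioning variables z) is divided into prior to tree building. The default Inf does not apply any binning.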
alpha 
a double, the significance level for variable selection. 
mincriterion 
the value of the test statistic or 1 - p-value that must be exceeded in order to implement a split.
logmincriterion 
the value of the test statistic or 1 - p-value that must be exceeded in order to implement a split, on the log scale.
minsplit 
the minimum sum of weights in a node in order to be considered for splitting. 
minbucket 
the minimum sum of weights in a terminal node. 
minprob 
proportion of observations needed to establish a terminal node. 
stump 
a logical determining whether a stump (a tree with a maximum of three nodes only) is to be computed. 
maxvar 
maximum number of variables the tree is allowed to split in. 
lookahead 
a logical determining whether a split is implemented only after checking if tests in both daughter nodes can be performed. 
MIA 
a logical determining the treatment of NA as a category in splits, see Twala et al. (2008).
nresample 
number of permutations for testtype = "MonteCarlo".
tol 
tolerance for zero variances. 
maxsurrogate 
number of surrogate splits to evaluate. 
numsurrogate 
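a logical for backward-compatibility with party. If TRUE, only variables that are at least ordered (numeric variables or ordered factors) are considered for surrogate splits.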
mtry 
number of input variables randomly sampled as candidates at each node for random forest like algorithms. The default mtry = Inf means that no random selection takes place.
maxdepth 
maximum depth of the tree. The default maxdepth = Inf means that no restrictions are applied to tree sizes.
multiway 
a logical indicating if multiway splits for all factor levels are implemented for unordered factors. 
splittry 
number of variables that are inspected for admissible splits if the best split doesn't meet the sample size constraints. 
intersplit 
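a logical indicating if splits in numeric variables are simply x <= a (the default) or interpolated x <= (a + b) / 2. The latter feature is experimental, see Galili and Meilijson (2016).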
majority 
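if FALSE, observations which can't be classified to a daughter node because of missing information are randomly assigned (following the node distribution). If TRUE, they go with the majority (the default in the first implementation, ctree in package party).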
caseweights 
a logical interpreting weights as case weights.
applyfun 
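an optional lapply-style function with arguments function(X, FUN, ...). It is used for computing the variable selection criterion. The default is to use the basic lapply function unless the cores argument is specified.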
cores 
numeric. If set to an integer, applyfun is set to mclapply with the desired number of cores.
saveinfo 
logical. Store information about the variable selection procedure in the info slot of each partynode.
update 
logical. If TRUE, the data transformation is updated in every node.
splitflavour 
use exhaustive search over splits ("exhaustive") instead of maximally selected statistics ("ctree", the default).
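The defaults in the usage line tie alpha, mincriterion and logmincriterion together. The base R sketch below shows that chain for a p-value-based testtype (anything other than "Teststatistic"). The helper passes_mincriterion() is hypothetical and not part of partykit; it uses log1p() because that evaluates log(1 - p) accurately for very small p, but whether partykit computes the criterion this way is not claimed here.

```r
# Hypothetical helper, not part of partykit. Assumes a p-value-based testtype,
# so the criterion is 1 - p-value.
passes_mincriterion <- function(pvalue, alpha = 0.05) {
  mincriterion <- 1 - alpha             # usage default: mincriterion = 1 - alpha
  logmincriterion <- log(mincriterion)  # usage default: log(mincriterion)
  # Compare log(1 - p) with logmincriterion; log1p(-p) is log(1 - p),
  # computed accurately for tiny p. Equivalent to (1 - p) > mincriterion.
  log1p(-pvalue) > logmincriterion
}

passes_mincriterion(0.01)   # TRUE, since 0.99 > 0.95
passes_mincriterion(0.20)   # FALSE, since 0.80 < 0.95
```

Lowering alpha raises mincriterion, so fewer splits pass the test and trees tend to be smaller.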
The arguments teststat, testtype and mincriterion determine how the global null hypothesis of independence between all input variables and the response is tested (see ctree). The variable with the most extreme p-value or test statistic is selected for splitting. If this isn't possible due to sample size constraints explained in the next paragraph, up to splittry other variables are inspected for possible splits.
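The selection step can be sketched in base R as follows, assuming Bonferroni-adjusted p-values (testtype = "Bonferroni") computed with p.adjust(). The function select_split_variable() and its admissible callback are hypothetical stand-ins for partykit's internal checks, not its actual implementation; in particular, this sketch inspects the next candidates in order of adjusted p-value without requiring them to be significant, which is an assumption.

```r
# Hypothetical sketch of variable selection with a splittry fallback.
# pvalues:    unadjusted p-values, one per input variable (named vector)
# admissible: function(varname) returning TRUE if a split in that variable
#             meets the sample size constraints (stand-in for the real checks)
select_split_variable <- function(pvalues, admissible,
                                  alpha = 0.05, splittry = 2L) {
  adj <- p.adjust(pvalues, method = "bonferroni")  # Bonferroni adjustment
  names(adj) <- names(pvalues)
  # Global test: stop unless 1 - min(p) exceeds mincriterion = 1 - alpha.
  if (!(1 - min(adj) > 1 - alpha)) return(NA_character_)
  ord <- names(sort(adj))                 # most extreme (smallest) p first
  # The best variable, then up to splittry further candidates.
  for (v in head(ord, 1L + splittry)) {
    if (admissible(v)) return(v)
  }
  NA_character_                           # no admissible split found
}

p <- c(x1 = 0.004, x2 = 0.010, x3 = 0.300)
# Suppose the best split in x1 violates the sample size constraints:
select_split_variable(p, admissible = function(v) v != "x1")  # "x2"
```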
A split is established when all of the following criteria are met:
1) the sum of the weights in the current node is larger than minsplit,
2) each daughter node contains a fraction of more than minprob of the sum of weights,
3) the sum of the weights in each daughter node exceeds minbucket, and
4) the depth of the tree is smaller than maxdepth.
This avoids pathological splits deep down the tree.
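The four criteria above can be written as a single predicate. This is a minimal sketch with a hypothetical function name, split_admissible(); it reads criteria 2 and 3 as applying to each daughter node (consistent with minbucket being the minimum weight of a terminal node), uses the strict comparisons stated in the text, and assumes the root node has depth 0. partykit's own depth convention and boundary handling may differ.

```r
# Hypothetical check of the four split criteria (not partykit's code).
# node_weight:      sum of weights in the current node
# daughter_weights: sums of weights in each proposed daughter node
# depth:            depth of the current node (root = 0 is an assumption)
split_admissible <- function(node_weight, daughter_weights, depth,
                             minsplit = 20L, minbucket = 7L,
                             minprob = 0.01, maxdepth = Inf) {
  c1 <- node_weight > minsplit                         # 1) node large enough
  c2 <- all(daughter_weights / node_weight > minprob)  # 2) share above minprob
  c3 <- all(daughter_weights > minbucket)              # 3) weight above minbucket
  c4 <- depth < maxdepth                               # 4) depth limit
  c1 && c2 && c3 && c4
}

split_admissible(100, c(60, 40), depth = 0)  # TRUE
split_admissible(100, c(95, 5), depth = 0)   # FALSE: 5 does not exceed minbucket
```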
When stump = TRUE, a tree with at most two terminal nodes is computed.
Setting mtry to a finite value > 0 means that a random-forest-like 'variable selection', i.e., a random selection of mtry input variables, is performed in each node.
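The per-node random selection of candidate variables can be sketched with base R's sample(), drawing without replacement and keeping all variables when mtry = Inf. The function sample_candidates() is hypothetical; it illustrates the idea described above and is not a claim about how partykit draws the subset internally.

```r
# Hypothetical sketch: pick the candidate variables for one node.
# mtry = Inf (or mtry >= number of variables) keeps all of them.
sample_candidates <- function(vars, mtry = Inf) {
  if (is.infinite(mtry) || mtry >= length(vars)) return(vars)
  sample(vars, size = mtry)   # random subset, drawn without replacement
}

set.seed(1)
sample_candidates(c("x1", "x2", "x3", "x4", "x5"), mtry = 2)
```

In a random-forest-like setting this draw would be repeated independently in every node, so different nodes of the same tree can consider different variables.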
In each inner node, maxsurrogate surrogate splits are computed (regardless of any missing values in the learning sample). Factors in test samples whose levels were empty in the learning sample are treated as missing when computing predictions (in contrast to ctree in package party). Note also the different behaviour of majority in the two implementations.
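Two of the behaviours mentioned here can be mimicked in base R. The first part recodes test-sample factor levels that were empty in the learning sample to NA. The second part assigns an observation with missing split information to a daughter node, assuming majority = FALSE means a random draw following the node distribution and majority = TRUE means the larger daughter. Surrogate splits are not modelled, and the names new_obs and assign_missing() are hypothetical.

```r
# Part 1: levels empty in the learning sample become NA in a test sample.
learn <- factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))  # "c" unused
observed <- levels(droplevels(learn))                  # "a" "b"
new_obs <- factor(c("a", "c", "b"), levels = c("a", "b", "c"))
new_obs_clean <- factor(as.character(new_obs), levels = observed)  # "c" -> NA

# Part 2: hypothetical assignment of an observation that cannot be
# classified because of missing information.
assign_missing <- function(daughter_weights, majority = FALSE) {
  if (majority) {
    which.max(daughter_weights)          # go with the majority
  } else {
    # random assignment following the node distribution
    sample(seq_along(daughter_weights), size = 1L, prob = daughter_weights)
  }
}

assign_missing(c(30, 70), majority = TRUE)  # 2
set.seed(2)
assign_missing(c(30, 70))                   # 1 or 2, with probabilities 0.3 / 0.7
```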
A list.
B. E. T. H. Twala, M. C. Jones, and D. J. Hand (2008), Good Methods for Coping with Missing Data in Decision Trees, Pattern Recognition Letters, 29(7), 950–956.
Tal Galili, Isaac Meilijson (2016), Splitting Matters: How Monotone Transformation of Predictor Variables May Improve the Predictions of Decision Tree Models, https://arxiv.org/abs/1611.04561.
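For orientation, the list returned by ctree_control() is typically passed to ctree() through its control argument. This is a minimal sketch assuming the partykit package is installed (the call is guarded so the script still runs without it); it uses the built-in iris data, and the values of alpha, minsplit and maxdepth are arbitrary illustrative choices.

```r
# Minimal sketch; requires the partykit package.
if (requireNamespace("partykit", quietly = TRUE)) {
  # Stricter significance level, larger nodes and a depth limit.
  ctrl <- partykit::ctree_control(alpha = 0.01, minsplit = 50L, maxdepth = 3L)
  fit <- partykit::ctree(Species ~ ., data = iris, control = ctrl)
  print(fit)
}
```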