Control for Conditional Tree Forests
Description
Various parameters that control aspects of the ‘cforest’ fit via its ‘control’ argument.
Usage
cforest_unbiased(...)
cforest_classical(...)
cforest_control(teststat = "max",
testtype = "Teststatistic",
mincriterion = qnorm(0.9),
savesplitstats = FALSE,
ntree = 500, mtry = 5, replace = TRUE,
fraction = 0.632, trace = FALSE, ...)

Arguments
teststat 
a character specifying the type of the test statistic to be applied. 
testtype 
a character specifying how to compute the distribution of the test statistic. 
mincriterion 
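the value of the test statistic (for testtype = "Teststatistic"), or 1 - p-value (for other values of testtype), that must be exceeded in order to implement a split.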
mtry 
number of input variables randomly sampled as candidates
at each node for random forest like algorithms. Bagging, as special case
of a random forest without random input variable sampling, can
be performed by setting mtry either equal to NULL or manually equal to the number of input variables.
savesplitstats 
a logical determining whether the process of standardized two-sample statistics for split point estimate is saved for each primary split. 
ntree 
number of trees to grow in a forest. 
replace 
a logical indicating whether sampling of observations is done with or without replacement. 
fraction 
fraction of number of observations to draw without
replacement (only relevant if replace = FALSE).
trace 
a logical indicating if a progress bar shall be printed while the forest grows. 
... 
additional arguments to be passed to ctree_control.

Details
All three functions return an object of class ForestControl-class
defining hyper parameters to be specified via the control argument
of cforest.
The arguments teststat, testtype and mincriterion determine how the global
null hypothesis of independence between all input variables and the response
is tested (see ctree). The argument nresample is the number of Monte Carlo
replications to be used when testtype = "MonteCarlo".
A split is established when the sum of the weights in both daughter nodes
is larger than minsplit; this avoids pathological splits at the borders.
When stump = TRUE, a tree with at most two terminal nodes is computed.
The mtry argument regulates a random selection of mtry input variables in
each node. Note that here mtry is fixed to the value 5 by default for merely
technical reasons, while in randomForest the default values for
classification and regression vary with the number of input variables.
Make sure that mtry is defined properly before using cforest.
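As a minimal sketch of defining mtry explicitly, the following derives a value from the number of input variables with the floor(sqrt(p)) rule that randomForest applies to classification problems. The iris data, the seed, ntree = 50 and the names p, mtry_rf, ctrl and fit are illustrative assumptions; the control object is passed through the argument spelled controls, which is assumed here to be the name used in the party package's cforest signature.

```r
library(party)

# Number of input variables: iris has 4 predictors besides the response Species
p <- ncol(iris) - 1

# randomForest-style heuristic for classification; an assumption made for
# illustration, not a recommendation from this help page
mtry_rf <- max(1, floor(sqrt(p)))

set.seed(290875)  # arbitrary seed, only to make the sketch reproducible

# Unbiased settings with mtry chosen from the data instead of the default of 5
ctrl <- cforest_unbiased(mtry = mtry_rf, ntree = 50)
fit <- cforest(Species ~ ., data = iris, controls = ctrl)
```

For a regression problem, the corresponding randomForest default is max(floor(p / 3), 1); either rule is only a starting point for tuning.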
It might be informative to look at scatterplots of input variables against
the standardized two-sample split statistics; those are available when
savesplitstats = TRUE. Each node is then associated with a vector
whose length is determined by the number of observations in the learning
sample, and thus much more memory is required.
The number of trees ntree can be increased for large numbers of input variables.
Function cforest_unbiased returns the settings suggested for the
construction of unbiased random forests (teststat = "quad", testtype = "Univ",
replace = FALSE) by Strobl et al. (2007) and is the default since
version 0.9-90.
Hyper parameter settings mimicking the behaviour of randomForest are
available in cforest_classical, which was used as the default up to
version 0.9-14.
Please note that cforest, in contrast to randomForest, doesn't grow trees of
maximal depth. To grow large trees, set mincriterion = 0.
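The following sketch, assuming the party package is installed, requests large trees by setting mincriterion = 0 directly in cforest_control while otherwise using the unbiased settings named in this section (teststat = "quad", testtype = "Univ", replace = FALSE). The iris data, the seed, mtry = 2, ntree = 50 and the names ctrl_deep and fit_deep are illustrative assumptions, as is the controls argument name of cforest.

```r
library(party)

set.seed(42)  # arbitrary seed, only to make the sketch reproducible

# Unbiased-forest settings with the criterion threshold lowered to 0, so the
# test criterion no longer stops splitting; limits such as minsplit still apply
ctrl_deep <- cforest_control(teststat = "quad", testtype = "Univ",
                             mincriterion = 0, replace = FALSE,
                             mtry = 2, ntree = 50)

fit_deep <- cforest(Species ~ ., data = iris, controls = ctrl_deep)
```

Calling cforest_control directly keeps every setting visible in one place, which makes it easier to see which values differ from the defaults shown under Usage.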
Value
An object of class ForestControl-class.
References
Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis and Torsten Hothorn (2007). Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution. BMC Bioinformatics, 8, 25. http://www.BioMedCentral.com/1471-2105/8/25/