fastcforest: Parallelized conditional inference random forest

fastcforest R Documentation

Parallelized conditional inference random forest

Description

A parallelized version of the cforest function from the party package, which implements the random forest and bagging ensemble algorithms using conditional inference trees as base learners.

Usage

fastcforest(formula, data = list(), subset = NULL, weights = NULL,
            controls = party::cforest_unbiased(),
            xtrafo = ptrafo, ytrafo = ptrafo, scores = NULL,
            parallel = TRUE)

Arguments

formula

a symbolic description of the model to be fit. Note that operators such as : and - will not work; the trees use all variables listed on the right-hand side of formula.

data

a data frame containing the variables in the model

subset

an optional vector specifying a subset of observations to be used in the fitting process

weights

an optional vector of weights to be used in the fitting process. Non-negative integer-valued weights are allowed, as are non-negative real weights. Observations are sampled (with or without replacement) according to the probabilities weights / sum(weights). The fraction of observations to be sampled (without replacement) is computed from the sum of the weights if all weights are integer-valued, and from the number of weights greater than zero otherwise. Alternatively, weights can be a double matrix that directly defines the case weights for each of the ncol(weights) trees in the forest. This requires more storage but gives the user more control.
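The two forms of weights described above can be illustrated with a small base R sketch. It is not the package's internal sampling code: the weight values, the sample size and the number of trees are made up for illustration only.

```r
# Hypothetical illustration in base R; not the internal code of party or moreparty.
set.seed(1)
w <- c(2, 0, 1, 1)   # non-negative integer case weights for 4 observations
p <- w / sum(w)      # sampling probabilities: weights / sum(weights)

# Draw observation indices with those probabilities (sample size chosen arbitrarily).
idx <- sample(seq_along(w), size = sum(w), replace = TRUE, prob = p)

# Alternative form: a double matrix with one column of case weights per tree
# (3 hypothetical trees here).
ntree <- 3
wmat <- matrix(as.double(rmultinom(ntree, size = length(w), prob = p)),
               nrow = length(w), ncol = ntree)
```

In this sketch the observation with weight 0 is never drawn, and each column of wmat would play the role of the case weights for one tree.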

controls

an object of class ForestControl-class, which can be obtained using cforest_control (and its convenience interfaces cforest_unbiased and cforest_classical).
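As a minimal sketch, a control object can be built with one of the convenience interfaces and then passed as controls. The values of ntree and mtry below are arbitrary choices for illustration, and the party package is assumed to be installed.

```r
# Assumes the party package is installed; ntree and mtry values are illustrative.
library(party)

# cforest_unbiased() forwards these arguments to cforest_control().
ctrl <- cforest_unbiased(ntree = 100, mtry = 3)

# The number of trees is stored in the ntree slot of the control object.
ctrl@ntree
```

The resulting object would then be supplied as fastcforest(..., controls = ctrl).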

xtrafo

a function to be applied to all input variables. By default, the ptrafo function is applied.

ytrafo

a function to be applied to all response variables. By default, the ptrafo function is applied.

scores

an optional named list of scores to be attached to ordered factors

parallel

Logical indicating whether or not to run fastcforest in parallel using a backend provided by the foreach package. Default is TRUE.
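One possible way to provide a foreach backend is sketched below using the doParallel package. The choice of doParallel, the two workers and the ntree value are assumptions for illustration; any backend registered for foreach should serve the same purpose.

```r
# Assumes the moreparty, party and doParallel packages are installed.
library(moreparty)
library(doParallel)   # also attaches foreach and parallel

cl <- parallel::makeCluster(2)       # two workers; adjust to the machine
doParallel::registerDoParallel(cl)   # register the cluster as the foreach backend

data(iris)
iris.cf <- fastcforest(Species ~ ., data = iris,
                       controls = party::cforest_unbiased(ntree = 50),
                       parallel = TRUE)

parallel::stopCluster(cl)            # release the workers when done
```

If no backend is registered, foreach typically falls back to sequential execution with a warning, so setting parallel = FALSE is the more explicit choice for single-core runs.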

Details

See the cforest documentation for details. The parallelization code is inspired by https://stackoverflow.com/questions/36272816/train-a-cforest-in-parallel

Value

An object of class RandomForest-class.

Author(s)

Nicolas Robette

References

Leo Breiman (2001). Random Forests. Machine Learning, 45(1), 5–32.

Torsten Hothorn, Berthold Lausen, Axel Benner and Martin Radespiel-Troeger (2004). Bagging Survival Trees. Statistics in Medicine, 23(1), 77–91.

Torsten Hothorn, Peter Bühlmann, Sandrine Dudoit, Annette Molinaro and Mark J. van der Laan (2006a). Survival Ensembles. Biostatistics, 7(3), 355–373.

Torsten Hothorn, Kurt Hornik and Achim Zeileis (2006b). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651–674. Preprint available from https://www.zeileis.org/papers/Hothorn+Hornik+Zeileis-2006.pdf

Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis and Torsten Hothorn (2007). Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution. BMC Bioinformatics, 8, 25. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-25

Carolin Strobl, James Malley and Gerhard Tutz (2009). An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests. Psychological Methods, 14(4), 323–348.

See Also

cforest, fastvarImp

Examples

  ## classification
  library(moreparty)
  data(iris)
  iris2 <- iris
  # binary response: is the species versicolor or not?
  iris2$Species <- factor(iris$Species == "versicolor")
  iris.cf <- fastcforest(Species ~ ., data = iris2, parallel = FALSE)

moreparty documentation built on Nov. 22, 2023, 5:08 p.m.