rfTrain: Rapid Decision Tree Training

View source: R/rfTrain.R

rfTrain {Rborist}    R Documentation

Rapid Decision Tree Training

Description

Accelerated training using the Random Forest (trademarked name) algorithm, tuned for multicore and GPU hardware. Bindable with most numerical front-end languages in addition to R.

Usage

## Default S3 method:
rfTrain(preFormat,
        sampler,
        y,
        autoCompress = 0.25,
        ctgCensus = "votes",
        classWeight = NULL,
        maxLeaf = 0,
        minInfo = 0.01,
        minNode = if (is.factor(y)) 2 else 3,
        nLevel = 0,
        nThread = 0,
        predFixed = 0,
        predProb = 0.0,
        predWeight = NULL,
        regMono = NULL,
        splitQuant = NULL,
        thinLeaves = FALSE,
        treeBlock = 1,
        verbose = FALSE,
        ...)

Arguments

y

the response (outcome) vector, either numerical or categorical.

preFormat

Compressed, presorted representation of the predictor values. Its row count must conform with the length of y.

sampler

Compressed representation of the sampled response.
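Both objects are typically derived from the training data before invoking rfTrain. A minimal sketch, assuming the preformat and presample commands exported by Rborist and hypothetical training data x, y:

```r
library(Rborist)

# Hypothetical training frame and numeric response.
x <- data.frame(replicate(6, rnorm(1000)))
y <- rnorm(1000)

# Presort and compress the predictors, then sample the response.
pf <- preformat(x)   # supplies the preFormat argument
sp <- presample(y)   # supplies the sampler argument

rt <- rfTrain(pf, sp, y)
```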

autoCompress

plurality above which to compress predictor values.

ctgCensus

report categorical validation by vote or by probability.

classWeight

proportional weighting of classification categories.

maxLeaf

maximum number of leaves in a tree. Zero denotes no limit.

minInfo

information ratio with the parent below which a node does not split.

minNode

minimum number of distinct row references to split a node.

nLevel

maximum number of tree levels to train. Zero denotes no limit.

nThread

suggests an OpenMP-style thread count. Zero denotes the default processor setting.

predFixed

number of trial predictors for a split (mtry).

predProb

probability of selecting individual predictor as trial splitter.

predWeight

relative weighting of individual predictors as trial splitters.

regMono

signed probability constraint for monotonic regression.

splitQuant

(sub)quantile at which to place the cut point for numerical splits.

thinLeaves

bypasses creation of leaf state in order to reduce memory footprint.

treeBlock

maximum number of trees to train during a single level (of use, e.g., with coprocessor computing).

verbose

indicates whether to output progress of training.

...

Not currently used.

Value

an object of class trainArb, containing:

version

The version of the Rborist package used to train.

samplerHash

Hash value of the Sampler object used to train. Recorded for consistency of subsequent commands.

predInfo

A vector of forest-wide Gini (classification) or weighted variance (regression), by predictor.
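Per-predictor contributions can be ranked directly from this vector. A sketch, assuming a trained object rt and the original predictor frame x (both hypothetical names):

```r
# Rank predictors by forest-wide information contribution.
info <- rt$predInfo
names(info) <- colnames(x)   # label with front-end predictor names
sort(info, decreasing = TRUE)
```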

predMap

A vector of integers mapping internal to front-end predictor indices.

forest

an object of class Forest containing:

  nTree: the number of trees trained.

  node: an object of class Node consisting of:

    treeNode: forest-wide vector of packed node representations.

    extent: per-tree node counts.

    scores: numeric vector of scores, per node.

  factor: an object of class Factor consisting of:

    facSplit: forest-wide vector of packed factor bits.

    extent: per-tree extent of factor bits.

    observed: forest-wide vector of observed factor bits.

leaf: an object of class Leaf containing:

  extent: forest-wide vector of leaf populations, i.e., counts of unique samples.

  index: forest-wide vector of sample indices.

diag

Diagnostics accumulated over the training task.

Author(s)

Mark Seligman at Suiji.

See Also

Rborist

Examples

## Not run: 
  # Regression example:
  nRow <- 5000
  x <- data.frame(replicate(6, rnorm(nRow)))
  y <- with(x, X1^2 + sin(X2) + X3 * X4) # courtesy of S. Welling.
  preFormat <- preformat(x)
  sampler <- presample(y)

  # Classification example:
  data(iris)
  preFormatIris <- preformat(iris[-5])
  samplerIris <- presample(iris[, 5])

  # Generic invocation:
  rt <- rfTrain(preFormat, sampler, y)


  # Trains the number of trees specified when sampling, e.g., 300:
  sampler300 <- presample(y, nTree = 300)
  rt <- rfTrain(preFormat, sampler300, y)


  # Causes validation census to report class probabilities:
  rt <- rfTrain(preFormatIris, samplerIris, iris[, 5], ctgCensus = "prob")


  # Applies table-weighting to classification categories:
  rt <- rfTrain(preFormatIris, samplerIris, iris[, 5], classWeight = "balance")


  # Weights first category twice as heavily as remaining two:
  rt <- rfTrain(preFormatIris, samplerIris, iris[, 5], classWeight = c(2.0, 1.0, 1.0))


  # Does not split nodes when doing so yields less than a 2% gain in
  # information over the parent node:
  rt <- rfTrain(preFormat, sampler, y, minInfo = 0.02)


  # Does not split nodes representing fewer than 10 unique samples:
  rt <- rfTrain(preFormat, sampler, y, minNode = 10)


  # Trains a maximum of 20 levels:
  rt <- rfTrain(preFormat, sampler, y, nLevel = 20)


  # Trains, but does not perform subsequent validation:
  rt <- rfTrain(preFormat, sampler, y, noValidate = TRUE)


  # Chooses 500 rows (with replacement) to root each tree:
  sampler500 <- presample(y, nSamp = 500)
  rt <- rfTrain(preFormat, sampler500, y)


  # Chooses 2 predictors as splitting candidates at each node (or
  # fewer, when choices exhausted):
  rt <- rfTrain(preFormat, sampler, y, predFixed = 2)


  # Causes each predictor to be selected as a splitting candidate with
  # distribution Bernoulli(0.3):
  rt <- rfTrain(preFormat, sampler, y, predProb = 0.3)


  # Causes first three predictors to be selected as splitting candidates
  # twice as often as the other two:
  rt <- rfTrain(preFormat, sampler, y, predWeight = c(2.0, 2.0, 2.0, 1.0, 1.0))


  # Constrains modelled response to be increasing with respect to X1
  # and decreasing with respect to X5.
  rt <- rfTrain(preFormat, sampler, y, regMono = c(1.0, 0, 0, 0, -1.0, 0))


  # Suppresses creation of detailed leaf information needed for
  # quantile prediction and external tools.
  rt <- rfTrain(preFormat, sampler, y, thinLeaves = TRUE)

  # Places the cut point for the first predictor at its lowest observed
  # value and for the second at its highest (R vectors are 1-indexed):
  spq <- rep(0.5, ncol(x))
  spq[1] <- 0.0
  spq[2] <- 1.0
  rt <- rfTrain(preFormat, sampler, y, splitQuant = spq)
  
## End(Not run)

Rborist documentation built on July 26, 2023, 5:32 p.m.