rfTrain: Rapid Decision Tree Training

View source: R/rfTrain.R

rfTrain {Rborist}    R Documentation

Rapid Decision Tree Training

Description

Accelerated training using the Random Forest (trademarked name) algorithm, tuned for multicore and GPU hardware. Bindable with most numerical front-end languages in addition to R.

Usage

## Default S3 method:
rfTrain(preFormat,
        sampler,
        y,
        autoCompress = 0.25,
        ctgCensus = "votes",
        classWeight = NULL,
        maxLeaf = 0,
        minInfo = 0.01,
        minNode = if (is.factor(y)) 2 else 3,
        nLevel = 0,
        nThread = 0,
        predFixed = 0,
        predProb = 0.0,
        predWeight = NULL,
        regMono = NULL,
        splitQuant = NULL,
        thinLeaves = FALSE,
        treeBlock = 1,
        verbose = FALSE,
        ...)

Arguments

y

the response (outcome) vector, either numerical or categorical.

preFormat

Compressed, presorted representation of the predictor values. Its row count must conform with the length of y.

sampler

Compressed representation of the sampled response.
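Both objects are typically derived from the training data before invoking rfTrain. A minimal sketch, assuming the preformat and presample commands exported by Rborist and hypothetical training data x, y:

```r
library(Rborist)

# Hypothetical training frame and numeric response.
x <- data.frame(replicate(6, rnorm(1000)))
y <- rnorm(1000)

# Presort and compress the predictors, then sample the response.
pf <- preformat(x)   # supplies the preFormat argument
sp <- presample(y)   # supplies the sampler argument

rt <- rfTrain(pf, sp, y)
```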

autoCompress

plurality above which to compress predictor values.

ctgCensus

report categorical validation by vote or by probability.

classWeight

proportional weighting of classification categories.

maxLeaf

maximum number of leaves in a tree. Zero denotes no limit.

minInfo

information ratio with the parent below which a node does not split.

minNode

minimum number of distinct row references to split a node.

nLevel

maximum number of tree levels to train. Zero denotes no limit.

nThread

suggests an OpenMP-style thread count. Zero denotes the default processor setting.

predFixed

number of trial predictors for a split (mtry).

predProb

probability of selecting individual predictor as trial splitter.

predWeight

relative weighting of individual predictors as trial splitters.

regMono

signed probability constraint for monotonic regression.

splitQuant

(sub)quantile at which to place the cut point for numerical splits.

thinLeaves

bypasses creation of leaf state in order to reduce memory footprint.

treeBlock

maximum number of trees to train during a single level (of use, e.g., with coprocessor computing).

verbose

indicates whether to output progress of training.

...

Not currently used.

Value

an object of class trainArb, containing:

version

The version of the Rborist package used to train.

samplerHash

Hash value of the Sampler object used to train. Recorded for consistency of subsequent commands.

predInfo

A vector of forest-wide Gini (classification) or weighted variance (regression), by predictor.
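Per-predictor contributions can be ranked directly from this vector. A sketch, assuming a trained object rt and the original predictor frame x (both hypothetical names):

```r
# Rank predictors by forest-wide information contribution.
info <- rt$predInfo
names(info) <- colnames(x)   # label with front-end predictor names
sort(info, decreasing = TRUE)
```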

predMap

A vector of integers mapping internal to front-end predictor indices.

forest

an object of class Forest containing:

  nTree: the number of trees trained.

  node: an object of class Node consisting of:

    treeNode: forest-wide vector of packed node representations.

    extent: per-tree node counts.

    scores: numeric vector of scores, per node.

  factor: an object of class Factor consisting of:

    facSplit: forest-wide vector of packed factor bits.

    extent: per-tree extent of factor bits.

    observed: forest-wide vector of observed factor bits.

leaf: an object of class Leaf containing:

  extent: forest-wide vector of leaf populations, i.e., counts of unique samples.

  index: forest-wide vector of sample indices.

diag

Diagnostics accumulated over the training task.

Author(s)

Mark Seligman at Suiji.

See Also

Rborist

Examples

## Not run: 
  # Regression example:
  nRow <- 5000
  x <- data.frame(replicate(6, rnorm(nRow)))
  y <- with(x, X1^2 + sin(X2) + X3 * X4) # courtesy of S. Welling.
  preFormat <- preformat(x)
  sampler <- presample(y)

  # Classification example:
  data(iris)
  preFormatIris <- preformat(iris[-5])
  samplerIris <- presample(iris[, 5])

  # Generic invocation:
  rt <- rfTrain(preFormat, sampler, y)


  # Trains the number of trees specified when sampling, e.g., 300:
  sampler300 <- presample(y, nTree = 300)
  rt <- rfTrain(preFormat, sampler300, y)


  # Causes validation census to report class probabilities:
  rt <- rfTrain(preFormatIris, samplerIris, iris[, 5], ctgCensus = "prob")


  # Applies table-weighting to classification categories:
  rt <- rfTrain(preFormatIris, samplerIris, iris[, 5], classWeight = "balance")


  # Weights first category twice as heavily as remaining two:
  rt <- rfTrain(preFormatIris, samplerIris, iris[, 5], classWeight = c(2.0, 1.0, 1.0))


  # Does not split nodes when doing so yields less than a 2% gain in
  # information over the parent node:
  rt <- rfTrain(preFormat, sampler, y, minInfo = 0.02)


  # Does not split nodes representing fewer than 10 unique samples:
  rt <- rfTrain(preFormat, sampler, y, minNode = 10)


  # Trains a maximum of 20 levels:
  rt <- rfTrain(preFormat, sampler, y, nLevel = 20)


  # Trains, but does not perform subsequent validation:
  rt <- rfTrain(preFormat, sampler, y, noValidate = TRUE)


  # Chooses 500 rows (with replacement) to root each tree:
  sampler500 <- presample(y, nSamp = 500)
  rt <- rfTrain(preFormat, sampler500, y)


  # Chooses 2 predictors as splitting candidates at each node (or
  # fewer, when choices exhausted):
  rt <- rfTrain(preFormat, sampler, y, predFixed = 2)


  # Causes each predictor to be selected as a splitting candidate with
  # distribution Bernoulli(0.3):
  rt <- rfTrain(preFormat, sampler, y, predProb = 0.3)


  # Causes first three predictors to be selected as splitting candidates
  # twice as often as the other two:
  rt <- rfTrain(preFormat, sampler, y, predWeight = c(2.0, 2.0, 2.0, 1.0, 1.0))


  # Constrains modelled response to be increasing with respect to X1
  # and decreasing with respect to X5.
  rt <- rfTrain(preFormat, sampler, y, regMono = c(1.0, 0, 0, 0, -1.0, 0))


  # Suppresses creation of detailed leaf information needed for
  # quantile prediction and external tools.
  rt <- rfTrain(preFormat, sampler, y, thinLeaves = TRUE)

  # Places the cut point for the first predictor at its lowest observed
  # value and for the second at its highest (R vectors are 1-indexed):
  spq <- rep(0.5, ncol(x))
  spq[1] <- 0.0
  spq[2] <- 1.0
  rt <- rfTrain(preFormat, sampler, y, splitQuant = spq)
  
## End(Not run)

Rborist documentation built on July 26, 2023, 5:32 p.m.