rfArb | R Documentation |
Accelerated implementation of the Random Forest (trademarked name) algorithm. Tuned for multicore and GPU hardware. Bindable with most numerical front-end languages in addition to R. Invocation is similar to that provided by the randomForest package.
## Default S3 method:
rfArb(x,
y,
autoCompress = 0.25,
ctgCensus = "votes",
classWeight = numeric(0),
discardState = FALSE,
impPermute = 0,
indexing = FALSE,
maxLeaf = 0,
minInfo = 0.01,
minNode = if (is.factor(y)) 2 else 3,
nHoldout = 0,
nLevel = 0,
nSamp = 0,
nThread = 0,
nTree = 500,
noValidate = FALSE,
predFixed = 0,
predProb = 0.0,
predWeight = numeric(0),
quantVec = numeric(0),
quantiles = length(quantVec) > 0,
regMono = numeric(0),
rowWeight = numeric(0),
samplingWeight = numeric(0),
splitQuant = numeric(0),
streamline = FALSE,
thinLeaves = streamline || (is.factor(y) && !indexing),
trapUnobserved = FALSE,
treeBlock = 1,
verbose = FALSE,
withRepl = TRUE,
...)
x |
the design matrix expressed as a |
y |
the response (outcome) vector, either numerical or
categorical. Row count must conform with |
autoCompress |
plurality above which to compress predictor values. |
ctgCensus |
report categorical validation by vote or by probability. |
classWeight |
proportional weighting of classification categories. |
discardState |
minimizes storage by discarding primary training output. Useful for parameter sweeps and cross-validation, in which only validation may be of interest. |
impPermute |
number of importance permutations: 0 or 1. |
indexing |
whether to report final index, typically terminal, of validation tree traversal. |
maxLeaf |
maximum number of leaves in a tree. Zero denotes no limit. |
minInfo |
information ratio with parent below which node does not split. |
minNode |
minimum number of distinct row references to split a node. |
nHoldout |
number of observations to omit from sampling. Augmented by missing response values. |
nLevel |
maximum number of tree levels to train, including terminals (leaves). Zero denotes no limit. |
nSamp |
number of rows to sample, per tree. |
nThread |
suggests an OpenMP-style thread count. Zero denotes the default processor setting. |
nTree |
the number of trees to train. |
noValidate |
whether to train without validation. |
predFixed |
number of trial predictors for a split ( |
predProb |
probability of selecting individual predictor as trial splitter. |
predWeight |
relative weighting of individual predictors as trial splitters. |
quantVec |
quantile levels to validate. |
quantiles |
whether to report quantiles at validation. |
regMono |
signed probability constraint for monotonic regression. |
rowWeight |
row weighting for initial sampling of tree. Deprecated |
samplingWeight |
row weighting for initial sampling of tree. |
splitQuant |
(sub)quantile at which to place cut point for numerical splits. |
streamline |
whether to streamline sampler contents to save space. |
thinLeaves |
bypasses creation of leaf state in order to reduce storage footprint. |
trapUnobserved |
reports score for nonterminal upon encountering values not observed during training, such as missing data. |
treeBlock |
maximum number of trees to train during a single level (e.g., coprocessor computing). |
verbose |
indicates whether to output progress of training. |
withRepl |
whether row sampling is by replacement. |
... |
not currently used. |
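The arguments predFixed, predProb and predWeight all govern which predictors are tried at a split. The following base-R sketch illustrates the difference between the three selection styles as described above. It is an illustration only, not the package's internal code, and the helper names (pickFixed, pickBernoulli, pickWeighted) and the predictor count are hypothetical.

```r
# Illustrative only: three ways to draw trial predictors for one split.
# Helper names and nPred are hypothetical, not part of the package.
set.seed(42)
nPred <- 6

# predFixed style: a fixed number of distinct predictors per split,
# or fewer when there are not enough predictors to choose from.
pickFixed <- function(nPred, predFixed) {
  sample.int(nPred, size = min(predFixed, nPred))
}

# predProb style: each predictor is included independently with
# probability predProb, so the number of candidates varies by split.
pickBernoulli <- function(nPred, predProb) {
  which(runif(nPred) < predProb)
}

# predWeight style: predictors with larger weights are drawn more often.
pickWeighted <- function(predWeight, size) {
  sample.int(length(predWeight), size = size, prob = predWeight)
}

fixedDraw <- pickFixed(nPred, 2)
bernoulliDraw <- pickBernoulli(nPred, 0.3)
weightedDraw <- pickWeighted(c(2, 2, 2, 1, 1, 1), 2)
```

With the fixed style every split sees the same number of candidates, while the Bernoulli style may occasionally yield none or all of them.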
an object sharing classes rfArb, a supplementary collection consisting of the following items:
sampler
an object of class Sampler, as described in the documentation for the presample command, that summarizes the bagging structure.
training
a list summarizing the training task, consisting of the following fields:
call
the calling invocation.
info
a vector of forest-wide Gini (classification) or weighted variance (regression), by predictor.
version
the version of the Rborist package used to train.
diag
diagnostics accumulated over the training task.
samplerHash
hash value of the Sampler object used to train. Recorded for consistency of subsequent commands.
prediction
an object of class PredictReg or PredictCtg, as described by the documentation for command predict.
validation
an object of class ValidReg or ValidCtg, as described by the documentation for command validate, if validation is requested.
importance
an object of class ImportanceReg or ImportanceCtg, as described by the documentation for command predict, if permutation performance has been requested.
Mark Seligman at Suiji.
Breiman, L. (2001) Random Forests, Machine Learning 45(1), 5-32.
Rborist
## Not run:
# Regression example:
nRow <- 5000
x <- data.frame(replicate(6, rnorm(nRow)))
y <- with(x, X1^2 + sin(X2) + X3 * X4) # courtesy of S. Welling.
# Classification example:
data(iris)
# Generic invocation:
rb <- rfArb(x, y)
# Causes 300 trees to be trained:
rb <- rfArb(x, y, nTree = 300)
# Causes rows to be sampled without replacement:
rb <- rfArb(x, y, withRepl=FALSE)
# Causes validation census to report class probabilities:
rb <- rfArb(iris[-5], iris[5], ctgCensus="prob")
# Applies table-weighting to classification categories:
rb <- rfArb(iris[-5], iris[5], classWeight = "balance")
# Weights first category twice as heavily as remaining two:
rb <- rfArb(iris[-5], iris[5], classWeight = c(2.0, 1.0, 1.0))
# Does not split nodes when doing so yields less than a 2% gain in
# information over the parent node:
rb <- rfArb(x, y, minInfo=0.02)
# Does not split nodes representing fewer than 10 unique samples:
rb <- rfArb(x, y, minNode=10)
# Trains a maximum of 20 levels:
rb <- rfArb(x, y, nLevel = 20)
# Trains, but does not perform subsequent validation:
rb <- rfArb(x, y, noValidate=TRUE)
# Chooses 500 rows (with replacement) to root each tree.
rb <- rfArb(x, y, nSamp=500)
# Chooses 2 predictors as splitting candidates at each node (or
# fewer, when choices exhausted):
rb <- rfArb(x, y, predFixed = 2)
# Causes each predictor to be selected as a splitting candidate with
# distribution Bernoulli(0.3):
rb <- rfArb(x, y, predProb = 0.3)
# Causes first three predictors to be selected as splitting candidates
# twice as often as the other three (one weight per column of x):
rb <- rfArb(x, y, predWeight=c(2.0, 2.0, 2.0, 1.0, 1.0, 1.0))
# Causes (default) quantiles to be computed at validation:
rb <- rfArb(x, y, quantiles=TRUE)
qPred <- rb$validation$qPred
# Causes specified quantiles (deciles) to be computed at validation:
rb <- rfArb(x, y, quantVec = seq(0.1, 1.0, by = 0.10))
qPred <- rb$validation$qPred
# Constrains modelled response to be increasing with respect to X1
# and decreasing with respect to X5.
rb <- rfArb(x, y, regMono=c(1.0, 0, 0, 0, -1.0, 0))
# Causes rows to be sampled with random weighting:
rb <- rfArb(x, y, samplingWeight=runif(nRow))
# Suppresses creation of detailed leaf information needed for
# quantile prediction and external tools.
rb <- rfArb(x, y, thinLeaves = TRUE)
# Directs prediction to take a random branch on encountering
# values not observed during training, such as NA or an
# unrecognized category.
predict(rb, trapUnobserved = FALSE)
# Directs prediction to silently trap unobserved values, reporting a
# score associated with the current nonterminal tree node.
predict(rb, trapUnobserved = TRUE)
# Sets splitting position for predictor 1 to far left and predictor
# 2 to far right, others to default (median) position.
# R vectors are 1-indexed; assigning to index 0 has no effect.
spq <- rep(0.5, ncol(x))
spq[1] <- 0.0
spq[2] <- 1.0
rb <- rfArb(x, y, splitQuant = spq)
## End(Not run)
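The nSamp, withRepl and samplingWeight examples above all concern how rows are drawn to root each tree. The base-R sketch below illustrates what those three settings mean for a single tree's sample. It uses only sample.int and is not the package's sampler; the variable names are hypothetical, and the out-of-bag computation reflects the usual Random Forest convention rather than anything stated in this page.

```r
# Illustrative only: per-tree row sampling in base R.
set.seed(1)
nRow <- 5000   # rows in the design matrix, as in the examples above
nSamp <- 500   # rows drawn per tree

# withRepl = TRUE: a row may be drawn more than once.
bagRepl <- sample.int(nRow, size = nSamp, replace = TRUE)

# withRepl = FALSE: every drawn row is distinct.
bagNoRepl <- sample.int(nRow, size = nSamp, replace = FALSE)

# samplingWeight: rows with larger weights are drawn more often.
w <- runif(nRow)
bagWeighted <- sample.int(nRow, size = nSamp, replace = TRUE, prob = w)

# Rows never drawn for this tree are conventionally called out-of-bag
# and are the ones available for validating the tree.
oob <- setdiff(seq_len(nRow), bagRepl)
```

Sampling with replacement generally leaves more rows out of bag than sampling the same number of rows without replacement, because repeated draws consume part of the sample.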