| tune.rfsrc | R Documentation |
Tune mtry and nodesize
Finds the optimal mtry and nodesize for a random forest
using out-of-bag (OOB) error. Two search strategies are supported: a
grid-based search and a golden-section search with noise control. Works
for all response families supported by rfsrc.fast.
## S3 method for class 'rfsrc'
tune(formula, data,
mtry.start = ncol(data) / 2,
nodesize.try = c(1:9, seq(10, 100, by = 5)), ntree.try = 100,
sampsize = function(x) { min(x * .632, max(150, x^(3/4))) },
nsplit = 1, step.factor = 1.25, improve = 1e-3, strikeout = 3, max.iter = 25,
method = c("grid", "golden"),
final.window = 5, reps.initial = 2, reps.final = 3,
trace = FALSE, do.best = TRUE, seed = NULL, ...)
## S3 method for class 'rfsrc'
tune.nodesize(formula, data,
nodesize.try = c(1:9, seq(10, 150, by = 5)), ntree.try = 100,
sampsize = function(x) { min(x * .632, max(150, x^(4/5))) },
nsplit = 1, method = c("grid", "golden"),
final.window = 5, reps.initial = 2, reps.final = 3, max.iter = 50,
trace = TRUE, seed = NULL, ...)
formula |
A model formula. |
data |
A data frame with response and predictors. |
mtry.start |
Initial value of mtry used to start the step-out search. |
nodesize.try |
Candidate nodesize values to evaluate. |
ntree.try |
Number of trees grown at each tuning evaluation. |
sampsize |
Function or numeric giving the per-tree subsample size. During tuning a single numeric size is derived from this value and used for all tuning fits. |
nsplit |
Number of random split points to consider at each node. |
step.factor |
Multiplicative step-out factor over mtry in the grid search. |
improve |
Minimum relative improvement in error required to continue a search step. |
strikeout |
Maximum number of consecutive non-improving steps allowed before the search stops. |
max.iter |
Maximum number of iterations for the step-out search. |
method |
Search strategy: "grid" or "golden". |
final.window |
For golden search, the terminal bracket width for the one-dimensional line search. |
reps.initial |
Replicates averaged at interior evaluations during golden iterations. |
reps.final |
Replicates averaged for each candidate during the final local sweep in golden search. |
trace |
If TRUE, print progress during tuning. |
do.best |
If TRUE, fit and return the forest at the optimal nodesize and mtry. |
seed |
Optional integer for reproducible tuning. The holdout split (when used) and all tuning fits become deterministic for a given seed. |
... |
Additional arguments passed to rfsrc.fast. |
Error estimate. If 2 * ssize < n, a disjoint holdout of
size ssize is used for evaluation; otherwise OOB error is
used.
Subsample used during tuning. Both functions derive a single
integer ssize from sampsize and pass it to
rfsrc.fast for all tuning fits. This improves stability
and comparability across candidates. When do.best = TRUE in
tune, the final forest is fit with the user-supplied
sampsize exactly as provided.
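The two rules above can be sketched as follows. This is an illustrative sketch, not the package internals: derive.ssize is a hypothetical helper showing how a single integer ssize might be obtained from sampsize, and how the holdout-versus-OOB decision follows from it.

```r
## Hypothetical helper (not part of randomForestSRC): reduce `sampsize`
## to a single integer subsample size before tuning begins.
derive.ssize <- function(sampsize, n) {
  s <- if (is.function(sampsize)) sampsize(n) else sampsize
  max(1, min(n, round(s)))
}

## Default sampsize in tune(): min(n * .632, max(150, n^(3/4)))
n <- nrow(iris)  ## 150 rows, standing in for the training data
ssize <- derive.ssize(function(x) min(x * .632, max(150, x^(3/4))), n)

## Holdout rule from the details above: a disjoint holdout of size
## ssize is used only when 2 * ssize < n; otherwise OOB error is used.
use.holdout <- (2 * ssize < n)
```

With n = 150 the default gives ssize = 95, so 2 * ssize exceeds n and OOB error is used.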
Grid search. tune performs a step-out search over
mtry for each nodesize in nodesize.try, using
step.factor, improve, strikeout, and
max.iter. tune.nodesize evaluates the supplied
nodesize.try grid directly.
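The step-out search over mtry can be sketched in a few lines. This is a simplified, one-directional sketch in the spirit of the description above, not the package code: eval.err stands in for an OOB-error evaluation at a given mtry and is a toy function here.

```r
## Illustrative step-out search over mtry (not the package internals).
## Steps mtry outward by `step.factor`, keeping the best error seen, and
## stops after `strikeout` consecutive non-improving steps.
step.out.mtry <- function(eval.err, mtry.start, p,
                          step.factor = 1.25, improve = 1e-3,
                          strikeout = 3, max.iter = 25) {
  m <- max(1, min(p, round(mtry.start)))
  best.m <- m; best.err <- eval.err(m)
  strikes <- 0
  for (i in seq_len(max.iter)) {
    m <- max(1, min(p, ceiling(m * step.factor)))
    err <- eval.err(m)
    if (err < best.err * (1 - improve)) {
      best.err <- err; best.m <- m; strikes <- 0
    } else {
      strikes <- strikes + 1
      if (strikes >= strikeout) break
    }
    if (m >= p) break
  }
  c(mtry = best.m, err = best.err)
}

## Toy error surface with a minimum near mtry = 8 (p = 20 predictors)
res <- step.out.mtry(function(m) (m - 8)^2 / 100 + 0.2,
                     mtry.start = 2, p = 20)
```

In tune, one such search runs for each candidate nodesize, and the (nodesize, mtry) pair with the lowest error wins.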
Golden search. Uses a guarded golden-section line search with
noise control. For each one-dimensional search (over nodesize or
mtry), the routine probes a small left-anchor grid 1:9,
iterates golden shrinkage until the bracket width is at most
final.window, then runs a short local sweep with
reps.final replicates. In tune the searches over
nodesize and mtry alternate in a simple coordinate loop,
with improve and strikeout as stopping controls.
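A minimal sketch of such a guarded golden-section line search follows, assuming a deterministic toy error function; the replicate averaging shown is how noise control might work in spirit, not the package's exact scheme, and golden.int is a hypothetical name.

```r
## Illustrative golden-section search over an integer parameter (not the
## package internals). Each probe averages `reps` replicates to damp
## noise; the bracket shrinks until its width is at most `final.window`,
## then a local sweep picks the best remaining candidate.
golden.int <- function(eval.err, lo, hi, final.window = 5, reps = 2) {
  phi <- (sqrt(5) - 1) / 2
  probe <- function(x) mean(replicate(reps, eval.err(x)))
  a <- lo; b <- hi
  while (b - a > final.window) {
    x1 <- round(b - phi * (b - a))
    x2 <- round(a + phi * (b - a))
    if (probe(x1) < probe(x2)) b <- x2 else a <- x1
  }
  ## final local sweep over the remaining bracket
  cand <- a:b
  errs <- vapply(cand, probe, numeric(1))
  c(opt = cand[which.min(errs)], err = min(errs))
}

## Deterministic toy error with a minimum at nodesize = 15
res <- golden.int(function(x) abs(x - 15) / 50 + 0.1, lo = 1, hi = 100)
```

The search locates the minimizer in a handful of probes, which is why golden search can be much cheaper than evaluating a full grid.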
For tune:
results: matrix with columns nodesize, mtry, err.
optimal: named numeric vector c(nodesize = ..., mtry = ...).
rf: fitted forest at the optimum if do.best = TRUE.
For tune.nodesize:
nsize.opt: optimal nodesize.
err: data frame with columns nodesize and err.
Hemant Ishwaran and Udaya B. Kogalur
rfsrc.fast
## ------------------------------------------------------------
## White wine classification example
## ------------------------------------------------------------
data(wine, package = "randomForestSRC")
wine$quality <- factor(wine$quality)
## Fixed seed makes tuning reproducible
set.seed(1)
## Full tuner over nodesize and mtry (grid)
o1 <- tune(quality ~ ., wine, sampsize = 100, method = "grid")
print(o1$optimal)
## Golden search alternative
o2 <- tune(quality ~ ., wine, sampsize = 100, method = "golden",
reps.initial = 2, reps.final = 3, seed = 1)
print(o2$optimal)
## visualize the nodesize/mtry surface
if (library("interp", logical.return = TRUE)) {
plot.tune <- function(o, linear = TRUE) {
x <- o$results[, 1]
y <- o$results[, 2]
z <- o$results[, 3]
so <- interp(x = x, y = y, z = z, linear = linear)
idx <- which.min(z)
x0 <- x[idx]; y0 <- y[idx]
filled.contour(x = so$x, y = so$y, z = so$z,
xlim = range(so$x, finite = TRUE) + c(-2, 2),
ylim = range(so$y, finite = TRUE) + c(-2, 2),
color.palette = colorRampPalette(c("yellow", "red")),
xlab = "nodesize", ylab = "mtry",
main = "error rate for nodesize and mtry",
key.title = title(main = "OOB error", cex.main = 1),
plot.axes = {
axis(1); axis(2)
points(x0, y0, pch = "x", cex = 1, font = 2)
points(x, y, pch = 16, cex = .25)
})
}
plot.tune(o1)
plot.tune(o2)
}
## ------------------------------------------------------------
## nodesize only: grid vs golden
## ------------------------------------------------------------
o3 <- tune.nodesize(quality ~ ., wine, sampsize = 100, method = "grid",
trace = TRUE, seed = 1)
o4 <- tune.nodesize(quality ~ ., wine, sampsize = 100, method = "golden",
reps.initial = 2, reps.final = 3, trace = TRUE, seed = 1)
plot(o3$err, type = "s", xlab = "nodesize", ylab = "error")
## ------------------------------------------------------------
## Tuning for class imbalance (rfq with geometric mean performance)
## ------------------------------------------------------------
data(breast, package = "randomForestSRC")
breast <- na.omit(breast)
o5 <- tune(status ~ ., data = breast, rfq = TRUE, perf.type = "gmean",
method = "golden", seed = 1)
print(o5$optimal)
## ------------------------------------------------------------
## Competing risks example (nodesize only)
## ------------------------------------------------------------
data(wihs, package = "randomForestSRC")
plot(tune.nodesize(Surv(time, status) ~ ., wihs, trace = TRUE)$err, type = "s")