tune.rfsrc | R Documentation |
Finds the optimal mtry and nodesize tuning parameters for a random forest using out-of-sample error. Applies to all families.
## S3 method for class 'rfsrc'
tune(formula, data,
mtryStart = ncol(data) / 2,
nodesizeTry = c(1:9, seq(10, 100, by = 5)), ntreeTry = 100,
sampsize = function(x){min(x * .632, max(150, x ^ (3/4)))},
nsplit = 1, stepFactor = 1.25, improve = 1e-3, strikeout = 3, maxIter = 25,
trace = FALSE, doBest = TRUE, ...)
## S3 method for class 'rfsrc'
tune.nodesize(formula, data,
nodesizeTry = c(1:9, seq(10, 150, by = 5)), ntreeTry = 100,
sampsize = function(x){min(x * .632, max(150, x ^ (4/5)))},
nsplit = 1, trace = TRUE, ...)
formula |
A symbolic formula describing the model to be fit. |
data |
A data frame containing the response variable and predictor variables. |
mtryStart |
Initial value of mtry. |
nodesizeTry |
Vector of nodesize values to search over. |
ntreeTry |
Number of trees used during the tuning step. |
sampsize |
Function specifying the size of the subsample. Can also be a numeric value. |
nsplit |
Number of random split points considered when splitting a node. |
stepFactor |
Multiplicative factor used to adjust mtry at each iteration. |
improve |
Minimum relative improvement in out-of-sample error required to continue the search. |
strikeout |
Number of consecutive non-improving steps (negative improvement) allowed before stopping the search. Increase to allow a more exhaustive search. |
maxIter |
Maximum number of iterations allowed for the mtry search. |
trace |
If TRUE, prints progress of the search. |
doBest |
If TRUE, also fits and returns a forest using the optimal mtry and nodesize values. |
... |
Additional arguments passed to rfsrc.fast. |
tune returns the tuning results as a matrix with three columns (the results component used in the example below): the first and second columns contain the nodesize and mtry values evaluated during the tuning process, and the third column contains the corresponding out-of-sample error.
The error is standardized. For multivariate forests, it is averaged over the outcomes; for competing risks, it is averaged over the event types.
If doBest = TRUE, the function also returns a forest object fit using the optimal mtry and nodesize values.
All tuning calculations, including the final optimized forest, are performed using the fast forest interface rfsrc.fast, which relies on subsampling. This makes the procedure computationally efficient but approximate. Users seeking more accurate tuning results may wish to:
Increase sampsize, which controls the size of the subsample used for tuning.
Increase ntreeTry, the number of trees used for tuning, which defaults to 100 for speed.
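To get a feel for how small the default subsample can be, the default sampsize rule shown in the usage for tune can be evaluated directly. The following is a minimal base R sketch; the name default_sampsize and the sample sizes are chosen here for illustration and are not part of the package.

```r
## default subsample-size rule, copied from the tune() usage
## (default_sampsize is an illustrative name, not a package function)
default_sampsize <- function(x) { min(x * .632, max(150, x ^ (3/4))) }

## sample sizes chosen only for illustration
n <- c(100, 500, 1000, 5000, 10000)
sub <- sapply(n, default_sampsize)

## subsample size and the fraction of the data it represents
print(data.frame(n = n, subsample = round(sub), fraction = round(sub / n, 3)))
```

Under this rule the subsample is 63.2% of the data for small n but only about 10% at n = 10000, which is why a larger sampsize can be worth its extra cost on large data sets.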
It is also helpful to visualize the out-of-sample error surface as a function of mtry and nodesize using a contour plot (see example below) to identify regions of low error.
The function tune.nodesize performs a simplified search by optimizing only over nodesize.
Hemant Ishwaran and Udaya B. Kogalur
rfsrc.fast
## ------------------------------------------------------------
## White wine classification example
## ------------------------------------------------------------
## load the data
data(wine, package = "randomForestSRC")
wine$quality <- factor(wine$quality)
## set the sample size manually
o <- tune(quality ~ ., wine, sampsize = 100)
## here is the optimized forest
print(o$rf)
## visualize the nodesize/mtry OOB surface
if (library("interp", logical.return = TRUE)) {

  ## nice little wrapper for plotting results
  plot.tune <- function(o, linear = TRUE) {
    x <- o$results[,1]
    y <- o$results[,2]
    z <- o$results[,3]
    so <- interp(x=x, y=y, z=z, linear = linear)
    idx <- which.min(z)
    x0 <- x[idx]
    y0 <- y[idx]
    filled.contour(x = so$x,
                   y = so$y,
                   z = so$z,
                   xlim = range(so$x, finite = TRUE) + c(-2, 2),
                   ylim = range(so$y, finite = TRUE) + c(-2, 2),
                   color.palette =
                     colorRampPalette(c("yellow", "red")),
                   xlab = "nodesize",
                   ylab = "mtry",
                   main = "error rate for nodesize and mtry",
                   key.title = title(main = "OOB error", cex.main = 1),
                   plot.axes = {axis(1);axis(2);points(x0,y0,pch="x",cex=1,font=2);
                                points(x,y,pch=16,cex=.25)})
  }

  ## plot the surface
  plot.tune(o)

}
## ------------------------------------------------------------
## tuning for class imbalanced data problem
## - see imbalanced function for details
## - use rfq and perf.type = "gmean"
## ------------------------------------------------------------
data(breast, package = "randomForestSRC")
breast <- na.omit(breast)
o <- tune(status ~ ., data = breast, rfq = TRUE, perf.type = "gmean")
print(o)
## ------------------------------------------------------------
## tune nodesize for competing risk - wihs data
## ------------------------------------------------------------
data(wihs, package = "randomForestSRC")
plot(tune.nodesize(Surv(time, status) ~ ., wihs, trace = TRUE)$err)
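As a further illustration of the advice to increase sampsize and ntreeTry, the wine example can be rerun with a larger tuning budget. This is a sketch only: the values 1000 and 250 and the names o2 and best are arbitrary choices made here, not recommendations from the package authors, and the run will be slower than with the defaults.

```r
library(randomForestSRC)

## ------------------------------------------------------------
## wine example with a larger subsample and more trees
## (sampsize = 1000 and ntreeTry = 250 are illustrative values)
## ------------------------------------------------------------
data(wine, package = "randomForestSRC")
wine$quality <- factor(wine$quality)

o2 <- tune(quality ~ ., wine, sampsize = 1000, ntreeTry = 250)

## row with the smallest out-of-sample error
## columns are nodesize, mtry and error, in that order
best <- o2$results[which.min(o2$results[, 3]), ]
print(best)
```

Comparing best with the corresponding row from a default run gives a rough sense of how sensitive the selected mtry and nodesize are to the tuning budget.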