knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(aorsf)
Analyses can slow to a crawl when models need hours to run. In this article you will find a few tricks to prevent this bottleneck when using orsf().
control
The default control for orsf() is NULL because, if unspecified, orsf() will pick the fastest possible control for you depending on the type of forest being grown. The difference in run-time between the default control and other approaches can be striking. For example:
time_fast <- system.time(
  expr = orsf(pbc_orsf,
              formula = time+status~. -id,
              n_tree = 5)
)

time_net <- system.time(
  expr = orsf(pbc_orsf,
              formula = time+status~. -id,
              control = orsf_control_survival(method = 'net'),
              n_tree = 5)
)

# unspecified control is much faster
time_net['elapsed'] / time_fast['elapsed']
n_thread
The n_thread argument uses multi-threading to run aorsf functions in parallel when possible. If you know how many threads you want, e.g. you want exactly 5, set n_thread = 5. If you aren't sure how many threads you have available but want to use a feasible number, n_thread = 0 (the default) tells aorsf to choose for you.
# automatically pick number of threads based on amount available
orsf(pbc_orsf,
     formula = time+status~. -id,
     n_tree = 5,
     n_thread = 0)
Note: sometimes multi-threading is not possible. For example, because R is a single-threaded language, multi-threading cannot be applied when orsf() needs to call R functions from C++, which occurs when a customized R function is used to find linear combinations of variables or to compute prediction accuracy.
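If you would rather pass an explicit thread count than rely on n_thread = 0, one option is to ask R how many cores the machine reports and keep one free. The following is a minimal sketch, not something the package requires: it assumes the base parallel package is available, and n_cores and n_use are illustrative names.

```r
library(aorsf)

# detectCores() can return NA on some platforms, so fall back to 1
n_cores <- parallel::detectCores(logical = FALSE)
if (is.na(n_cores)) n_cores <- 1L

# leave one core free for the rest of the system (a choice, not a rule)
n_use <- max(1L, n_cores - 1L)

fit <- orsf(pbc_orsf,
            formula = time + status ~ . - id,
            n_tree = 5,
            n_thread = n_use)
```

Requesting more threads than the machine has rarely helps, so capping the request at the detected core count is a conservative default.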
There are some inputs in orsf() that can be adjusted to make it run faster:
- set n_retry to 0
- set oobag_pred_type to 'none'
- set importance to 'none'
- increase split_min_events, split_min_obs, leaf_min_events, or leaf_min_obs to make trees stop growing sooner
- increase split_min_stat to enforce stricter requirements for growing deeper trees.
Applying these tips:
orsf(pbc_orsf,
     formula = time+status~.,
     n_thread = 0,
     n_tree = 5,
     n_retry = 0,
     oobag_pred_type = 'none',
     importance = 'none',
     split_min_events = 20,
     leaf_min_events = 10,
     split_min_stat = 10)
While modifying these inputs can make orsf() run faster, it can also affect prediction accuracy.
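To see the trade-off on your own data, you can time a default fit against a fit with stricter stopping rules and then compare their out-of-bag performance. The sketch below is only an illustration: it keeps out-of-bag predictions turned on so that both fits can be evaluated, and it assumes the fitted object exposes its out-of-bag statistic as eval_oobag$stat_values. The choice of 100 trees is arbitrary.

```r
library(aorsf)

# default settings
time_default <- system.time(
  fit_default <- orsf(pbc_orsf,
                      formula = time + status ~ . - id,
                      n_tree = 100)
)

# stricter stopping rules; out-of-bag predictions stay on for evaluation
time_less <- system.time(
  fit_less <- orsf(pbc_orsf,
                   formula = time + status ~ . - id,
                   n_tree = 100,
                   n_retry = 0,
                   split_min_events = 20,
                   leaf_min_events = 10,
                   split_min_stat = 10)
)

# elapsed seconds for each fit
time_default['elapsed']
time_less['elapsed']

# out-of-bag performance for each fit (assumed accessor, see note above)
fit_default$eval_oobag$stat_values
fit_less$eval_oobag$stat_values
```

With a dataset as small as pbc_orsf the timing difference may be negligible; the comparison is more informative on larger data.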
Setting verbose_progress = TRUE doesn't make anything run faster, but it can make things feel like they are running less slowly.
verbose_fit <- orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5, verbose_progress = TRUE)
Instead of running a model and hoping it will be fast, you can estimate how long a specification of that model will take by using no_fit = TRUE in the call to orsf().
fit_spec <- orsf(pbc_orsf,
                 formula = time+status~. -id,
                 control = orsf_control_survival(method = 'net'),
                 n_tree = 2000,
                 no_fit = TRUE)

# how much time it takes to estimate training time:
system.time(
  time_est <- orsf_time_to_train(fit_spec, n_tree_subset = 5)
)

# the estimated training time:
time_est
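Once you have an estimate, you can use it to decide whether to train the specification at all. The following is a rough sketch under a few assumptions: budget_secs is a hypothetical time budget, the estimate is treated as a difftime and converted to seconds, and orsf_train() is used to fit the untrained specification.

```r
library(aorsf)

# specify the forest without fitting it
fit_spec <- orsf(pbc_orsf,
                 formula = time + status ~ . - id,
                 n_tree = 500,
                 no_fit = TRUE)

# estimate training time from a small subset of trees
time_est <- orsf_time_to_train(fit_spec, n_tree_subset = 5)

# convert the estimate to seconds so it can be compared to a budget
est_secs <- as.numeric(time_est, units = "secs")

# hypothetical budget: only train if the estimate is under one minute
budget_secs <- 60

if (est_secs <= budget_secs) {
  fit <- orsf_train(fit_spec)
} else {
  message("Estimated training time exceeds the budget; ",
          "consider fewer trees or stricter stopping rules.")
}
```

The estimate is extrapolated from a handful of trees, so treat it as a rough guide rather than a guarantee.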