knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(aorsf)
Analyses can slow to a crawl when models need hours to run. In this article you will find a few tricks to prevent this bottleneck when using orsf()
.
orsf_control_fast()
This is the default control
value for orsf()
and its run-time compared to other approaches can be striking. For example:
time_fast <- system.time( expr = orsf(pbc_orsf, formula = time+status~. -id, control = orsf_control_fast(), n_tree = 5) ) time_net <- system.time( expr = orsf(pbc_orsf, formula = time+status~. -id, control = orsf_control_net(), n_tree = 5) ) # control_fast() is much faster time_net['elapsed'] / time_fast['elapsed']
n_thread
The n_thread
argument uses multi-threading to run aorsf
functions in parallel when possible. If you know how many threads you want, e.g. you want exactly 5, just say n_thread = 5
. If you aren't sure how many threads you have available but want to use as many as you can, say n_thread = 0
and aorsf
will figure out the number for you.
# automatically pick number of threads based on amount available orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5, n_thread = 0)
Because R is a single threaded language, multi-threading cannot be applied when orsf()
needs to call R functions from C++, which occurs when a customized R function is used to find linear combination of variables or compute prediction accuracy.
There are some defaults in orsf()
that can be adjusted to make it run faster:
set n_retry
to 0 instead of 3 (the default)
set oobag_pred_type
to 'none' instead of 'surv' (the default)
set 'importance' to 'none' instead of 'anova' (the default)
increase split_min_events
, split_min_obs
, leaf_min_events
, or leaf_min_obs
to make trees stop growing sooner
increase split_min_stat
to make trees stop growing sooner
Applying these tips:
orsf(pbc_orsf, formula = time+status~., na_action = 'na_impute_meanmode', n_thread = 0, n_tree = 5, n_retry = 0, oobag_pred_type = 'none', importance = 'none', split_min_events = 20, leaf_min_events = 10, split_min_stat = 10)
While these default values do make orsf()
run slower, they also usually make its predictions more accurate or make the fit easier to interpret.
Setting verbose_progress = TRUE
doesn't make anything run faster, but it can help make it feel like things are running less slow.
verbose_fit <- orsf(pbc_orsf, formula = time+status~. -id, n_tree = 5, verbose_progress = TRUE)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.