Tips to speed up computation

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(aorsf)

Go faster

Analyses can slow to a crawl when models need hours to run. In this article you will find a few tricks to prevent this bottleneck when using orsf().

Use orsf_control_fast()

This is the default control value for orsf() and its run-time compared to other approaches can be striking. For example:

time_fast <- system.time(
 expr = orsf(pbc_orsf, 
             formula = time+status~. -id, 
             control = orsf_control_fast(), 
             n_tree = 5)
)

time_net <- system.time(
 expr = orsf(pbc_orsf, 
             formula = time+status~. -id, 
             control = orsf_control_net(), 
             n_tree = 5)
)

# control_fast() is much faster
time_net['elapsed'] / time_fast['elapsed']

Use n_thread

The n_thread argument uses multi-threading to run aorsf functions in parallel when possible. If you know how many threads you want, e.g. you want exactly 5, just say n_thread = 5. If you aren't sure how many threads you have available but want to use as many as you can, say n_thread = 0 and aorsf will figure out the number for you.

# automatically pick number of threads based on amount available

orsf(pbc_orsf, 
     formula = time+status~. -id, 
     n_tree = 5,
     n_thread = 0)

Because R is a single threaded language, multi-threading cannot be applied when orsf() needs to call R functions from C++, which occurs when a customized R function is used to find linear combination of variables or compute prediction accuracy.

Do less

There are some defaults in orsf() that can be adjusted to make it run faster:

Applying these tips:

orsf(pbc_orsf, 
     formula = time+status~., 
     na_action = 'na_impute_meanmode',
     n_thread = 0, 
     n_tree = 5, 
     n_retry = 0,
     oobag_pred_type = 'none', 
     importance = 'none',
     split_min_events = 20, 
     leaf_min_events = 10,
     split_min_stat = 10)

While these default values do make orsf() run slower, they also usually make its predictions more accurate or make the fit easier to interpret.

Show progress

Setting verbose_progress = TRUE doesn't make anything run faster, but it can help make it feel like things are running less slow.

verbose_fit <- orsf(pbc_orsf, 
                    formula = time+status~. -id, 
                    n_tree = 5, 
                    verbose_progress = TRUE)


Try the aorsf package in your browser

Any scripts or data that you put into this service are public.

aorsf documentation built on Oct. 26, 2023, 5:08 p.m.