Introduction to aorsf


This article covers core features of the aorsf package.

Background: ORSF

The oblique random survival forest (ORSF) is an extension of the axis-based RSF algorithm.
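To make the distinction concrete, here is a minimal base R sketch of an axis-based split versus an oblique split. It uses toy classification data rather than survival data, and the coefficients of the linear combination are picked by hand for illustration; none of the names below come from aorsf.

```r
# Toy data: two predictors and a binary label that depends on their sum
set.seed(1)
x1 <- rnorm(200)
x2 <- rnorm(200)
y  <- as.integer(x1 + x2 > 0)

# Axis-based split: threshold a single predictor
axis_left <- x1 < 0

# Oblique split: threshold a linear combination of predictors
# (coefficients 1 and 1 are hand-picked here, not estimated)
lincomb      <- 1 * x1 + 1 * x2
oblique_left <- lincomb < 0

# Share of observations sent to the side that matches their label
axis_agreement    <- mean(axis_left == (y == 0))
oblique_agreement <- mean(oblique_left == (y == 0))

c(axis = axis_agreement, oblique = oblique_agreement)
```

Because the label depends on both predictors at once, a single oblique split separates the classes while an axis-based split on x1 alone does not.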

Accelerated ORSF

The purpose of aorsf ('a' is short for accelerated) is to provide routines to fit ORSFs that will scale adequately to large data sets. The fastest algorithm available in the package is the accelerated ORSF model, which is the default method used by orsf():

library(aorsf)

set.seed(329)

orsf_fit <- orsf(data = pbc_orsf, 
                 n_tree = 5,
                 formula = Surv(time, status) ~ . - id)

orsf_fit

You may notice that the first input of orsf() is data. This is a design choice that makes it easier to use orsf() with pipes (i.e., %>% or |>). For instance,

library(dplyr)

orsf_fit <- pbc_orsf |> 
 select(-id) |> 
 orsf(formula = Surv(time, status) ~ .,
      n_tree = 5)

Interpretation

aorsf includes several functions dedicated to interpreting ORSFs through estimation of both partial dependence and variable importance.

Variable importance

aorsf provides multiple ways to compute variable importance.
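As a rough sketch, three variable importance helpers exported by aorsf (orsf_vi_negate(), orsf_vi_permute(), and orsf_vi_anova()) can each be applied to a fitted forest. The forest size and the explicit importance = "anova" setting below are choices made for this example, not recommendations; check the package reference for the arguments in your installed version.

```r
library(aorsf)

set.seed(329)

# A modest forest; importance = "anova" is set explicitly so that
# ANOVA importance is available from the fitted object
fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 100,
            importance = "anova")

# Negation importance: flip the sign of a variable's coefficients
vi_negate <- orsf_vi_negate(fit)

# Permutation importance: shuffle a variable's values
vi_permute <- orsf_vi_permute(fit)

# ANOVA importance: based on p-values of coefficients in the forest
vi_anova <- orsf_vi_anova(fit)

# Show the highest-ranked variables under each method
head(vi_negate, 3)
head(vi_permute, 3)
head(vi_anova, 3)
```

Each call returns a named numeric vector, so the rankings from different methods can be compared side by side.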

Partial dependence (PD)

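Partial dependence (PD) shows the expected prediction from a model as a function of one or more predictors. The expectation is marginalized over the values of all other predictors, giving something like a multivariable-adjusted estimate of the model's prediction.

For more on PD, see the vignette.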
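A minimal sketch of computing PD with out-of-bag predictions is shown here, using orsf_pd_oob(). The choice of bili as the predictor and the grid of values 1 through 5 are assumptions made for illustration.

```r
library(aorsf)

set.seed(329)

# Fit a modest forest so that every observation has out-of-bag trees
fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 100)

# Out-of-bag partial dependence of predicted risk on bilirubin,
# evaluated at five illustrative values
pd <- orsf_pd_oob(fit, pred_spec = list(bili = c(1, 2, 3, 4, 5)))

pd
```

The result has one row per value in pred_spec, summarizing the predictions at that value across the training data.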

Individual conditional expectations (ICE)

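Unlike partial dependence, which shows the expected prediction as a function of one or more predictors, individual conditional expectations (ICE) show the prediction for each individual observation as a function of a predictor.

For more on ICE, see the vignette.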
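ICE values can be computed in much the same way with orsf_ice_oob(). This is a sketch under the same assumptions as any small demonstration: the predictor (bili) and its grid of values are picked for illustration only.

```r
library(aorsf)

set.seed(329)

# Fit a modest forest so that every observation has out-of-bag trees
fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 100)

# Out-of-bag ICE: one prediction per observation per bilirubin value
ice <- orsf_ice_oob(fit, pred_spec = list(bili = c(1, 2, 3, 4, 5)))

head(ice)
```

Where PD returns one summary row per predictor value, ICE keeps a separate prediction for every observation at every value, so the output has many more rows.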

What about the original ORSF?

The original ORSF (i.e., obliqueRSF) used glmnet to find linear combinations of inputs. aorsf allows users to implement this approach using the orsf_control_net() function:

orsf_net <- orsf(data = pbc_orsf, 
                 formula = Surv(time, status) ~ . - id, 
                 control = orsf_control_net())

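Forests fit with orsf_control_net() ("net" forests) fit a lot faster than the original ORSF function in obliqueRSF. However, net forests are still much slower than "cph" forests, which find linear combinations of inputs using Cox proportional hazards regression.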

aorsf and other machine learning software

The unique feature of aorsf is its fast algorithms to fit ORSF ensembles. RLT and obliqueRSF both fit oblique random survival forests, but aorsf does so faster. ranger and randomForestSRC fit survival forests, but neither package supports oblique splitting. obliqueRF fits oblique random forests for classification and regression, but not survival. PPforest fits oblique random forests for classification but not survival.

Note: The default prediction behavior for aorsf models is to produce predicted risk at a specific prediction horizon, which is not the default for ranger or randomForestSRC. I think this will change in the future, as computing time-independent predictions with aorsf could be helpful.
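To illustrate the horizon-specific prediction described in the note, the sketch below requests predicted risk and survival probability at a single horizon. The horizon of 1000 (in the time units of pbc_orsf) and the use of the first five rows as new data are arbitrary choices for this example; pred_horizon and pred_type are arguments of the package's predict method, but check the documentation for your installed version.

```r
library(aorsf)

set.seed(329)

fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 100)

# Treat the first five rows as "new" data for demonstration purposes
new_obs <- pbc_orsf[1:5, ]

# Predicted risk (the default prediction type) at time 1000
risk <- predict(fit, new_data = new_obs, pred_horizon = 1000)

# Predicted survival probability at the same horizon
surv <- predict(fit, new_data = new_obs, pred_horizon = 1000,
                pred_type = "surv")

cbind(risk = as.numeric(risk), surv = as.numeric(surv))
```

Each prediction is tied to the chosen horizon, so a different pred_horizon generally gives different values for the same observations.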




aorsf documentation built on Oct. 26, 2023, 5:08 p.m.