{recforest} offers a flexible solution for analyzing recurrent events
in survival data, outperforming traditional methods like the Cox model,
which struggle with repeated events (e.g., hospital readmissions) and
terminal events like death. By leveraging machine learning (Random
Survival Forests), {recforest} models both the timing and frequency of
events, even with right-censored data. This leads to more accurate
predictions and insights, which in turn support better decision-making
and patient care.
The methodology is fully described in Murris, J., Bouaziz, O., Jakubczak, M., Katsahian, S., & Lavenu, A. (2024).
You can install the released version of {recforest} from CRAN, or the
development version from GitHub, like so:
TODO : TO BE COMPLETED ONCE THE PACKAGE IS ON GITHUB / CRAN
# Install from CRAN:
install.packages("recforest")
# Or the development version from GitHub:
# install.packages("pak")
pak::pak("XXXX/recforest")
An example dataset is provided with the package. It is a modified version
of the bladder1 dataset from the {survival} package, which studies
bladder cancer recurrences, adapted for use with {recforest}. See
?survival::bladder and ?bladder1_recforest for more information about
the dataset.
library(recforest)
data("bladder1_recforest")
head(bladder1_recforest)
#> # A tibble: 6 × 8
#>      id t.start t.stop treatment number  size death event
#>   <int>   <int>  <int> <fct>      <int> <int> <dbl> <dbl>
#> 1     1       0      0 placebo        1     1     0     0
#> 2     2       0      1 placebo        1     3     0     0
#> 3     3       0      4 placebo        2     1     0     0
#> 4     4       0      7 placebo        1     1     0     0
#> 5     5       0     10 placebo        5     1     0     0
#> 6     6       0      6 placebo        4     1     0     1
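The columns printed above suggest a start/stop layout: one row per follow-up interval, identified by id, with event and death flags. The base R sketch below builds a small toy data frame in that layout and derives per-individual summaries. The toy values are hypothetical (not taken from the package), and reading each row as one interval is an assumption based on the column names.

```r
# Hypothetical toy data in the same column layout (not from {recforest})
toy <- data.frame(
  id      = c(1, 1, 1, 2, 3, 3),
  t.start = c(0, 3, 8, 0, 0, 5),
  t.stop  = c(3, 8, 12, 9, 5, 7),
  event   = c(1, 1, 0, 0, 1, 0),  # 1 = recurrent event flagged on this row
  death   = c(0, 0, 0, 0, 0, 1)   # 1 = terminal event flagged on this row
)

# Sanity check: no interval ends before it starts
stopifnot(all(toy$t.stop >= toy$t.start))

# Number of recurrent events recorded per individual
events_per_id <- tapply(toy$event, toy$id, sum)

# Last recorded time per individual (end of follow-up)
follow_up <- tapply(toy$t.stop, toy$id, max)

print(events_per_id)
print(follow_up)
```

A quick check of this kind can be useful before passing your own data to train_forest(), since the id, time, event, and death columns are all referenced by name.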
trained_forest <- train_forest(
  data = bladder1_recforest,
  id_var = "id",
  covariates = c("treatment", "number", "size"),
  time_vars = c("t.start", "t.stop"),
  death_var = "death",
  event = "event",
  n_trees = 5,
  n_bootstrap = 70,
  mtry = 2,
  minsplit = 3,
  nodesize = 15,
  method = "NAa",
  min_score = 5,
  max_nodes = 20,
  seed = 111,
  parallel = FALSE,
  verbose = FALSE
)
A full explanation of the data-related and model-related parameters is provided in the vignette (see Further details).
The implementation of parallel computing in this package is based on the
{future} and {future.apply} packages. To enable parallel processing,
the parallel parameter must be set to TRUE. The number of cores to use
can be specified by adjusting the workers parameter in the
future::plan() function, which configures the parallelization strategy.
Two commonly used strategies are:
future::plan(future::multicore, workers = n_cores - 1)
future::plan(future::multisession, workers = n_cores - 1)
In both cases, the number of cores (n_cores) should be defined by the
user based on the available resources.
The following can be run before training the model:
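# Define the strategy and number of cores
n_trees <- 5  # same value as the n_trees argument passed to train_forest()
n_cores <- min(future::availableCores(), n_trees)
# Keep one core free, but always request at least one worker
future::plan(future::multisession, workers = max(1, n_cores - 1))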
If you wish to use parallel computing, please refer to the {future}
package documentation for more information.
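As a minimal sketch of the workflow described above, the snippet below sets a parallel strategy, leaves a placeholder where training would happen, and then restores the previous strategy. It relies only on {future}; the variable names n_workers and old_plan are illustrative, and the placement of the train_forest() call is an assumption about how you would organize your own script.

```r
library(future)

# Illustrative number of trees, mirroring n_trees = 5 in the training call
n_trees <- 5

# Never request more workers than trees, and always keep at least one
n_workers <- max(1, min(availableCores(), n_trees) - 1)

# plan() returns the previous strategy, so it can be restored later
old_plan <- plan(multisession, workers = n_workers)

# ... call train_forest(..., parallel = TRUE) here ...

# Restore the previous strategy once training is done
plan(old_plan)
```

Restoring the plan keeps background R sessions from lingering after training and leaves the rest of the script running under the strategy it started with.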
print(trained_forest)
#>
#> ── Tree 1 ──
#>
#> ℹ Number of nodes : 5
#> ℹ c_index : 0.7490882567469
#> ℹ mse_imse : 316.52338769398
#> ℹ mse_iscore : -23.8459208072833
#>
#> ── Tree 2 ──
#>
#> ℹ Number of nodes : 7
#> ℹ c_index : 0.749320446994866
#> ℹ mse_imse : 317.020277729943
#> ℹ mse_iscore : -24.3379911330769
#>
#> ── Tree 3 ──
#>
#> ℹ Number of nodes : 9
#> ℹ c_index : 0.725611597704621
#> ℹ mse_imse : 552.713717581106
#> ℹ mse_iscore : -260.02620872387
#>
#> ── Tree 4 ──
#>
#> ℹ Number of nodes : 5
#> ℹ c_index : 0.757197981596913
#> ℹ mse_imse : 391.055121586804
#> ℹ mse_iscore : -98.20144371709
#>
#> ── Tree 5 ──
#>
#> ℹ Number of nodes : 7
#> ℹ c_index : 0.75103734439834
#> ℹ mse_imse : 419.451806839106
#> ℹ mse_iscore : -126.772535366445
summary(trained_forest)
#>
#> ── Data summary ────────────────────────────────────────────────────────────────
#> ℹ Number of individuals : 118
#> ℹ Number of predictors : 3
#>
#> ── Model parameters ────────────────────────────────────────────────────────────
#> ℹ mtry : 2
#> ℹ minsplit : 3
#> ℹ nodesize : 15
#> ℹ method : NAa
#> ℹ min_score : 5
#> ℹ max_nodes : 20
#>
#> ── Metrics ─────────────────────────────────────────────────────────────────────
#> ℹ c_index : 0.746451125488328
#> ℹ mse_imse : 399.352862286188
#> ℹ mse_iscore : -106.636819949553
#> ℹ computation time (seconds) : 4.4
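In the output above, each forest-level metric reported by summary() coincides with the arithmetic mean of the five per-tree values reported by print(). The check below reproduces that arithmetic for c_index, using numbers copied from the printed output. It is an observation about this particular output, not a statement about how {recforest} computes its summary internally.

```r
# Per-tree c_index values copied from the print(trained_forest) output
c_index_trees <- c(
  0.7490882567469,
  0.749320446994866,
  0.725611597704621,
  0.757197981596913,
  0.75103734439834
)

# Forest-level c_index copied from the summary(trained_forest) output
c_index_forest <- 0.746451125488328

# Compare the mean of the per-tree values with the reported value
mean_c_index <- mean(c_index_trees)
same <- isTRUE(all.equal(mean_c_index, c_index_forest, tolerance = 1e-8))

print(mean_c_index)
print(same)
```

The same comparison can be repeated for mse_imse and mse_iscore by swapping in the corresponding printed values.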
The model can be used to predict an expected mean cumulative number of recurrent events per individual at the end of follow-up.
predictions <- predict(
  trained_forest,
  newdata = bladder1_recforest,
  id_var = "id",
  covariates = c("treatment", "number", "size"),
  time_vars = c("t.start", "t.stop"),
  death_var = "death"
)
A deeper explanation of the methodology and the features of the package can be found in the Vignettes.
The Vignettes are structured as follows: