knitr::opts_chunk$set(echo = TRUE) library(ReSurv) library(reticulate) use_virtualenv("pyresurv")
This vignette shows how to tune the hyperparameters of the machine learning algorithms implemented in ReSurv
using the approach in @snoek12 implemented in the R package ParBayesianOptimization
(@ParBayesianOptimization).
For illustrative purposes, we will simulate daily claim notifications from one of the scenarios introduced in @hiabu23 (scenario Alpha).
input_data_0 <- data_generator( random_seed = 1964, scenario = 0, time_unit = 1 / 360, years = 4, period_exposure = 200 )
The counts data are then pre-processed using the IndividualDataPP
function.
individual_data <- IndividualDataPP( data = input_data_0, id = NULL, categorical_features = "claim_type", continuous_features = "AP", accident_period = "AP", calendar_period = "RP", input_time_granularity = "days", output_time_granularity = "quarters", years = 4 )
In the manuscript we fit the proportional likelihood of a cox model using extreme gradient boosting (XGB) and feed-forward neural networks (NN). The advantage of using the bayesian hyperparameters selection algorithm is in terms the computation time, and of a wide range of parameter choices (see again @snoek12). We show the ranges we inspected in the manuscript in the following table.
| Model | Hyperparameter | Range |
|-------|-----------------------------|-------------------------|
| NN | num_layers
| $\left[2,10\right]$ |
| | num_nodes
| $\left[2,10\right]$ |
| | optim
| $\left[1,2\right]$ |
| | activation
| $\left[1,2\right]$ |
| | lr
| $\left[.005,0.5\right]$ |
| | xi
| $\left[0,0.5\right]$ |
| | eps
| $\left[0,0.5\right]$ |
| XGB | eta
| $\left[0,1\right]$ |
| | max_depth
| $\left[0,25\right]$ |
| | min_child_weight
| $\left[0,50\right]$ |
| | lambda
| $\left[0,50\right]$ |
| | alpha
| $\left[0,50\right]$ |
In the following part of this vignette, we will discuss the steps required to optimize the hyperparameters with NN using the approach of @snoek12. We will provide the code we used for XGB as it follows a similar flow.
In ReSurv
, we have our own implementation of a standard K-Fold cross-validation (@hastie09), namely the ReSurvCV
method of an IndividualDataPP
object. We show an illustrative example for XGB below. Even if this routine can be used alone for choosing the optimal parameters, we suggest to use it within the bayesian approach of @snoek12. An example is provided in the following chunk for NN.
resurv_cv_xgboost <- ReSurvCV( IndividualDataPP = individual_data, model = "XGB", hparameters_grid = list( booster = "gbtree", eta = c(.001, .01, .2, .3), max_depth = c(3, 6, 8), subsample = c(1), alpha = c(0, .2, 1), lambda = c(0, .2, 1), min_child_weight = c(.5, 1) ), print_every_n = 1L, nrounds = 500, verbose = FALSE, verbose.cv = TRUE, early_stopping_rounds = 100, folds = 5, parallel = T, ncores = 2, random_seed = 1 )
The ReSurv
neural network implementation uses reticulate
to interface R Studio to Python and it is based on a similar approach to @katzman18, corrected to account for left-truncation and ties in the data. Similarly to the original implementation we relied on the Python library pytorch
(@pytorch). The syntax of our NN is then the syntax of pytorch
. See the reference guide for further information on the NN parametrization.
In order to use the ParBayesianOptimization
package, we first need to specify the NN parameter ranges we are interested into a vector. We discussed in the last section the ranges we choose for each algorithm.
bounds <- list( num_layers = c(2L, 10L), num_nodes = c(2L, 10L), optim = c(1L, 2L), activation = c(1L, 2L), lr = c(.005, 0.5), xi = c(0, 0.5), eps = c(0, 0.5) )
Secondly, we need to specify an objective function to be optimized with the Bayesian approach.
The ParBayesianOptimization
package can be loaded as
library(ParBayesianOptimization)
The score metric we inspect is the negative (partial) likelihood. The likelihood is returned with negative sign as @ParBayesianOptimization is maximizing the objective function.
obj_func <- function(num_layers, num_nodes, optim, activation, lr, xi, eps) { optim = switch(optim, "Adam", "SGD") activation = switch(activation, "LeakyReLU", "SELU") batch_size = as.integer(5000) number_layers = as.integer(num_layers) num_nodes = as.integer(num_nodes) deepsurv_cv <- ReSurvCV( IndividualDataPP = individual_data, model = "NN", hparameters_grid = list( num_layers = num_layers, num_nodes = num_nodes, optim = optim, activation = activation, lr = lr, xi = xi, eps = eps, tie = "Efron", batch_size = batch_size, early_stopping = 'TRUE', patience = 20 ), epochs = as.integer(300), num_workers = 0, verbose = FALSE, verbose.cv = TRUE, folds = 3, parallel = FALSE, random_seed = as.integer(Sys.time()) ) lst <- list( Score = -deepsurv_cv$out.cv.best.oos$test.lkh, train.lkh = deepsurv_cv$out.cv.best.oos$train.lkh ) return(lst) }
As a last step, we use the bayesOpt
function to perform the optimization.
bayes_out <- bayesOpt( FUN = obj_func, bounds = bounds, initPoints = 50, iters.n = 1000, iters.k = 50, otherHalting = list(timeLimit = 18000) )
To select the optimal hyperparameters we inspect bayes_out$scoreSummary
output. Below we print the first five rows of one of our runs. Observe scoreSummary
is a data.table
that contains some parameters specific of the original implementation (see @ParBayesianOptimization for more details)
| Epoch | Iteration | gpUtility | acqOptimum | inBounds | errorMessage | |-------|-----------|-----------|------------|----------|--------------| | 0 | 1 | | FALSE | TRUE | | | 0 | 2 | | FALSE | TRUE | | | 0 | 3 | | FALSE | TRUE | | | 0 | 4 | | FALSE | TRUE | | | 0 | 5 | | FALSE | TRUE | |
and the parameters we specified during the optimization:
| num_layers | num_nodes | optim | activation | lr | xi | eps | batch_size | Elapsed | Score | train.lkh | |-------------|------------|-------|------------|------|------|------|-------------|---------|-------|-----------| | 9 | 8 | 1 | 2 | 0.08 | 0.35 | 0.03 | 1226 | 6094.91 | -6.24 | 6.28 | | 9 | 2 | 2 | 1 | 0.47 | 0.50 | 0.10 | 3915 | 7307.31 | -7.27 | 7.30 | | 9 | 9 | 2 | 1 | 0.40 | 0.49 | 0.18 | 196 | 6719.70 | -5.98 | 5.97 | | 8 | 8 | 1 | 2 | 0.03 | 0.23 | 0.01 | 4508 | 8893.46 | -7.39 | 7.41 | | 9 | 7 | 2 | 1 | 0.13 | 0.13 | 0.12 | 900 | 3189.15 | -6.21 | 6.23 |
We select the final combination that minimizes the negative (partial) likelihood, in the Score
column.
In a similar fashion, we can optimize the gradient boosting parameters. We first set the hyperparameters grid.
bounds <- list( eta = c(0, 1), max_depth = c(1L, 25L), min_child_weight = c(0, 50), subsample = c(0.51, 1), lambda = c(0, 15), alpha = c(0, 15) )
Secondly, we define an objective function.
obj_func <- function(eta, max_depth, min_child_weight, subsample, lambda, alpha) { xgbcv <- ReSurvCV( IndividualDataPP = individual_data, model = "XGB", hparameters_grid = list( booster = "gbtree", eta = eta, max_depth = max_depth, subsample = subsample, alpha = lambda, lambda = alpha, min_child_weight = min_child_weight ), print_every_n = 1L, nrounds = 500, verbose = FALSE, verbose.cv = TRUE, early_stopping_rounds = 30, folds = 3, parallel = FALSE, random_seed = as.integer(Sys.time()) ) lst <- list( Score = -xgbcv$out.cv.best.oos$test.lkh, train.lkh = xgbcv$out.cv.best.oos$train.lkh ) return(lst) }
Lastly, we perform the optimization in a parallel setting using the DoParallel
package, as recommended in 'BayesOpt'.
library(DoParallel) cl <- makeCluster(parallel::detectCores()) registerDoParallel(cl) clusterEvalQ(cl, { library("ReSurv") }) bayes_out <- bayesOpt( FUN = obj_func , bounds = bounds , initPoints = length(bounds) + 20 , iters.n = 1000 , iters.k = 50 , otherHalting = list(timeLimit = 18000) , parallel = TRUE )
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.