In jonascrevecoeur/hirem: Hierarchical reserving models

```{css, echo = FALSE} pre code, pre, code { white-space: pre !important; overflow-x: scroll !important; word-break: keep-all !important; word-wrap: initial !important; }

<style>
body {
text-align: justify}
</style>

```r
require(hirem) 
require(tidyr)
require(magrittr)
require(dplyr)
require(tidyverse)

Reserving data set included in hirem

The hirem package includes a data set, which is simualted based on a real insurance portfolio.

data('reserving_data')
head(reserving_data)

In this data set the development of a claim in a development year is recorded in three variables

close: One when the claim settles in the curent development year, zero otherwise
payment: One when there is a payment in the current development year, zero otherwise
size: Size of the payment in the current development year

The data set reserving_data is not censored. We artificially create the censoring that one would typically find in reserving data

upper_triangle <- reserving_data %>% filter(calendar_year <= 6)
lower_triangle <- reserving_data %>% filter(calendar_year > 6)

Building the hierarchical reserving model

The hirem package starts with an empty model and constructs the hierarchical model by sequentially adding new layers. Each layer represents an event recorded in the data set.

The function hirem start a new hierarchical reserving model. This function has a single argument data to which we pass the data on which we want to train the model.

model <- hirem(upper_triangle)

Two types of layers are currently implemented in the hirem package

layer_glm: Adds a layer estimated using generalized linear models
layer_gbm: Adds a layer estimated using gradient boosting models

We construct a hierarchical reserving model using layer_glm. See the documentation for a detailed description on the use of ?layer_gbm.

layer_glm requires 2 parameters:

name: The name of the layer. This name has to match the column name used of the covariate in the input data set.
family: The family object to pass to the glm routine.

The layer size is zero when there is no payment. To include this relation in the hierarchical reserving model, we add the optional parameter filter. filter is a function that returns a vector with values TRUE/FALSE. Records for which filter evaluates to FALSE are not included when fitting the layer and are set to zero in the simulation.

model <- hirem(upper_triangle) %>%
  layer_glm('close', binomial(link = cloglog)) %>%
  layer_glm('payment', binomial(link = logit)) %>%
  layer_glm('size', Gamma(link = log),
            filter = function(data){data$payment == 1})

Call fit to calibrate these generalized linear models. We pass to this function the formula describing the regresion model for each component.

model <- fit(model,
             close = 'close ~ factor(development_year) + factor(X1) + factor(X2)',
             payment = 'payment ~ close + factor(development_year) + factor(X1) + factor(X2)',
              size = 'size ~ close + factor(development_year) + factor(X1) + factor(X2)')

Simulate the future reserve

We have now defined and trained a hierarchical reserving model. This model defines the evolution of the events close, payment and size in a claims lifetime. Besides, these stochastic covariates, there are other covariates that have a deterministic evolution over time (e.g. development_year).

Before, we can simulate paths for the future development of claims, we register an updater to update these deterministic covariates.

update <- function(data) {
  data %>%
    dplyr::mutate(development_year = development_year + 1,
                  calendar_year = calendar_year + 1)
}

model <- register_updater(model, update)

See the documentation (?register_updater) for more information. By default the data is updated at the end of each cycle (development_year), but it is also possible to add updates after a specific layer. These updates can keep variables such as the total amount paid up to date during the simulation.

Call simulate to simulate future paths for reported claims. This function has 4 arguments:

obj: The hierarchical model from which we want to simulate
nsim: The number of simulations
filter: A function removing all claims for which we don't want to simualte the next development year.
data: The last observed status for the claims for which we want to simualte future paths

In our example, we simulate claims until they settle or until development year 6

simul <- simulate(model,
                  nsim = 5,
                  filter = function(data){dplyr::filter(data,
                                                       development_year <= 6,
                                                       close == 0)},
                  data = reserving_data %>% dplyr::filter(calendar_year == 6))

This simulation has the same structure as the input data

head(simul)

The extra columns simulation identifies the different simulations.

Comparing the predicted and actual reserve

rbns_estimate <- simul %>%
  dplyr::group_by(simulation) %>%
  dplyr::summarise(rbns = sum(size))

rbns_actual <- reserving_data %>%
  dplyr::filter(calendar_year > 6) %>%
  dplyr::summarise(rbns = sum(size))

rbns_estimate
rbns_actual

Predicting the lower half of the runoff triangle

lower_triangle_predicted <- simul %>%
  dplyr::group_by(reporting_year, development_year) %>%
  dplyr::summarise(total_size = sum(size) / max(simulation)) %>%
  dplyr::arrange(development_year) %>%
  tidyr::pivot_wider(values_from = total_size, names_from = development_year) %>%
  dplyr::arrange(reporting_year)

lower_triangle_actual <- reserving_data %>%
  dplyr::filter(calendar_year > 6) %>%
  dplyr::group_by(reporting_year, development_year) %>%
  dplyr::summarise(total_size = sum(size)) %>%
  dplyr::arrange(development_year) %>%
  tidyr::pivot_wider(values_from = total_size, names_from = development_year) %>%
  dplyr::arrange(reporting_year)

lower_triangle_actual
lower_triangle_predicted