knitr::opts_chunk$set(echo = TRUE, cache = TRUE) install.packages("xts") install.packages("sp") install.packages("CASdatasets", repos = "http://cas.uqam.ca/pub/", type="source") library(LRMoE) library(reshape2) library(matrixStats) library(CASdatasets) glm_fit = readRDS("realdata_glm_model.rda") df = readRDS("realdata_converted_data.rda") init = readRDS("realdata_init_3comp.rda") LRMoE_fit = readRDS("realdata_LRMoE_3Poisson.rda") # data("LRMoEDemoData") # fitted_model = readRDS("fitted_model_true.rda") # fitted_model_new = readRDS("fitted_model_new-raw.rda")
In this document, we will analyze a real dataset of claim frequency with LRMoE. We will compare how it performs compared with the classical Poisson generalized linear model (GLM).
We will use the ausprivauto0405
dataset available in the R package CASdatasets
.
library(CASdatasets) data(ausprivauto0405)
The dataset contains some typical covariates used in predicting auto-insurance claim frequency and severity. For simplicity, we will only consider the problem of modelling claim frequency. A brief summary of the dataset is given below.
summary(ausprivauto0405)
As a benchmark, we will first fit a Poisson GLM (with log
link) to claim frequency. Note that Exposure
represents how long the policy has been in force / observed, which is used as an offset
term in model fitting.
The usual model summary is also given below.
glm_fit = glm(ClaimNb ~ Gender + DrivAge + VehBody + VehAge + VehValue, offset = Exposure, family = poisson(link = "log"), data = ausprivauto0405)
summary(glm_fit)
In order to fit an LRMoE, we first need to transform the dataset. Since DriverAge
, VehBody
and
VehAge
are factors, we will use model.matrix
to augment a dummy matrix of zeros and ones to represent them.
# 1: Intercept df = data.frame(Intercept = rep(1, nrow(ausprivauto0405))) # 2: Gender df = cbind(df, GenderMale = model.matrix(~Gender, data = ausprivauto0405)[,2]) # 3-7: DrivAge df = cbind(df, model.matrix(~DrivAge, data = ausprivauto0405)[,2:6]) # 8-19: VehBody df = cbind(df, model.matrix(~VehBody, data = ausprivauto0405)[,2:13]) # 20-22: VehAge df = cbind(df, VehAge = model.matrix(~VehAge, data = ausprivauto0405)[,2:4]) # 23: VehValue df = cbind(df, VehValue = ausprivauto0405$VehValue) # 24: Exposure df = cbind(df, Exposure = ausprivauto0405$Exposure) # 25: ClaimNb df = cbind(df, ClaimNb = ausprivauto0405$ClaimNb) df = as.matrix(df)
When fitting an LRMoE model, a good starting point usually reduces run time and can potentially land on a better fit. In the LRMoE package, we provide a function for parameter initialization based on the Clustered Method-of-Moments (CMM).
The following demonstrates how to initialize a 3-component LRMoE model.
set.seed(2021) X = as.matrix(df[,1:23]) exposure = as.matrix(df[,24]) Y = as.matrix(df[,25])
init = LRMoE::cmm_init(Y = Y, X = X, n_comp = 3, type = c("discrete"), exact_Y = TRUE)
The init
object contains a list of summary statistics of Y
, as well as a set of parameter
initialization. Let us consider a 3-component Poisson mixture.
# Initialization of alpha alpha_init = init$alpha_init alpha_init
# Initialization of component distributions # The list indices are specified as: # [[1]]: the 1st dimension of the response of Y # [[1]]/[[2]]/[[3]]: the 1st, 2nd and 3rd latent component/group # [[5]]: the 5th ponential choice of expert is zipoisson comp_dist = matrix(list(init$params_init[[1]][[1]][[1]]$distribution, init$params_init[[1]][[2]][[1]]$distribution, init$params_init[[1]][[3]][[1]]$distribution), nrow = 1, byrow = TRUE) params_init = matrix(list(init$params_init[[1]][[1]][[1]]$params, init$params_init[[1]][[2]][[1]]$params, init$params_init[[1]][[3]][[1]]$params), nrow = 1, byrow = TRUE)
If interested, we can also check some summary statistics of Y
based on the initialization.
For example, the proportion of zeros and the mean of positive Y
's are given below. These summary statistics may help with selecting the appropriate expert functions for each latent component/group.
init$zero_y
init$mean_y_pos
With an initialization of model, we can now fit the LRMoE with zero-inflated Poisson experts.
Note that exposure
is also taken into account. In the case of Poisson distribution with parameter $\lambda$, a policy with exposure $E_i$ would have claim frequency distribution $Poisson(\lambda E_i)$. The same rate parameter $\lambda$ is shared by all policyholders who have potentially different policy exposures.
(Note: The fitting function takes more than 30 minutes to run. Adjusting parameters such as a larger eps
or smaller ecm_iter_max
will provide a quicker, but potentially worse, model fit.)
LRMoE_fit = LRMoE::FitLRMoE(Y = Y, X = X, alpha_init = alpha_init, comp_dist = comp_dist, params_list = params_init, exposure = exposure, exact_Y = TRUE, eps = 0.05, ecm_iter_max = 25)
We can examine the fitted model as follows, as well as its loglikelihood, AIC and BIC. Note that, although we are fitting the same dataset, the result above is different from that presented in the Actuary Magazine. This is most likely due to a different initialization as well as stopping conditions. In practice, such differences are unlikely to have a material impact, as long as the overall goodness-of-fit is similar for these different models.
LRMoE_fit$alpha_fit
LRMoE::print_expert_matrix(LRMoE_fit$model_fit)
LRMoE_fit$ll
LRMoE_fit$AIC
LRMoE_fit$BIC
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.