In senhu/mvClaim: Multivariate general insurance claims modelling

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The mvClaim package provides a flexible modelling framework for a mixture of experts (MoE) model using bivariate gamma distributions and a range of parsimonious parameterizations via the EM algorithm, as introduced in Hu et al. (2019). It utilizes the bivariate gamma distribution proposed by Cheriyan (1941) and Ramabhadran (1951), which has not received much attention in the past. Depending on the parameterization of the framework, its interpretations, and whether one allows covriates to be incorporated in the mixing proportions and/or the bivariate gamma densities of the finite mixture models, a family of models are proposed, including:

Bivariate gamma distribution estimation: implemented via the function BGE in the mvClaim package.
Bivariate gamma regression: implemented via the function BGR, for "EE", "EI" and "IE" model types.
Mixture of bivariate gamma clustering: implemented via the function MBGC, for "*CC", "*CI" and "IC" model types.
Mixture of bivariate gamma regressions (i.e. mixture of bivariate gamma clusterings with covariates): implemented via the function MBGR, for "*VC", "*VI", "*VV", "*VE", "*CV", "*IV", "*EV", "*EC", and "*CE" model types.

Detailed explanations about the model names such as "EE", "*VC", "*VI", "*VV" can be found in the original paper of Hu et al (2019); the notation "*" represents the gating network, which can be either "E"(equal), "C"(constant) or "V"(variable); within the expert networks, "C" represents constant density parameters without covariates, "I" represents idential density parameters without covariates, "V" represents variable parameter values depending on covariates, and "E" represents equal parameters depending on covariates. This package vignette serves as supplementary material of Hu et al (2019).

The mvClaim package also provides another mixture model framework of a finite mixture of copula regressions with gamma GLMs as marginal distributions, for the same purpuse as discussed in Hu & O'Hagan (2019), including

Copula regression with gamma marginal distributions: implemented via the function copreg.gamma in the mvClaim package.
Mixture of copula regressions with gamma GLMs marginal distributions (i.e. mixture of copulas clustering with covariates): implemented via the function MCGR.

In this document we illustrate two examples of bivariate gamma mixture of experts (MoE) models with two artificually simulated data sets (named gatingsim and fullsim, available in this package) while illustrating the main relevant functions in the mvClaim package, closely following the simulation studies in Hu et al. (2019); the third example illustrates the functions for the (mixtures of) copula regressions with a simulated data set, following the simulation study in Hu and O'Hagan (2019).

Installation

You can install the latest development version of mvClaim from GitHub: ``` {r eval = FALSE} install.packages("devtools") devtools::install_github("senhu/mvClaim")

Then the package can be loaded with:
``` {r eval = TRUE}
library(mvClaim)

Example 1

The first example is demonstrated using a simulated data set called gatingsim as illustrated in Hu et al (2019), which is availabe in the mvClaim package. The data were simulated based on a gating network MoE, i.e. covariates were used only in the gating network to assist with identifying which component the observation was from. The detailed simulation process can be found in Hu et al (2019). The pairwise plot of the data is shown below. ``` {r echo = FALSE, fig.width=5, fig.height=4, fig.align="center"} data("gatingsim") my_cols <- c("#00AFBB", "#FC4E07") my_pch <- c(0,4) pairs(gatingsim[,1:5], col=my_cols[gatingsim$label+1], pch=my_pch[gatingsim$label+1], labels = expression("y"[1],"y"[2],"w"[1],"w"[2],"w"[3]), cex=.5)

If, assuming there is only one bivariate gamma distribution and no mixtures, and without considering covariates, the distribution parameters can be estimated via
``` {r eval = TRUE, message = FALSE, warning = FALSE}
est <- BGE(gatingsim[,1:2], verbose=FALSE)
summary(est)

If, assuming that the data are a mixture of bivariate gamma distributions and no covariates in the experts networks, there are three model types within this bivariate gamma MoE family: "*CC", "*CI" and "*IC". Suppose the expert network is of type "CC" and the gating network is of type "C" or "E", the mixture model can be fitted via: ``` {r eval = FALSE} m1 <- MBGC(modelName = "CC", y = c("y1","y2"), G = 2, gating = "C", data = gatingsim) m2 <- MBGC(modelName = "CC", y = c("y1","y2"), G = 2, gating = "E", data = gatingsim) m3 <- MBGC(modelName = "CC", y = c("y1","y2"), G = 3, gating = "C", data = gatingsim) m4 <- MBGC(modelName = "CC", y = c("y1","y2"), G = 3, gating = "E", data = gatingsim)

Fitting `G=1` in this case will be equivalent to using the `BGE` function. Model types can be set using `modelName = "CI"` or `"IC"`. Furthermore, covariates can enter the gating network with, for example `"gating = ~w1+w2+w3"`; For this example we know the true data generating process, which can be fitted via 
``` {r eval = TRUE, fig.width=5, fig.height=3.5, fig.align="center"}
m5 <- MBGC(modelName = "CC", y=c("y1","y2"), G = 2, gating = ~w1+w2+w3, data = gatingsim, verbose = FALSE)
summary(m5)
plot(m5, what = "classification", col=c("#00AFBB", "#FC4E07"), pch=c(0,4))

Example 2

The second example is demonstrated using another simulated data set called fullsim as illustrated in Hu et al (2019), and is availabe in the mvClaim package. The data were simulated based on a full MoE model, i.e. with covariates entering both the gating and experts networks. The detailed simulation process can be found in Hu et al (2019). The pairwise plot of the data is shown below. ``` {r eval = TRUE, echo = FALSE, fig.width=5, fig.height=4, fig.align="center"} data("fullsim") my_cols <- c("#00AFBB", "#FC4E07") my_pch <- c(0,4) pairs(fullsim[,1:5], # upper.panel = NULL, col=my_cols[fullsim$label+1], pch=my_pch[fullsim$label+1], cex=.5, labels = expression("Y"[1],"Y"[2],"w"[1],"w"[2],"w"[3]))

Now we consider only the cases when covariates are entering the expert networks.
In such cases, there are 9 different model types: `"*VC"`, `"*VI"`, `"*VV"`, `"*VE"`, `"*CV"`, `"*IV"`, `"*EV"`, `"*EC"`, `"*CE"`.
Typically the model selection issue needs to be addressed if the true model is unknown. We recommend a stepwise forward selection starting from a null model, i.e. fitting one bivariate gamma distribution without covariates. 
Then either add one extra component to the current optimal model, or change the model type, or add one covariate to any (combination) of the expert networks of the current model. If $G \geq 2$, covariates should also be added to the gating network. 
More details can be found in Hu et al (2019).  
This will lead to many models to be fitted. Here, since the true model is known, it can be fitted via
``` {r eval = TRUE, fig.width=5, fig.height=3.5, fig.align="center"}
m6 <- MBGR(modelName = "VV", y=c("y1","y2"),
           G=2, data = fullsim,
           f1     = ~ w1 + w2,
           f2     = ~ w2 + w3,
           f3     = ~ w1 + w2 + w3,
           f4     = ~ w1 + w2 + w3,
           gating = ~ w1 + w2 + w3, verbose = FALSE)
summary(m6)
plot(m6, what = "classification", col=c("#00AFBB", "#FC4E07"), pch=c(0,4))

There is also an accompanying fullsim.test data set available in this package that has been simulated similarly for test data prediction purposes. Based on the fitted model, predictions on the new data can be done via:

pred <- predict(m6, newdata=fullsim.test)

And the predictions are plotted in the figure below. ``` {r eval = TRUE, echo = FALSE, fig.width=5, fig.height=3.5, fig.align="center"} plot(fullsim.test$y1, fullsim.test$y2, xlab=expression("Y"[1]), ylab=expression("Y"[2]), pch=20, cex=.4, cex.lab=1) points(pred$fit, col="brown2", pch=15 )

## Example 3

Another example is briefly demonstrated here using a simulated data set called `simdat.mcgr` as illustrated in Hu and O'Hagan (2019). 
The data were simulated based on a mixture of two copula regressions, i.e. covariates were used to estimate the marginal distributions to assist the mixture of copulas estimation. Details of the sample data simulation process can be found in Hu and O'Hagan (2019).
``` {r}
data("simdat.mcgr")

Because we know the true data generating process for the data set simdat.mcgr, based on which the model can be fitted via ``` {r eval = FALSE} mod7 <- MCGR(copula = list(gumbelCopula(dim=2), frankCopula(dim=2)), f1 = y1 ~ x1+x2, f2 = y2 ~ x1+x2, G = 2, gating = "C" data=simdat.mcgr)

If `G=1`, as an alternative, copula regression with gamma GLMs as marginals (no mixtures) can be fitted via, for example:
``` {r eval = FALSE}
mod8 <- copreg.gamma(f1 = y1 ~ x1+x2,
                     f2 = y2 ~ x1+x2,
                     copula = gumbelCopula(dim=2),
                     data = simdat.mcgr)

Reference

Hu, S., Murphy, T. B. and O'Hagan, A. (2019) Bivariate Gamma Mixture of Experts Models for Joint Claims Modeling. To appear.
Hu, S. and O'Hagan, A. (2019) Copula Averaging for Tail Dependence in General Insurance Claims Data. To appear.
Cheriyan, K. (1941) A bivariate correlated gamma-type distribution function. Journal of the Indian Mathematical Society, 5, pp. 133-144.
Ramabhadran, V. (1951) A multivariate gamma-type distribution. Sankhya, 11, pp. 45-46.