knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The `mvClaim` package provides a flexible modelling framework for mixture of experts (MoE) models using bivariate gamma distributions, with a range of parsimonious parameterizations fitted via the EM algorithm, as introduced in Hu et al. (2019).
It uses the bivariate gamma distribution proposed by Cheriyan (1941) and Ramabhadran (1951), which has received little attention in the past.
Depending on the parameterization, its interpretation, and whether covariates are allowed to enter the mixing proportions and/or the bivariate gamma densities of the finite mixture model, a family of models is proposed, including:

- `BGE` in the `mvClaim` package, for a single bivariate gamma distribution without covariates.
- `BGR`, for the `"EE"`, `"EI"` and `"IE"` model types.
- `MBGC`, for the `"*CC"`, `"*CI"` and `"*IC"` model types.
- `MBGR`, for the `"*VC"`, `"*VI"`, `"*VV"`, `"*VE"`, `"*CV"`, `"*IV"`, `"*EV"`, `"*EC"` and `"*CE"` model types.

Detailed explanations of model names such as `"EE"`, `"*VC"`, `"*VI"` and `"*VV"` can be found in the original paper of Hu et al. (2019). The notation `"*"` represents the gating network, which can be `"E"` (equal), `"C"` (constant) or `"V"` (variable). Within the expert networks, `"C"` represents constant density parameters without covariates, `"I"` represents identical density parameters without covariates, `"V"` represents variable parameter values depending on covariates, and `"E"` represents equal parameters depending on covariates. This package vignette serves as supplementary material to Hu et al. (2019).
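
For intuition about the bivariate gamma distribution that underlies these models, the following base R sketch simulates from it, assuming the usual trivariate-reduction construction associated with Cheriyan (1941) and Ramabhadran (1951): three independent gamma variables share a common rate, and the third is added to each of the other two. The names `alpha1`, `alpha2`, `alpha3` and `beta` and their values are illustrative assumptions and are not taken from the package's interface.

```r
set.seed(1)
n <- 1e5
alpha1 <- 2; alpha2 <- 3; alpha3 <- 1.5  # shape parameters (illustrative values)
beta <- 0.5                               # common rate parameter (illustrative value)

# Three independent gamma variables sharing the same rate
x1 <- rgamma(n, shape = alpha1, rate = beta)
x2 <- rgamma(n, shape = alpha2, rate = beta)
x3 <- rgamma(n, shape = alpha3, rate = beta)

# The shared term x3 induces positive dependence between y1 and y2
y1 <- x1 + x3
y2 <- x2 + x3

# Under this construction the marginals are gamma with shapes
# alpha1 + alpha3 and alpha2 + alpha3, and Cov(y1, y2) = alpha3 / beta^2
c(sample_mean_y1 = mean(y1), theory = (alpha1 + alpha3) / beta)
c(sample_cov = cov(y1, y2), theory = alpha3 / beta^2)
```

Because the dependence comes only from the shared term, this construction can represent positive correlation only, which is worth keeping in mind when reading the model types above.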
The `mvClaim` package also provides a second mixture model framework, a finite mixture of copula regressions with gamma GLMs as marginal distributions, for the same purpose, as discussed in Hu and O'Hagan (2019), including:

- `copreg.gamma` in the `mvClaim` package, for a copula regression with gamma GLMs as marginals (no mixture).
- `MCGR`, for mixtures of copula regressions.

In this document we illustrate two examples of bivariate gamma mixture of experts (MoE) models with two artificially simulated data sets (named `gatingsim` and `fullsim`, available in this package) while demonstrating the main relevant functions in the `mvClaim` package, closely following the simulation studies in Hu et al. (2019). The third example illustrates the functions for (mixtures of) copula regressions with a simulated data set, following the simulation study in Hu and O'Hagan (2019).
You can install the latest development version of `mvClaim` from GitHub:

``` {r eval = FALSE}
install.packages("devtools")
devtools::install_github("senhu/mvClaim")
```

Then the package can be loaded with:

``` {r eval = TRUE}
library(mvClaim)
```

## Example 1

The first example uses a simulated data set called `gatingsim`, as illustrated in Hu et al. (2019), which is available in the `mvClaim` package.
The data were simulated from a gating network MoE, i.e. covariates were used only in the gating network to help identify which component each observation came from.
The detailed simulation process can be found in Hu et al. (2019).
The pairwise plot of the data is shown below.

``` {r echo = FALSE, fig.width=5, fig.height=4, fig.align="center"}
data("gatingsim")
my_cols <- c("#00AFBB", "#FC4E07")
my_pch <- c(0,4)
pairs(gatingsim[,1:5],
      col=my_cols[gatingsim$label+1],
      pch=my_pch[gatingsim$label+1],
      labels = expression("y"[1],"y"[2],"w"[1],"w"[2],"w"[3]),
      cex=.5)
```

Assuming there is only one bivariate gamma distribution (no mixture) and ignoring covariates, the distribution parameters can be estimated via:

``` {r eval = TRUE, message = FALSE, warning = FALSE}
est <- BGE(gatingsim[,1:2], verbose=FALSE)
summary(est)
```

Assuming instead that the data follow a mixture of bivariate gamma distributions with no covariates in the expert networks, there are three model types within this bivariate gamma MoE family: `"*CC"`, `"*CI"` and `"*IC"`.
If the expert network is of type `"CC"` and the gating network is of type `"C"` or `"E"`, the mixture model can be fitted via:

``` {r eval = FALSE}
m1 <- MBGC(modelName = "CC", y = c("y1","y2"), G = 2, gating = "C", data = gatingsim)
m2 <- MBGC(modelName = "CC", y = c("y1","y2"), G = 2, gating = "E", data = gatingsim)
m3 <- MBGC(modelName = "CC", y = c("y1","y2"), G = 3, gating = "C", data = gatingsim)
m4 <- MBGC(modelName = "CC", y = c("y1","y2"), G = 3, gating = "E", data = gatingsim)
```

Fitting `G=1` in this case is equivalent to using the `BGE` function. The other model types can be selected with `modelName = "CI"` or `modelName = "IC"`. Furthermore, covariates can enter the gating network through a formula, for example `gating = ~w1+w2+w3`. For this example we know the true data generating process, so the corresponding model can be fitted via:

``` {r eval = TRUE, fig.width=5, fig.height=3.5, fig.align="center"}
m5 <- MBGC(modelName = "CC", y=c("y1","y2"),
           G = 2, gating = ~w1+w2+w3,
           data = gatingsim, verbose = FALSE)
summary(m5)
plot(m5, what = "classification", col=c("#00AFBB", "#FC4E07"), pch=c(0,4))
```
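
To make the role of a gating formula such as `~w1+w2+w3` more concrete, here is a minimal base R sketch of a two-component gating network, assuming the mixing proportions follow a logistic regression on the covariates. The simulated covariates, the coefficient vector `gamma_coef` and all object names are hypothetical; they are not the values used to simulate `gatingsim`, and the code does not call `mvClaim`.

```r
set.seed(42)
n <- 500
# Hypothetical covariates standing in for w1, w2, w3
w <- data.frame(w1 = rnorm(n), w2 = rnorm(n), w3 = rnorm(n))

# Hypothetical gating coefficients: intercept, w1, w2, w3
gamma_coef <- c(0.5, 1.0, -1.0, 0.5)

# Linear predictor and mixing proportion of component 2 for each observation
eta  <- drop(model.matrix(~ w1 + w2 + w3, data = w) %*% gamma_coef)
tau2 <- plogis(eta)
tau  <- cbind(comp1 = 1 - tau2, comp2 = tau2)

# Each observation draws its component label from its own mixing proportions
label <- rbinom(n, size = 1, prob = tau2)

# Refitting the logistic regression on the labels gives estimates that can be
# compared with gamma_coef
gating_fit <- glm(label ~ w1 + w2 + w3, family = binomial, data = w)
round(coef(gating_fit), 2)
```

In an actual MoE fit the component labels are latent, so an EM algorithm would typically replace `label` with posterior membership probabilities; the sketch uses observed labels only to keep the example short.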

## Example 2

The second example uses another simulated data set called `fullsim`, as illustrated in Hu et al. (2019), which is also available in the `mvClaim` package.
The data were simulated from a full MoE model, i.e. with covariates entering both the gating and expert networks.
The detailed simulation process can be found in Hu et al. (2019).
The pairwise plot of the data is shown below.

``` {r eval = TRUE, echo = FALSE, fig.width=5, fig.height=4, fig.align="center"}
data("fullsim")
my_cols <- c("#00AFBB", "#FC4E07")
my_pch <- c(0,4)
pairs(fullsim[,1:5], # upper.panel = NULL,
      col=my_cols[fullsim$label+1],
      pch=my_pch[fullsim$label+1],
      cex=.5,
      labels = expression("Y"[1],"Y"[2],"w"[1],"w"[2],"w"[3]))
```

We now consider only the cases where covariates enter the expert networks. In such cases there are nine model types: `"*VC"`, `"*VI"`, `"*VV"`, `"*VE"`, `"*CV"`, `"*IV"`, `"*EV"`, `"*EC"` and `"*CE"`.

When the true model is unknown, model selection needs to be addressed. We recommend stepwise forward selection starting from a null model, i.e. a single bivariate gamma distribution without covariates. At each step, either add one extra component to the current optimal model, change the model type, or add one covariate to any (combination) of the expert networks of the current model. If $G \geq 2$, covariates should also be added to the gating network. More details can be found in Hu et al. (2019). This procedure can require fitting many models.

Here, since the true model is known, it can be fitted directly via:

``` {r eval = TRUE, fig.width=5, fig.height=3.5, fig.align="center"}
m6 <- MBGR(modelName = "VV", y=c("y1","y2"), G=2,
           data = fullsim,
           f1 = ~ w1 + w2,
           f2 = ~ w2 + w3,
           f3 = ~ w1 + w2 + w3,
           f4 = ~ w1 + w2 + w3,
           gating = ~ w1 + w2 + w3,
           verbose = FALSE)
summary(m6)
plot(m6, what = "classification", col=c("#00AFBB", "#FC4E07"), pch=c(0,4))
```
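
The covariate-adding part of a forward selection can be sketched in base R as follows. This is only an analogy under stated assumptions: it uses a univariate gamma GLM fitted with `glm()` as a stand-in for the bivariate gamma MoE, it uses BIC as the selection criterion (an assumption, since the text above does not name one), and it does not cover adding components or changing the model type. The data and coefficients are simulated and hypothetical.

```r
set.seed(123)
n <- 400
dat <- data.frame(w1 = rnorm(n), w2 = rnorm(n), w3 = rnorm(n))
# Hypothetical truth: the mean depends on w1 and w2 only
mu <- exp(1 + 0.8 * dat$w1 - 0.6 * dat$w2)
dat$y <- rgamma(n, shape = 2, rate = 2 / mu)

candidates <- c("w1", "w2", "w3")
selected   <- character(0)
best_fit   <- glm(y ~ 1, family = Gamma(link = "log"), data = dat)  # null model

repeat {
  remaining <- setdiff(candidates, selected)
  if (length(remaining) == 0) break

  # Fit every model that adds exactly one covariate to the current best model
  fits <- lapply(remaining, function(v) {
    glm(reformulate(c(selected, v), response = "y"),
        family = Gamma(link = "log"), data = dat)
  })
  bics <- vapply(fits, BIC, numeric(1))

  # Stop when no single addition improves on the current BIC
  if (min(bics) >= BIC(best_fit)) break
  selected <- c(selected, remaining[which.min(bics)])
  best_fit <- fits[[which.min(bics)]]
}

selected
```

With the package itself, the same loop structure would apply, with each candidate fit replaced by a call such as `MBGR()` using a different formula, model type or `G`.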

An accompanying data set, `fullsim.test`, is also available in this package. It was simulated in the same way and is intended for test data prediction.
Based on the fitted model, predictions for the new data can be obtained via:

``` {r eval = TRUE}
pred <- predict(m6, newdata=fullsim.test)
```

The predictions are plotted in the figure below.

``` {r eval = TRUE, echo = FALSE, fig.width=5, fig.height=3.5, fig.align="center"}
plot(fullsim.test$y1, fullsim.test$y2,
     xlab=expression("Y"[1]), ylab=expression("Y"[2]),
     pch=20, cex=.4, cex.lab=1)
points(pred$fit, col="brown2", pch=15)
```

## Example 3

Another example is briefly demonstrated here using a simulated data set called `simdat.mcgr`, as illustrated in Hu and O'Hagan (2019). The data were simulated from a mixture of two copula regressions, i.e. covariates were used to estimate the marginal distributions to assist the estimation of the mixture of copulas. Details of the simulation process can be found in Hu and O'Hagan (2019).

``` {r}
data("simdat.mcgr")
```

Because the true data generating process for `simdat.mcgr` is known, the corresponding model can be fitted via:

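``` {r eval = FALSE}
library(copula)  # provides gumbelCopula() and frankCopula()
mod7 <- MCGR(copula = list(gumbelCopula(dim=2), frankCopula(dim=2)),
             f1 = y1 ~ x1+x2,
             f2 = y2 ~ x1+x2,
             G = 2,
             gating = "C",  # the comma was missing here in the original call
             data=simdat.mcgr)
```
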
If `G=1`, a copula regression with gamma GLMs as marginals (no mixture) can alternatively be fitted via, for example:

``` {r eval = FALSE}
mod8 <- copreg.gamma(f1 = y1 ~ x1+x2,
                     f2 = y2 ~ x1+x2,
                     copula = gumbelCopula(dim=2),
                     data = simdat.mcgr)
```
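
To illustrate what "copula regression with gamma GLMs as marginals" means, the base R sketch below simulates two gamma responses whose means follow log-link regressions and whose dependence comes from a copula. Several simplifications are assumed: a Gaussian copula is swapped in for the Gumbel and Frank copulas used above (so that no extra package is needed), the coefficients are hypothetical, and the marginals are fitted separately with `glm()`, which is not claimed to be the estimation method used by `copreg.gamma` or `MCGR`.

```r
set.seed(2019)
n <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)

# Gaussian copula with correlation rho (a stand-in for Gumbel or Frank)
rho <- 0.6
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
u1 <- pnorm(z1)  # dependent uniform draws
u2 <- pnorm(z2)

# Gamma marginals whose means follow log-link regressions
# (hypothetical coefficients and shape)
shape <- 3
mu1 <- exp(1.0 + 0.5 * x1 - 0.3 * x2)
mu2 <- exp(0.5 - 0.4 * x1 + 0.6 * x2)
y1 <- qgamma(u1, shape = shape, rate = shape / mu1)
y2 <- qgamma(u2, shape = shape, rate = shape / mu2)
sim <- data.frame(y1, y2, x1, x2)

# Fit each marginal gamma GLM separately
fit1 <- glm(y1 ~ x1 + x2, family = Gamma(link = "log"), data = sim)
fit2 <- glm(y2 ~ x1 + x2, family = Gamma(link = "log"), data = sim)
round(rbind(margin1 = coef(fit1), margin2 = coef(fit2)), 2)

# Kendall's tau of the copula draws; for a Gaussian copula the
# theoretical value is 2 * asin(rho) / pi
c(sample_tau = cor(u1, u2, method = "kendall"), theory = 2 * asin(rho) / pi)
```

The separation between the marginal regressions and the dependence structure is the point of the sketch: changing the copula alters how `y1` and `y2` move together without changing either marginal model.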

## References

Hu, S., Murphy, T. B. and O'Hagan, A. (2019) Bivariate Gamma Mixture of Experts Models for Joint Claims Modeling. To appear.

Hu, S. and O'Hagan, A. (2019) Copula Averaging for Tail Dependence in General Insurance Claims Data. To appear.

Cheriyan, K. (1941) A bivariate correlated gamma-type distribution function. Journal of the Indian Mathematical Society, 5, pp. 133-144.

Ramabhadran, V. (1951) A multivariate gamma-type distribution. Sankhya, 11, pp. 45-46.