``` {r include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The FCBMA package uses factor collapsing (FC) and Bayesian model averaging (BMA) to find the optimal ways of combining levels within categorical variables (i.e. clustering of categorical levels) in linear or generalized linear regression models, as introduced in Hu et al. (2018).

Installation

You can install the latest development version of FCBMA from GitHub:
``` {r eval = FALSE}
install.packages("devtools")
devtools::install_github("senhu/FCBMA")
```

Then the package can be loaded with:
``` {r eval = TRUE}
library(FCBMA)
```

Example

This vignette follows the example demonstrated in Hu et al. (2018), using Swedish third-party motor insurance claims data from 1977 (Hallin and Ingenbleek, 1983), which is available in this package. The data can be loaded via:

``` {r eval = TRUE}
data("sweden")
```

Details about the data set can be found in the data set documentation within the FCBMA package.

We start with the frequency aspect of claim modelling, building a baseline frequency model whose categorical variables (factors) will then be collapsed:
``` {r eval = TRUE}
freq <- glm(Claims ~ Zone + Bonus + Make + Kilometres,
            offset = log(Insured),
            data = sweden, family = "poisson")
summary(freq)
```

The model summary shows that all main effects are significant, but some levels of the rating factor Make are not statistically significant, so clustering these levels is a natural next step.
We first use multiple comparisons to test pairwise equality of the level means of the factor Make, using the `multcomp` R package:
``` {r eval = TRUE, message = FALSE, warning = FALSE}
freq.make.mult <- multcomp::glht(freq, linfct = multcomp::mcp(Make = "Tukey"))
# summary(freq.make.mult)
```

For comparison, the factor Make is then collapsed individually via FCBMA, and the model summary shows the best 5 collapsed models:
``` {r eval = TRUE}
freq.make <- FCBMA(model = freq, varia.list = c("Make"),
                   method = "SA", verbose = FALSE)
summary(freq.make)
```

With `FCBMA` the grouping is more granular than with multiple comparisons, and BMA accounts for the uncertainty surrounding the grouping of levels 2 and 5 within the Make factor.
Similarly, we can collapse other factors such as Kilometres individually via:
``` {r eval = TRUE}
freq.kilo <- FCBMA(freq, varia.list = c("Kilometres"),
                   method = "complete", verbose = FALSE)
summary(freq.kilo)
```
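
To make the BMA step more concrete, here is a minimal sketch of how posterior model probabilities could be approximated from BIC values and used to average predictions. The BIC values and per-model predictions are invented for illustration, and whether `FCBMA` uses exactly this BIC approximation internally is an assumption, not something stated in this vignette.

```r
# Hypothetical BIC values for five candidate collapsed models (made-up numbers).
bic <- c(m1 = 1000.0, m2 = 1000.8, m3 = 1003.1, m4 = 1006.5, m5 = 1010.2)

# Approximate posterior model probabilities: w_k proportional to exp(-BIC_k / 2).
# Subtracting the minimum BIC first avoids numerical underflow.
delta <- bic - min(bic)
post_prob <- exp(-delta / 2) / sum(exp(-delta / 2))
round(post_prob, 3)

# A model-averaged prediction is the probability-weighted mean of the
# per-model predictions (these predictions are also invented).
pred <- c(m1 = 0.120, m2 = 0.125, m3 = 0.118, m4 = 0.130, m5 = 0.110)
bma_pred <- sum(post_prob * pred)
bma_pred
```

Models with nearly equal BIC receive nearly equal weight, which is how competing groupings of the same levels can both contribute to the averaged result.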

Note that, depending on the size of the model space to be searched for the optimal collapsing, either a greedy/complete search or a stochastic search (simulated annealing or a genetic algorithm) can be used, by setting `method` to `"complete"`, `"SA"` or `"GA"` respectively.
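
To illustrate what a complete search has to cover, the sketch below enumerates every way of grouping the levels of a small factor. It assumes that each candidate collapsing corresponds to one set partition of the levels. The function `set_partitions()` and the four level names are hypothetical helpers written for this example and are not part of FCBMA.

```r
# Enumerate all set partitions of n items as "restricted growth strings":
# the first label is 1 and each later label is at most 1 + the largest
# label used so far. Items sharing a label belong to the same group.
set_partitions <- function(n) {
  out <- list()
  extend <- function(labels) {
    if (length(labels) == n) {
      out[[length(out) + 1]] <<- labels
      return(invisible(NULL))
    }
    for (g in seq_len(max(labels) + 1)) extend(c(labels, g))
  }
  extend(1L)
  out
}

lev <- c("A", "B", "C", "D")            # hypothetical factor levels
parts <- set_partitions(length(lev))
length(parts)                           # number of candidate groupings

# Display a grouping as merged level labels, e.g. "A+B | C | D".
show_part <- function(labels) {
  paste(tapply(lev, labels, paste, collapse = "+"), collapse = " | ")
}
vapply(parts, show_part, character(1))
```

The number of groupings grows very quickly with the number of levels, which is why exhaustive enumeration is only practical for factors with few levels.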

Next, the severity model is considered, and the baseline model is fitted as:
``` {r eval = TRUE}
sev <- glm(perd ~ Zone + Make + Bonus + Kilometres,
           weights = Claims,
           data = sweden, family = Gamma(link = "log"))
summary(sev)
anova(sev, test = "Chisq")
```

The baseline model summary suggests that the factor Kilometres is not significant, although some of its levels are significant or marginally significant; levels 2 and 6 of the factor Make are also not statistically significant.
We start by collapsing factors individually, via:
``` {r eval = TRUE}
sev.kilo <- FCBMA(model = sev, varia.list = c("Kilometres"),
                  method = "complete", verbose = FALSE)
summary(sev.kilo)

sev.make <- FCBMA(model = sev, varia.list = c("Make"),
                  method = "GA", verbose = FALSE)
summary(sev.make)
```

If all factors are collapsed simultaneously, there are 845,768,090,076 potential models in the model space to be searched, hence a stochastic search needs to be used:
``` {r eval = TRUE}
sev.all <- FCBMA(sev, varia.list = c("Kilometres", "Zone", "Bonus", "Make"),
                 method = "SA", verbose = FALSE)
summary(sev.all)
```
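
The size of this model space can be checked with a short calculation. The sketch below assumes that each factor contributes one candidate per set partition of its levels (a Bell number), and that the factors have 5, 7, 7 and 9 levels respectively. These level counts are an assumption about the `sweden` data rather than something read from it here, and `bell()` is a helper written for this example.

```r
# Bell number B(n): the number of ways to partition n levels into groups,
# computed row by row with the Bell triangle.
bell <- function(n) {
  row <- 1
  for (i in seq_len(n - 1)) {
    nxt <- numeric(i + 1)
    nxt[1] <- row[i]                      # new row starts with last entry of old row
    for (j in seq_len(i)) nxt[j + 1] <- nxt[j] + row[j]
    row <- nxt
  }
  row[length(row)]
}

# Assumed numbers of levels per factor.
n_levels <- c(Kilometres = 5, Zone = 7, Bonus = 7, Make = 9)

per_factor <- vapply(n_levels, bell, numeric(1))
per_factor

# Joint model space: the product of the per-factor counts.
format(prod(per_factor), big.mark = ",", scientific = FALSE)
```

Under these assumptions the product reproduces the 845,768,090,076 figure quoted above, while a single factor such as Kilometres has only 52 candidates, which is why a complete search remains feasible when factors are collapsed one at a time.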

Once all factors have been collapsed, new models can be fitted based on any of the best collapsing combinations, and predictions can be made from the new models via, for example:
``` {r eval = FALSE}
newmod <- fc.model.refit(varia.list = c("Kilometres", "Zone", "Bonus", "Make"),
                         merge.list = sev.all$best.state, mod = sev)
pred <- fc.predict(data = sweden, model = sev,
                   varia.list = c("Kilometres", "Zone", "Bonus", "Make"),
                   merge.list = sev.all$best.state)
```
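
Conceptually, refitting with a collapsed factor amounts to merging some of its levels and fitting the same model again. The sketch below shows this idea in base R on simulated data, without using FCBMA. The data, the level names and the choice of which levels to merge are all invented for illustration, and it is not claimed that `fc.model.refit` is implemented this way.

```r
set.seed(1)

# Simulated counts from a 4-level factor in which levels "b" and "c"
# share the same true rate.
n <- 400
f <- factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
rate <- c(a = 1, b = 2, c = 2, d = 4)[as.character(f)]
y <- rpois(n, lambda = rate)

# Baseline model with all four levels.
full <- glm(y ~ f, family = poisson)

# Collapse "b" and "c" into one level "b+c" and refit.
f_merged <- f
levels(f_merged) <- list(a = "a", "b+c" = c("b", "c"), d = "d")
collapsed <- glm(y ~ f_merged, family = poisson)

# The collapsed model has one fewer coefficient; compare the fits by BIC.
c(full = length(coef(full)), collapsed = length(coef(collapsed)))
c(full = BIC(full), collapsed = BIC(collapsed))
```

Predictions from the collapsed model are then obtained with `predict()` in the usual way, provided the new data uses the merged factor.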

Reference



senhu/FCBMA documentation built on Aug. 6, 2019, 4:56 p.m.