The Gini index
Compute Gini indices and their standard errors.
a character that contains the name of the response variable.
a character vector that contains the list of the scores, which are the predictions from the set of models to be compared.
a character that contains the name of a baseline statistic. If
a data frame containing the variables listed in the above arguments.
For model comparison involving the compound Poisson distribution, the usual mean squared loss function is not quite informative for capturing the differences between predictions and observations, due to the high proportions of zeros and the skewed heavy-tailed distribution of the positive losses. For this reason, Frees et al. (2011) develop an ordered version of the Lorenz curve and the associated Gini index as a statistical measure of the association between distributions, through which different predictive models can be compared. The idea is that a score (model) with a greater Gini index produces a greater separation among the observations. In the insurance context, a higher Gini index indicates greater ability to distinguish good risks from bad risks. Therefore, the model with the highest Gini index is preferred.
This function computes the Gini indices and their asymptotic standard errors based on the ordered Lorenz curve. These metrics are mainly used for model comparison. Depending on the problem, there are generally two ways to do this. Take insurance predictive modeling as an example. First, when there is a baseline premium, we can compute the Gini index for each score (predictions from the model), and select the model with the highest Gini index. Second, when there is no baseline premium (
base = NULL), we successively specify the prediction from each model as the baseline premium and use the remaining models as the scores. This results in a matrix of Gini indices, and we select the model that is least vulnerable to alternative models using a "mini-max" argument - we select the score that provides the smallest of the maximal Gini indices, taken over competing scores.
gini returns an object of class
gini-class for details of the return values as well as various methods available for this class.
Yanwei (Wayne) Zhang firstname.lastname@example.org
Frees, E. W., Meyers, G. and Cummings, D. A. (2011). Summarizing Insurance Scores Using a Gini Index. Journal of the American Statistical Association, 495, 1085 - 1098.
The users are recommended to see the documentation for
gini-class for related information.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## Not run: # Let's fit a series of models and compare them using the Gini index da <- subset(AutoClaim, IN_YY == 1) da <- transform(da, CLM_AMT = CLM_AMT / 1000) P1 <- cpglm(CLM_AMT ~ 1, data = da, offset = log(NPOLICY)) P2 <- cpglm(CLM_AMT ~ factor(CAR_USE) + factor(REVOLKED) + factor(GENDER) + factor(AREA) + factor(MARRIED) + factor(CAR_TYPE), data = da, offset = log(NPOLICY)) P3 <- cpglm(CLM_AMT ~ factor(CAR_USE) + factor(REVOLKED) + factor(GENDER) + factor(AREA) + factor(MARRIED) + factor(CAR_TYPE) + TRAVTIME + MVR_PTS + INCOME, data = da, offset = log(NPOLICY)) da <- transform(da, P1 = fitted(P1), P2 = fitted(P2), P3 = fitted(P3)) # compute the Gini indices gg <- gini(loss = "CLM_AMT", score = paste("P", 1:3, sep = ""), data = da) gg # plot the Lorenz curves theme_set(theme_bw()) plot(gg) plot(gg, overlay = FALSE) ## End(Not run)
Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.