AutoClaim: AutoClaim dataset


Description

The motor insurance dataset was originally retrieved from the cplm package. It contains insurance claim data as well as information on the policyholder. Only a subset of the original variables is kept, and some of them are transformed (see the variable descriptions below). Missing values are imputed with the rrcovNA::impSeq function.
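As a rough illustration of the imputation step mentioned above, the following sketch runs rrcovNA::impSeq on a small simulated matrix. The toy matrix, its column names, and the positions of the missing values are assumptions made for the example; this is not the code that was used to build AutoClaim. The call is guarded so that the snippet still runs when rrcovNA is not installed.

```r
# Toy numeric matrix with a few missing entries (not the AutoClaim data).
set.seed(1)
x <- matrix(rnorm(150), nrow = 50, ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))
x[c(3, 17), "b"] <- NA
x[c(8, 40), "c"] <- NA

# Sequential imputation; skipped (NULL) if rrcovNA is unavailable.
completed <- if (requireNamespace("rrcovNA", quietly = TRUE)) {
  rrcovNA::impSeq(x)
} else {
  NULL
}

# When the imputation ran, no missing values should remain.
if (!is.null(completed)) print(anyNA(completed))
```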

Usage

data("AutoClaim")

Format

A data frame with 10296 observations on the following 35 variables.

CLM_AMT5

Aggregate claim loss of policy (in thousands)

KIDSDRIV

Number of child passengers

TRAVTIME

Commute time

CAR_USE

(1) Private or (2) commercial use

BLUEBOOK

(log) car value

NPOLICY

Number of policies

RED_CAR

Whether the color of the car is (2) red or (1) not

REVOLKED

Whether the policyholder's license was (2) revoked in the past or (1) not

MVR_PTS

Number of motor vehicle record points

HOMEKIDS

Number of children at home

GENDER

Gender of policyholder: (1) female or (2) male

MARRIED

Whether the policyholder is (2) married or (1) not

PARENT1

Whether (2) the policyholder grew up in a single-parent family or (1) not

AREA

(1) Rural or (2) urban area

CAR_TYPE_2-6

(0-1 dummy variables) Type of car: (base) Panel Truck, (2) Pickup, (3) Sedan, (4) Sports Car, (5) SUV, (6) Van

JOBCLASS_2-9

(0-1 dummy variables) Job class of policyholder: (base) Unknown, (2) Blue Collar, (3) Clerical, (4) Doctor, (5) Home Maker, (6) Lawyer, (7) Manager, (8) Professional, (9) Student

MAX_EDUC_2-5

(0-1 dummy variables) Highest level of education of policyholder: (base) less than High School, (2) Bachelors, (3) High School, (4) Masters, (5) PhD

AGE_CAT_2-5

(0-1 dummy variables) Age category of policyholder: (base) <30, (2) [30,40), (3) [40,50), (4) [50,60), (5) 60+
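The 0-1 dummy encoding with a baseline level, as used for CAR_TYPE, JOBCLASS, MAX_EDUC and AGE_CAT, can be sketched in base R with model.matrix. The toy factor below and the column names CAR_TYPE_2 to CAR_TYPE_6 are assumptions inferred from the description above; the dataset itself already contains the dummies, so this only illustrates what such an encoding looks like.

```r
# Toy factor; the first level ("Panel Truck") acts as the baseline.
car_type <- factor(
  c("Sedan", "SUV", "Panel Truck", "Van", "Pickup", "Sports Car"),
  levels = c("Panel Truck", "Pickup", "Sedan", "Sports Car", "SUV", "Van")
)

# Treatment contrasts give one 0-1 column per non-baseline level;
# the intercept column is dropped.
dummies <- model.matrix(~ car_type)[, -1, drop = FALSE]
colnames(dummies) <- paste0("CAR_TYPE_", 2:6)  # assumed naming scheme

dummies
```

A row for the baseline level has zeros in every column, and every other row has exactly one 1.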

Author(s)

Simon Fontaine, Yi Yang, Bo Fan, Wei Qian and Yuwen Gu.

Maintainer: Simon Fontaine <fontaines@dms.umontreal.ca>

Source

cplm package.

References

Fontaine, S., Yang, Y., Fan, B., Qian, W. and Gu, Y. (2018). "A Unified Approach to Sparse Tweedie Model with Big Data Applications to Multi-Source Insurance Claim Data Analysis," to be submitted.

Zhang, Y. (2013). "cplm: Compound Poisson Linear Models." A vignette for R package cplm. Available from http://cran.r-project.org/web/packages/cplm.

Todorov, V. (2016). "rrcovNA: Scalable Robust Estimators with High Breakdown Point for Incomplete Data." A vignette for R package rrcovNA. Available from https://cran.r-project.org/web/packages/rrcovNA.

Examples

# load the package
library(MSTweedie)

# load the data
data(AutoClaim)

# display the first rows of the dataset
head(AutoClaim)

# cross-tabulate policies by REVOLKED and by whether a claim was made
table(AutoClaim$REVOLKED, AutoClaim$CLM_AMT5 > 0)
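The cross-tabulation above can be extended to claim shares and average positive claim amounts per group. The sketch below uses a small made-up data frame with the same two column names so that it runs without the package; the values are purely illustrative and are not taken from AutoClaim. Replacing toy with AutoClaim should apply the same steps to the real data.

```r
# Made-up values for illustration only (not from AutoClaim).
toy <- data.frame(
  REVOLKED = c(1, 1, 1, 2, 2, 1, 2, 1),
  CLM_AMT5 = c(0, 0, 3.2, 5.1, 0, 0, 1.4, 0)
)

# Indicator of a positive claim.
has_claim <- toy$CLM_AMT5 > 0

# Share of policies with and without a claim, within each REVOLKED group.
claim_share <- prop.table(table(toy$REVOLKED, has_claim), margin = 1)

# Mean claim amount among policies with a positive claim, per group.
mean_positive <- tapply(toy$CLM_AMT5[has_claim], toy$REVOLKED[has_claim], mean)

claim_share
mean_positive
```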

fontaine618/MSTweedie documentation built on May 25, 2019, 5:22 p.m.