auto: A motor insurance dataset


Description

The motor insurance dataset was originally retrieved from the SAS Enterprise Miner database. The included dataset was generated by reorganizing and transforming the original data, as described in Qian et al. (2013).

Usage

data(auto)

Details

This data set contains 2812 policy records with 56 predictors. See Qian et al. (2013) for a detailed description of how these predictors were generated. The response is the aggregate claim loss (in thousands of dollars). The predictors are expanded from the following original variables:

CAR_TYPE:

car type, 6 categories

JOBCLASS:

job class, 8 categories

MAX_EDUC:

education level, 5 categories

KIDSDRIV:

number of child passengers

TRAVTIME:

time to travel from home to work

BLUEBOOK:

car value

NPOLICY:

number of policies

MVR_PTS:

motor vehicle record points

AGE:

driver age

HOMEKIDS:

number of children at home

YOJ:

years on job

INCOME:

income

HOME_VAL:

home value

SAMEHOME:

years at current address

CAR_USE:

whether the car is for commercial use

RED_CAR:

whether the car color is red

REVOLKED:

whether the driver's license was revoked in the past

GENDER:

gender

MARRIED:

whether married

PARENT1:

whether a single parent

AREA:

whether the driver lives in an urban area
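
A quick way to see how the original variables above were expanded is to look at the columns of the predictor matrix. This is a minimal sketch; it assumes the HDtweedie package is installed and that auto$x carries column names (if it does not, colnames() simply returns NULL). The names predictor_names and n_predictors are introduced here for illustration only.

```r
library(HDtweedie)
data(auto)

# column names of the expanded predictor matrix (NULL if the matrix is unnamed)
predictor_names <- colnames(auto$x)
head(predictor_names, 10)

# number of expanded predictors; this page states 56
n_predictors <- ncol(auto$x)
n_predictors
```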

Value

A list with the following elements:

x

a [2812 x 56] matrix giving 2812 policy records with 56 predictors

y

the aggregate claim loss
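
The structure described above can be checked directly after loading the data. The sketch below assumes the HDtweedie package is installed. The share of zero losses is computed only as an illustration, and no particular value is implied; x_dims, n_y and zero_share are names chosen for this example.

```r
library(HDtweedie)
data(auto)

# top-level structure of the list: elements x and y
str(auto, max.level = 1)

# dimensions of the predictor matrix (rows = policy records, columns = predictors)
x_dims <- dim(auto$x)
x_dims

# number of response values; expected to match the number of rows of x
n_y <- length(auto$y)
n_y

# summary of the aggregate claim loss and the share of policies with zero loss
summary(auto$y)
zero_share <- mean(auto$y == 0)
zero_share
```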

References

Yip, K. C. H. and Yau, K. K. W. (2005), “On Modeling Claim Frequency Data In General Insurance With Extra Zeros”, Insurance: Mathematics and Economics 36, 153-163.

Zhang, Y. (2013), “cplm: Compound Poisson Linear Models”, a vignette for the R package cplm. Available from http://cran.r-project.org/web/packages/cplm

Qian, W., Yang, Y., Yang, Y. and Zou, H. (2013), “Tweedie's Compound Poisson Model With Grouped Elastic Net”, submitted to Journal of Computational and Graphical Statistics.

Examples

# load HDtweedie library
library(HDtweedie)

# load data set
data(auto)

# how many samples and how many predictors?
dim(auto$x)

# response y
auto$y

emeryyi/hdtweedie documentation built on May 16, 2019, 5:06 a.m.