library(lossdb)
library(knitr)
knitr::opts_chunk$set(
  comment = "#>",
  tidy = FALSE)

Introduction to lossdb

Warning: This vignette and package are still in the early stages of development...

Background: Actuaries and Insurance Loss Data

Actuaries often work with data sets describing the dollar value and nature of insurance losses (loss data). It is common for actuaries to analyze loss data on a "by id" basis (i.e. each row/observation represents a claim/occurrence/member at a certain development stage) or on a "by origin" basis (i.e. each row/observation represents a policy/accident period at a certain development stage). If data is not provided by id or by origin it is transformed into one of the two formats for analysis.

Motivation

I want to make reproducible actuarial reports in R. Getting raw data ready for analysis is a pain, and once the data is ready it is nice to have it in some kind of a standard format so you can do similar things to it whenever you get it into that format... enter lossdb

The goal of the lossdb package is to provide standard functions for manipulating, visualizing, and modeling loss data. lossdb provides a standard means of storing your loss data on a by id or by origin basis so you can use the same set of functions (defined by lossdb) to automate many repetitive tasks and ultimately create reproducible reports that are generated directly from the original loss data.

Philosophy

Columns in loss data can be losely grouped into a few different categories. By organizing each column into one of these categories (defined below) certain actuarial analysis tasks can be automated across many loss data sets.

The Structure of Loss Data

lossdb groups all the loss data columns applicable to the analysis into one of 2 main categories. These two overarching categories are further subdivided into several more categories. A description of the lossdb stucture is detailed below:

More detail on the lossdb structure is described below:

meta

The 3 meta columns are of particular importance. They are defined as follows:

dollar

All columns representing dollar amounts are "dollar" columns. lossdb assumes that all the information contained in dollar columns can be sub categorized into 4 different groups without losing any information. Each of these 4 groups can contain as many columns as neccessary. All dollar groups are optional, but you need to provide at least one column for one of the groups. The columns must be numeric and you can have as many columns in each group as desired. The 4 groups:

desc

Other columns included in loss data provide some type of description of the claim (e.g. claimant name, whether the claim is open or closed, etc.). I refer to all of these columns as "desc" (short for description) columns. The lossdb package can handle any number of description columns, they are optional, and they can be of any type.

Getting Started - An Example

The example will proceed as follows:

  1. Take a data set (occurrences) and transform that data set into a loss_df object that contains and organizes all the information relevant to the analysis.
  2. Review the loss_df for errors and potential problem areas.
  3. Perform statistical reserving techniques on the loss_df using the ChainLadder package.

View the structure of the occurrences data frame using the str function:

str(occurrences)

Now we can create the loss_df object.

# create loss_df object
mydf <- loss_df(occurrences, 
          id = "claim_number",
          origin = "origin",
          dev = "dev", 
          paid = c("paid_loss_only", "paid_expense"),
          incurred = c("incurred_loss_only", "incurred_expense"),
          paid_recovery = c("paid_excess250", "sal_sub"),
          incurred_recovery = c("incurred_excess250", 
                                "sal_sub_incurred"),
          desc = "claim_cts"
        )
kable(head(mydf[, 1:6]))

Each detail (dollar or desc) column has an attribute specifying the type of loss detail that the column contains. This attribute is named the "detail" attribute. The detail attribute of each column is defined by the argument the column is supplied to in the loss_df() function (i.e. paid_loss and paid_expense have a detail attribute of "paid"). All detail columns maintain the column names that they are supplied with. The names for meta columns are changed to the meta category to which they were supplied.

Review the data

Now we can use the lossdb package to review the data. Let's start by seeing a summary of the most recent calendar period (calendar = origin + dev) summarized by origin period.

kable(summary(mydf)[, 1:9])

We can look at the data at an older calendar period by specifying the calendar argument in the summary() function.

kable(summary(mydf, calendar = "2012")[, 1:9])

Note: the calendar period is the origin period plus the dev. (e.g. The calendar for all claims in origin year 2010 at their first calendar period would be 2011.)

and the built in bar chart representation of the data...

plot(mydf)

and plotted at an alternative calendar

plot(mydf, calendar = "2012")

We can return a data frame of all the claims that have experienced a change from one calendar to another by using the claim_changes() function:

# specify the loss amount values you want to see the changed claims for 
mychanges <- claim_changes(mydf, 
               calendar1 = "2013", 
               calendar2 = "2012",
               values = c("paid_loss_only", "claim_cts")
             )
kable(head(mychanges))

mychanges is a data frame consisting of all the claims in which there was a change in the paid_loss or claim_cts column from calendar period 2012 to 2013. You can now browse through the changed claims to spot obvious problems with the new data. For example we may want to check that there are no missing claims (i.e. no claims that were in the data at the last calendar that are no longer in the data)

# check for missing claims
kable(mychanges[mychanges$claim_cts_change < 0, ])

This check revealed that there are no missing claims in our loss_db from calendar 2012 to 2013.

We may also want to check if the paid_loss category decreased for any claims.

# check for claims in which paid_loss decreased
kable(mychanges[mychanges$paid_loss_only_change < 0, ])

There are a few claims with a decrease in "paid_loss_only". Claims should not decrease in gross paid loss as they develop, but it happens in real world loss data. Fortunately none of the paid amounts decreased so significantly that we need to stop our analysis and investigate. Next we can project some ultimate losses.

Create Projections

Before a projection is made we must specifiy the loss amounts we wish to project (e.g. paid loss & ALAE gross of all recoveries, paid loss & ALAE net of all recoveries, medical only paid loss & ALAE gross of all recoveries, etc.). Use the paid(), incurred(), paid_recovery(), and incurred_recovery() functions to get the total from each respective "dollar" category.

# project total paid losses gross of any recovery
value2project <- data.frame(origin = mydf$origin, 
                   dev = mydf$dev, 
                   paid_total = paid(mydf)
                 )
kable(head(value2project))

Now the ChainLadder package can be used to make projections.

library(ChainLadder)
paid_tri <- as.triangle(value2project, 
              origin = "origin", 
              dev = "dev", 
              value = "paid_total"
            )
MackChainLadder(paid_tri)
BootChainLadder(paid_tri)


merlinoa/lossdb documentation built on May 22, 2019, 6:52 p.m.