library(lossdb)
library(knitr)
knitr::opts_chunk$set(comment = "#>", tidy = FALSE)
Warning: This vignette and package are still in the early stages of development...
Actuaries often work with data sets describing the dollar value and nature of insurance losses (loss data). It is common for actuaries to analyze loss data on a "by id" basis (i.e. each row/observation represents a claim/occurrence/member at a certain development stage) or on a "by origin" basis (i.e. each row/observation represents a policy/accident period at a certain development stage). If data is not provided by id or by origin it is transformed into one of the two formats for analysis.
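To make the two layouts concrete, here is a minimal base R sketch that collapses a small "by id" data set to a "by origin" basis. The data frame, its column names, and its values are hypothetical and are not part of lossdb.

```r
# Hypothetical "by id" loss data: one row per claim per development stage.
by_id <- data.frame(
  claim_number = c("A1", "A2", "A1", "A2", "B1"),
  origin       = c(2010, 2010, 2010, 2010, 2011),
  dev          = c(1, 1, 2, 2, 1),
  paid         = c(100, 50, 150, 80, 40),
  stringsAsFactors = FALSE
)

# Collapse to a "by origin" basis: one row per origin period per development stage.
by_origin <- aggregate(paid ~ origin + dev, data = by_id, FUN = sum)
by_origin <- by_origin[order(by_origin$origin, by_origin$dev), ]
print(by_origin)
```

Going the other way (from by origin to by id) is not possible, because the claim-level detail is lost in the aggregation.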
I want to make reproducible actuarial reports in R. Getting raw data ready for analysis is a pain, and once the data is ready it is nice to have it in some kind of a standard format so you can do similar things to it whenever you get it into that format... enter lossdb
The goal of the lossdb package is to provide standard functions for manipulating, visualizing, and modeling loss data. lossdb provides a standard means of storing your loss data on a by id or by origin basis so you can use the same set of functions (defined by lossdb) to automate many repetitive tasks and ultimately create reproducible reports that are generated directly from the original loss data.
Columns in loss data can be loosely grouped into a few different categories. By organizing each column into one of these categories (defined below), certain actuarial analysis tasks can be automated across many loss data sets.
lossdb groups all the loss data columns applicable to the analysis into one of 2 main categories. These two overarching categories are further subdivided into several more categories. A description of the lossdb structure is detailed below:
More detail on the lossdb structure is described below:
The 3 meta columns are of particular importance. They are defined as follows:
All columns representing dollar amounts are "dollar" columns. lossdb assumes that all the information contained in dollar columns can be subcategorized into 4 groups without losing any information. All dollar groups are optional, but you need to provide at least one column for one of the groups. The columns must be numeric, and each group can contain as many columns as necessary. The 4 groups, which correspond to the loss_df() arguments used in the example below, are paid, incurred, paid_recovery, and incurred_recovery.
Other columns included in loss data provide some type of description of the claim (e.g. claimant name, whether the claim is open or closed, etc.). I refer to all of these columns as "desc" (short for description) columns. Description columns are optional; the lossdb package can handle any number of them, and they can be of any type.
The example will proceed as follows:
1. Start with an example data set (occurrences) and transform that data set into a loss_df object that contains and organizes all the information relevant to the analysis.
2. Review the loss_df for errors and potential problem areas.
3. Project ultimate losses from the loss_df using the ChainLadder package.

View the structure of the occurrences data frame using the str function:
str(occurrences)
Now we can create the loss_df object.
# create loss_df object
mydf <- loss_df(occurrences,
  id = "claim_number",
  origin = "origin",
  dev = "dev",
  paid = c("paid_loss_only", "paid_expense"),
  incurred = c("incurred_loss_only", "incurred_expense"),
  paid_recovery = c("paid_excess250", "sal_sub"),
  incurred_recovery = c("incurred_excess250", "sal_sub_incurred"),
  desc = "claim_cts"
)

kable(head(mydf[, 1:6]))
Each detail (dollar or desc) column has an attribute specifying the type of loss detail that the column contains. This attribute is named the "detail" attribute. The detail attribute of each column is defined by the argument the column is supplied to in the loss_df() function (e.g. paid_loss_only and paid_expense have a detail attribute of "paid"). All detail columns keep the column names they are supplied with. Meta columns, by contrast, are renamed to the meta category to which they were supplied (e.g. claim_number becomes id).
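The idea of tagging columns with an attribute can be illustrated in base R. The sketch below is only an illustration of the mechanism using hypothetical data; it is not lossdb's internal implementation, which may store or read the attribute differently.

```r
# Hypothetical columns tagged with a "detail" attribute (not lossdb's internals).
claims <- data.frame(
  paid_loss_only = c(100, 250),
  paid_expense   = c(10, 25),
  claim_cts      = c(1, 1)
)
attr(claims$paid_loss_only, "detail") <- "paid"
attr(claims$paid_expense, "detail")   <- "paid"
attr(claims$claim_cts, "detail")      <- "desc"

# Look up the detail attribute of every column.
details <- vapply(claims, function(col) attr(col, "detail"), character(1))

# Names of the columns tagged as "paid".
paid_cols <- names(details)[details == "paid"]
print(paid_cols)
```

Tagging by attribute lets functions find every "paid" column without relying on a naming convention. Note that base R drops plain attributes like this when a vector is subset, so a package relying on them needs its own subsetting logic.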
Now we can use the lossdb package to review the data. Let's start with a summary of the most recent calendar period (calendar = origin + dev), grouped by origin period.
kable(summary(mydf)[, 1:9])
We can look at the data at an older calendar period by specifying the calendar argument in the summary() function.
kable(summary(mydf, calendar = "2012")[, 1:9])
Note: the calendar period is the origin period plus the dev (e.g. the calendar for all claims in origin year 2010 at their first dev would be 2011).
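The calendar arithmetic can be sketched in base R as follows. The data frame and its values are hypothetical, and this is not how summary() is implemented in lossdb; it only shows what selecting a calendar period means.

```r
# Hypothetical by-origin evaluations.
evals <- data.frame(
  origin = c(2010, 2010, 2010, 2011, 2011, 2012),
  dev    = c(1, 2, 3, 1, 2, 1),
  paid   = c(100, 180, 210, 120, 200, 90)
)

# calendar period = origin period + development stage
evals$calendar <- evals$origin + evals$dev

# Keep the most recent calendar period: one row per origin.
latest <- evals[evals$calendar == max(evals$calendar), ]

# The same view one calendar period earlier.
prior <- evals[evals$calendar == max(evals$calendar) - 1, ]

print(latest)
print(prior)
```

The most recent origin period drops out of the earlier view because it had not yet reached its first development stage at that calendar period.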
We can also view the built-in bar chart representation of the data:
plot(mydf)
The same chart can be plotted at an alternative calendar period:
plot(mydf, calendar = "2012")
We can return a data frame of all the claims that have experienced a change from one calendar period to another by using the claim_changes() function:
# specify the loss amount values you want to see the changed claims for
mychanges <- claim_changes(mydf,
  calendar1 = "2013",
  calendar2 = "2012",
  values = c("paid_loss_only", "claim_cts")
)

kable(head(mychanges))
mychanges is a data frame consisting of all the claims in which there was a change in the paid_loss_only or claim_cts column from calendar period 2012 to 2013. You can now browse through the changed claims to spot obvious problems with the new data. For example, we may want to check that there are no missing claims (i.e. no claims that were in the data at the last calendar period that are no longer in the data):
# check for missing claims
kable(mychanges[mychanges$claim_cts_change < 0, ])
This check revealed that there are no missing claims in our loss_df from calendar 2012 to 2013.
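For readers who want to see the logic behind this kind of comparison, here is a minimal base R sketch using two hypothetical claim-level snapshots. It is an assumption-laden illustration, not the claim_changes() implementation: it detects missing claims through an outer join rather than through a change in claim counts.

```r
# Hypothetical claim-level snapshots at two calendar periods.
prior <- data.frame(
  claim_number   = c("A1", "A2", "A3"),
  paid_loss_only = c(100, 200, 50),
  stringsAsFactors = FALSE
)
current <- data.frame(
  claim_number   = c("A1", "A2", "A4"),
  paid_loss_only = c(100, 180, 75),
  stringsAsFactors = FALSE
)

# Full outer join so claims present in only one snapshot are kept.
both <- merge(current, prior, by = "claim_number", all = TRUE,
              suffixes = c("_current", "_prior"))

# Claims in the prior snapshot that are missing from the current one.
missing_claims <- both$claim_number[is.na(both$paid_loss_only_current)]

# Change in paid loss for claims in both snapshots; negative means a decrease.
both$paid_loss_only_change <-
  both$paid_loss_only_current - both$paid_loss_only_prior
decreased <- both$claim_number[which(both$paid_loss_only_change < 0)]

print(missing_claims)
print(decreased)
```

In this toy data, one claim disappears between snapshots and one claim shows a decrease in paid loss, which are the two situations the checks in this section look for.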
We may also want to check whether the paid_loss_only column decreased for any claims.
# check for claims in which paid_loss_only decreased
kable(mychanges[mychanges$paid_loss_only_change < 0, ])
There are a few claims with a decrease in "paid_loss_only". Claims should not decrease in gross paid loss as they develop, but it happens in real-world loss data. Fortunately, none of the paid amounts decreased so significantly that we need to stop our analysis and investigate. Next we can project some ultimate losses.
Before a projection is made, we must specify the loss amounts we wish to project (e.g. paid loss & ALAE gross of all recoveries, paid loss & ALAE net of all recoveries, medical only paid loss & ALAE gross of all recoveries, etc.). Use the paid(), incurred(), paid_recovery(), and incurred_recovery() functions to get the total from each respective "dollar" category.
# project total paid losses gross of any recovery
value2project <- data.frame(
  origin = mydf$origin,
  dev = mydf$dev,
  paid_total = paid(mydf)
)

kable(head(value2project))
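To illustrate what a category total and a net-of-recovery amount look like, here is a base R sketch on hypothetical data. It assumes that a category total is the row-wise sum of that category's columns and that recoveries are stored as positive amounts; neither assumption is confirmed by the lossdb documentation shown here.

```r
# Hypothetical dollar columns, grouped as in the loss_df() call earlier in this vignette.
losses <- data.frame(
  paid_loss_only = c(1000, 500),
  paid_expense   = c(100, 50),
  paid_excess250 = c(200, 0),
  sal_sub        = c(50, 25)
)
paid_cols          <- c("paid_loss_only", "paid_expense")
paid_recovery_cols <- c("paid_excess250", "sal_sub")

# Total of each dollar group, row by row.
paid_gross    <- rowSums(losses[, paid_cols])
paid_recovery <- rowSums(losses[, paid_recovery_cols])

# Paid net of all recoveries (assumes recoveries are positive amounts).
paid_net <- paid_gross - paid_recovery
print(paid_net)
```

Any of these vectors (gross, recovery, or net) could serve as the value column of the data frame that is handed to the projection step.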
Now the ChainLadder package can be used to make projections.
library(ChainLadder)

paid_tri <- as.triangle(value2project,
  origin = "origin",
  dev = "dev",
  value = "paid_total"
)
MackChainLadder(paid_tri)
BootChainLadder(paid_tri)
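For intuition about what a chain ladder projection does with a triangle, the sketch below works through the basic volume-weighted calculation in base R on a small hypothetical cumulative triangle. It produces point estimates only and is not a substitute for MackChainLadder() or BootChainLadder(), which also estimate variability.

```r
# Hypothetical cumulative paid triangle: rows are origin periods, columns are dev stages.
tri <- matrix(
  c(100, 150, 165,
    110, 170,  NA,
    120,  NA,  NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(origin = c("2010", "2011", "2012"), dev = c("1", "2", "3"))
)

n_dev <- ncol(tri)

# Volume-weighted age-to-age factors: sum of the next column over the sum of the
# current column, using only origins observed at both stages.
link_ratios <- vapply(seq_len(n_dev - 1), function(j) {
  observed <- !is.na(tri[, j + 1])
  sum(tri[observed, j + 1]) / sum(tri[observed, j])
}, numeric(1))

# Fill the unobserved part of the triangle by rolling each origin forward.
full <- tri
for (j in seq_len(n_dev - 1)) {
  to_fill <- is.na(full[, j + 1])
  full[to_fill, j + 1] <- full[to_fill, j] * link_ratios[j]
}

# Projected ultimate for each origin is the last development column.
ultimate <- full[, n_dev]
print(round(link_ratios, 4))
print(round(ultimate, 2))
```

This sketch assumes the last development stage is fully developed, so no tail factor is applied.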