raw - R Actuarial Workshops

This is a packge which stores data used in R workshops sponsored by the Casualty Actuarial Society[^CAS_CYA]. This short vignette will describe the various data sets and give examples of their use. As well, there is a short note about each of the packages which we suggest be installed.

[^CAS_CYA]:The CAS has neither produced nor endorsed the content of this package.



Once the raw package has been installed, any of the data sets may be accessed. To get a list of the data sets available in any package in R, one may call the data function with the name of the package as an argument.

data(package = "raw")

To get help about any data set, one may use R's help facility to display the documentation for that data set.



There are four sets in all. Note that the numbering begins at 2. So far as I am aware, the data for the first COTOR challenge is not available from the CAS website. Each data set has a set of randomly generated non-life insurance claims.


This is data taken from Appendix A of the Basic Ratemaking study note by Werner and Modlin. This is a suite of data - six sets in all - pertaining to personal auto. More information may be found in the note itself.



This contains basic data


hist(Hurricane$Wind, xlab = "Wind Speed (knots)", main = "")

Region and State Experience

This are simulated data sets.


This data represents ten complete years of Schedule P development from many NAIC reporting companies. The data was prepared by Glenn Meyers and Peng Shi and is available from the CAS.

Suggested Packages

When installed using the command below, raw will also ensure that a useful suite of packages is installed.

install.packages("raw", dependencies = "Suggests")

Actuarial packages


This contains a varied set of useful actuarial tools, from loss distributions to credibility to compound loss models. This also contains the data from the "Loss Models" textbook by Klugman, Panjer and Wilmot. Read more here: https://cran.r-project.org/package=actuar.


ChainLadder supports the standard suite of loss reserving methods. A quick overview may be found here: https://github.com/mages/ChainLadder.


The mondate package enables one to calculate time differences in terms of months. Trust me, you need this.

endOfQuarter <- mondate("2010-03-31")
mondate::add(endOfQuarter, 3, "months")


Contains functions for present value, annuities, internal rate of return and more. Website here: http://felixfan.github.io/FinCal/


At the time of writing, this package has been removed from the list of suggested packages that ships with raw. This is due to a potential installation issue that could have affected its acceptance when submitting to CRAN. Despite that, I can strongly suggest that actuaries install this package on their own. To do so, simply type the command below.


The rstan package is the R implementation of the Stan MCMC project. This is an incredibly powerful and easy to use Bayesian framework. No, really, it is. If you're into Bayesian computation, this is essential. If you're not a Bayesian, this package will make you one. There are piles of material on the Stan website https://mc-stan.org/users/interfaces/rstan. The tutorials are easy to follow and worth your time.


This package supports hierarchical linear modelling. What's hierarchical linear modelling? It's basically a credibility method that's very popular in the social sciences and other non-actuarial fields. It should be popular everywhere.

The tidyverse


Hadley Wickham is an incredibly prolific and popular programmer who has made some big contributions to the R landscape.


I use the dplyr package on a daily basis. It has a simple, easy to understand grammar for data manipulation and summarization. Read the introduction here: https://github.com/tidyverse/dplyr.


The readr package has a few nice improvements to the basic functions for reading in flat data files. Among other things, it makes it easier to specify column data types and will give useful warnings when it encounters problematic data cells. Read more here: https://github.com/tidyverse/readr


There are a number of packages to read and write Excel files (xlsx and XLConnect are two). However, most use a java virtual machine to access the files and can run into memory issues. readxl doesn't.


tidyr is a light but useful set of functions to manipulate data. You'll most often see the spread and gather functions used to transform a data set between "long" and "wide" formats. There's a short intro here: https://blog.rstudio.com/2014/07/22/introducing-tidyr/.


This is a very powerful, flexible graphing engine. Quite a few books have been written about it, which is testament to both its complexity and utility.


Lubridate has a wealth of functions for manipulating dates and time intervals.

myDate <- mdy("2/16/1972")
year(myDate) <- 2016


Scales has some functions to render numbers in pretty formats.



This package is likely not all that useful to beginners, but it does have one very useful function: install_github. This will allow you to intall the development version of packages from the GitHub codesharing site. For example, the command below will ensure that you're always up to date with this package.




The knitr package, written by Yihue Xie, enables a user to combine R code and documentation in a single file. That file may then be converted into a format like PDF, HTML or Word.


This package enhances knitr to make the document conversion described above a bit easier. This is an incredibly powerful framework. Read more here: https://rmarkdown.rstudio.com/.

maps and maptools

As their names suggest, these packages support drawing maps.

Try the raw package in your browser

Any scripts or data that you put into this service are public.

raw documentation built on Feb. 5, 2021, 5:07 p.m.