This article walks through an example of creating a transaction study using the actxps package. Unlike a termination study, transaction studies track events that can occur multiple times over the life of a policy. Often, transactions are expected to reoccur; for example, the utilization of a guaranteed income stream.
Key questions to answer in a transaction study are:
The example below walks through preparing data by adding transaction information to a data frame with exposure-level records using the `add_transactions()` function. Next, study results are summarized using the `trx_stats()` function.
In this example, we'll be using the `census_dat`, `withdrawals`, and `account_vals` data sets. Each data set is based on a theoretical block of deferred annuity business with a guaranteed lifetime income benefit.

- `census_dat` contains census-level information with one row per policy.
- `withdrawals` contains withdrawal transactions. There are 2 types of transactions in the data: "Base" (ordinary withdrawals) and "Rider" (guaranteed income payments).
- `account_vals` contains historical account values on policy anniversaries. This data will be used to calculate withdrawal rates as a percentage of account values.

The add_transactions() function

The `add_transactions()` function attaches transactions to a data frame with exposure-level records. This data frame must have the class `exposed_df`. For our example, we first need to convert `census_dat` into exposure records using the `expose()` function.^[See `vignette('exposures')` for more information on creating `exposed_df` objects.] This example will use policy year exposures.
library(actxps)
library(dplyr)

exposed_data <- expose_py(census_dat, "2019-12-31",
                          target_status = "Surrender")
exposed_data
The `withdrawals` data has 4 columns that are required for attaching transactions:

- `pol_num`: policy number
- `trx_date`: transaction date
- `trx_type`: transaction type
- `trx_amt`: transaction amount

withdrawals

The grain of this data is one row per policy per transaction. The expectation is that the number of records in the transaction data will not match the number of rows in the exposure data. That is because policies could have zero or several transactions in a given exposure period.
The `add_transactions()` function uses a non-equi join to associate each transaction with a policy number and a date interval found in the exposure data. Then, transaction counts and amounts are summarized such that there is one row per exposure period. If there are multiple transaction types in the data, separate columns are created for each transaction type.
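The join-and-summarize logic described above can be sketched in base R. This is only an illustration on made-up data, not the package's implementation. The toy data frames and the period columns `pol_date_yr` and `pol_date_yr_end` are assumptions chosen to resemble policy year exposure records.

```r
# Toy exposure records: one row per policy per policy year (hypothetical data)
exposures <- data.frame(
  pol_num = c(1, 1, 2),
  pol_yr = c(1, 2, 1),
  pol_date_yr = as.Date(c("2020-01-01", "2021-01-01", "2020-06-01")),
  pol_date_yr_end = as.Date(c("2020-12-31", "2021-12-31", "2021-05-31"))
)

# Toy transactions: policy 1 has three, policy 2 has none
trx <- data.frame(
  pol_num = c(1, 1, 1),
  trx_date = as.Date(c("2020-03-15", "2020-09-01", "2021-02-10")),
  trx_type = c("Base", "Base", "Rider"),
  trx_amt = c(100, 50, 200)
)

result <- exposures
for (type in unique(trx$trx_type)) {
  amt <- numeric(nrow(exposures))
  n <- integer(nrow(exposures))
  for (i in seq_len(nrow(exposures))) {
    # Non-equi join condition: same policy, same transaction type, and
    # a transaction date inside the exposure period
    hit <- trx$pol_num == exposures$pol_num[i] &
      trx$trx_type == type &
      trx$trx_date >= exposures$pol_date_yr[i] &
      trx$trx_date <= exposures$pol_date_yr_end[i]
    amt[i] <- sum(trx$trx_amt[hit])
    n[i] <- sum(hit)
  }
  # One amount column and one count column per transaction type
  result[[paste0("trx_amt_", type)]] <- amt
  result[[paste0("trx_n_", type)]] <- n
}

result
```

The output keeps one row per exposure period. Periods without transactions carry zeros rather than being dropped.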
Using our example, we pass both the exposure and withdrawals data to `add_transactions()`. The resulting data has the same number of rows as the original exposure data and 4 new columns:

- `trx_amt_Base`: the sum of "Base" withdrawal transactions
- `trx_amt_Rider`: the sum of "Rider" withdrawal transactions
- `trx_n_Base`: the number of "Base" withdrawal transactions
- `trx_n_Rider`: the number of "Rider" withdrawal transactions

exposed_trx <- add_transactions(exposed_data, withdrawals)
glimpse(exposed_trx)
If we print `exposed_trx`, we can see that it is still an `exposed_df` object, but now it has an additional attribute for transaction types that have been attached.
exposed_trx
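The trx_stats() function

The actxps package's workhorse function for summarizing transaction experience is `trx_stats()`. This function returns a `trx_df` object, which is a type of data frame containing additional attributes about the transaction study.

At a minimum, a `trx_df` includes the following for each transaction type (`trx_type`):

- The number of transactions (`trx_n`)
- The number of exposure periods with a transaction (`trx_flag`)
- The total transaction amount (`trx_amt`)
- The total exposure (`exposure`)
- The average transaction amount when a transaction occurs (`avg_trx`)
- The average transaction amount across all records (`avg_all`)
- The transaction frequency when a transaction occurs (`trx_freq = trx_n / trx_flag`)
- The transaction utilization (`trx_util = trx_flag / exposure`)

Optionally, a `trx_df` can also include:

- Transactions as a percentage of another variable when a transaction occurs (`pct_of_*_w_trx`)
- Transactions as a percentage of another variable across all records (`pct_of_*_all`)

To use `trx_stats()`, we simply need to pass it an `exposed_df` object with transactions attached.^[Unlike `exp_stats()`, `trx_stats()` requires data to be an `exposed_df` object.]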
trx_stats(exposed_trx)
Because no groups were specified, the output contains a single row for each transaction type.
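To make the summary metrics listed earlier concrete, here is a small hand calculation in base R on invented numbers. The formulas for `trx_freq` and `trx_util` come from the text above. The definitions used for `avg_trx` (total amount divided by `trx_flag`) and `avg_all` (total amount divided by exposure) are assumptions based on the column descriptions, not a statement of the package's internals.

```r
# Hypothetical exposure-level records for a single transaction type.
# Each row is one fully exposed policy year.
dat <- data.frame(
  exposure = c(1, 1, 1, 1),
  trx_n    = c(0, 2, 1, 0),      # number of transactions in the period
  trx_amt  = c(0, 300, 100, 0)   # total transaction amount in the period
)

trx_n    <- sum(dat$trx_n)       # total number of transactions
trx_flag <- sum(dat$trx_n > 0)   # periods with at least one transaction
trx_amt  <- sum(dat$trx_amt)     # total transaction amount
exposure <- sum(dat$exposure)    # total exposure

stats <- c(
  avg_trx  = trx_amt / trx_flag,  # assumed definition
  avg_all  = trx_amt / exposure,  # assumed definition
  trx_freq = trx_n / trx_flag,    # frequency when a transaction occurs
  trx_util = trx_flag / exposure  # utilization
)

stats
```

Here half of the exposure periods have a transaction (`trx_util` of 0.5), and those periods average 1.5 transactions each (`trx_freq`).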
If the data frame passed into `trx_stats()` is grouped using `dplyr::group_by()`, the resulting output will contain one record for each transaction type for each unique group.
In the following, `exposed_trx` is grouped by the presence of an income guarantee (`inc_guar`) before being passed to `trx_stats()`. This results in four rows because we have two types of transactions and two distinct values of `inc_guar`.
exposed_trx |> group_by(inc_guar) |> trx_stats()
Multiple grouping variables are allowed. Below, policy year (`pol_yr`) is added as a second grouping variable.
exposed_trx |> group_by(pol_yr, inc_guar) |> trx_stats()
In a transaction study, we often want to express transaction amounts as a percentage of another value. For example, in a withdrawal study, dividing withdrawal amounts by account values provides a withdrawal rate. In a study of benefit utilization, transactions can be divided by a maximum benefit amount to derive a benefit utilization rate. In addition, actual-to-expected rates can be calculated by dividing transactions by expected values.
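If column names are passed to the `percent_of` argument of `trx_stats()`, the output will contain 4 additional columns for each "percent of" variable:

- The sum of the "percent of" variable
- The sum of the "percent of" variable over records where a transaction occurred. These columns include the suffix `_w_trx`.
- Transaction amounts divided by the sum of the "percent of" variable (`pct_of_{*}_all`)
- Transaction amounts divided by the sum of the "percent of" variable where a transaction occurred (`pct_of_{*}_w_trx`)

For our example, let's assume we're interested in examining withdrawal transactions as a percentage of account values, which are available in the `account_vals` data frame in the column `av_anniv`.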
# attach account values data
exposed_trx_w_av <- exposed_trx |>
  left_join(account_vals, by = c("pol_num", "pol_date_yr"))

trx_res <- exposed_trx_w_av |>
  group_by(pol_yr, inc_guar) |>
  trx_stats(percent_of = "av_anniv")

glimpse(trx_res)
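The difference between the two "percent of" ratios is easiest to see on a few invented records. The base R sketch below is a hand calculation on made-up data, not output from `trx_stats()`. Treating a record as having a transaction when its amount is non-zero is an assumption made for illustration.

```r
# Hypothetical exposure records with a transaction amount and an
# anniversary account value
dat <- data.frame(
  trx_amt  = c(0, 500, 250, 0),
  av_anniv = c(10000, 10000, 5000, 20000)
)

# Records where a transaction occurred (assumed rule: non-zero amount)
had_trx <- dat$trx_amt != 0

av_anniv       <- sum(dat$av_anniv)           # all records
av_anniv_w_trx <- sum(dat$av_anniv[had_trx])  # records with a transaction

pct_of_av_anniv_all   <- sum(dat$trx_amt) / av_anniv
pct_of_av_anniv_w_trx <- sum(dat$trx_amt) / av_anniv_w_trx

c(all = pct_of_av_anniv_all, w_trx = pct_of_av_anniv_w_trx)
```

The `_w_trx` ratio is never smaller than the `_all` ratio here because its denominator excludes account values on records without withdrawals.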
If `conf_int` is set to `TRUE`, `trx_stats()` will produce lower and upper confidence interval limits for the observed utilization rate. Confidence intervals are constructed assuming a binomial distribution.
exposed_trx |> group_by(pol_yr) |> trx_stats(conf_int = TRUE) |> select(pol_yr, trx_util, trx_util_lower, trx_util_upper)
The default confidence level is 95%. This can be changed using the `conf_level` argument. Below, tighter confidence intervals are constructed by decreasing the confidence level to 90%.
exposed_trx |> group_by(pol_yr) |> trx_stats(conf_int = TRUE, conf_level = 0.9) |> select(pol_yr, trx_util, trx_util_lower, trx_util_upper)
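For intuition, one common way to build a binomial interval around a utilization rate is to take quantiles of a binomial distribution and divide by exposure. The sketch below uses base R's `qbinom()` on invented counts. The helper `binom_ci()` is hypothetical, and whether `trx_stats()` computes its limits in exactly this way is an assumption.

```r
# Hypothetical helper: binomial quantile interval for a utilization rate
binom_ci <- function(trx_flag, exposure, conf_level = 0.95) {
  util <- trx_flag / exposure
  # Lower and upper tail probabilities for a two-sided interval
  p <- c((1 - conf_level) / 2, 1 - (1 - conf_level) / 2)
  # Quantiles of the count of periods with a transaction, rescaled to a rate
  limits <- qbinom(p, size = round(exposure), prob = util) / exposure
  c(lower = limits[1], trx_util = util, upper = limits[2])
}

# 250 exposure periods with a transaction out of 1,000 (made-up numbers)
ci_95 <- binom_ci(250, 1000)
ci_90 <- binom_ci(250, 1000, conf_level = 0.9)

rbind(ci_95, ci_90)
```

Lowering the confidence level pulls both limits toward the observed rate, which is why the 90% interval is narrower than the 95% interval.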
If any column names are passed to `percent_of`, `trx_stats()` will produce additional confidence intervals:

- Intervals for transactions as a percentage of another column when transactions occur (`pct_of_{*}_w_trx`) are constructed using a normal distribution.
- Intervals for transactions as a percentage of another column regardless of transaction utilization (`pct_of_{*}_all`) are calculated assuming that the aggregate distribution is normal with a mean equal to observed transactions and a variance equal to:

$$
Var(S) = E(N) \times Var(X) + E(X)^2 \times Var(N),
$$

where $S$ is the aggregate transactions random variable, $X$ is an individual transaction amount assumed to follow a normal distribution, and $N$ is a binomial random variable for transaction utilization.
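As a worked expansion of the variance formula, suppose $N$ is binomial with $n$ exposure periods and utilization probability $p$, and $X$ has mean $\mu$ and variance $\sigma^2$. The symbols $n$, $p$, $\mu$, and $\sigma^2$ are notation introduced here for illustration and do not appear in the package documentation. Substituting the binomial moments gives:

$$
E(N) = np, \qquad Var(N) = np(1 - p), \qquad Var(S) = np \times \sigma^2 + \mu^2 \times np(1 - p)
$$

The first term reflects uncertainty in transaction amounts, and the second reflects uncertainty in how many exposure periods have a transaction. In practice the unknown quantities would presumably be replaced by their sample estimates.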
exposed_trx_w_av |> group_by(pol_yr) |> trx_stats(conf_int = TRUE, percent_of = "av_anniv") |> select(pol_yr, starts_with("pct_of")) |> glimpse()
autoplot() and autotable()

The `autoplot()` and `autotable()` functions create visualizations and summary tables from `trx_df` objects. See `vignette("visualizations")` for full details on these functions.
library(ggplot2)

trx_res |>
  # remove periods with zero transactions
  filter(trx_n > 0) |>
  autoplot(y = pct_of_av_anniv_w_trx)

trx_res |>
  # remove periods with zero transactions
  filter(trx_n > 0) |>
  # first 10 rows shown for brevity
  head(10) |>
  autotable()
The `trx_types` argument of `trx_stats()` selects a subset of transaction types that will appear in the output.
trx_stats(exposed_trx, trx_types = "Base")
If the `combine_trx` argument is set to `TRUE`, all transaction types will be combined in a group called "All" in the output.
trx_stats(exposed_trx, combine_trx = TRUE)
As a default, `trx_stats()` removes partial exposures before summarizing results. This is done to avoid complexity associated with a lopsided skew in the timing of transactions. For example, if transactions can occur on a monthly basis or annually at the beginning of each policy year, partial exposures may not be appropriate. If a policy had an exposure of 0.5 years and was taking withdrawals annually at the beginning of the year, an argument could be made that the exposure should instead be 1 complete year. If the same policy was expected to take withdrawals 9 months into the year, it's not clear if the exposure should be 0.5 years or 0.5 / 0.75 years. To override this treatment, set the `full_exposures_only` argument to `FALSE`.
trx_stats(exposed_trx, full_exposures_only = FALSE)
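The effect of dropping partial exposures can be illustrated with a few invented records. The base R sketch below assumes a full exposure is one where `exposure` equals 1 (within a small tolerance). How the package identifies partial exposures internally is not stated above, so treat that rule as an assumption.

```r
# Hypothetical exposure records: policy 1 has a full year followed by a
# half-year exposure in which a withdrawal was still taken
dat <- data.frame(
  pol_num  = c(1, 1, 2),
  exposure = c(1, 0.5, 1),
  trx_amt  = c(100, 100, 0)
)

# Keep fully exposed records only (assumed rule: exposure equal to 1)
full_only <- dat[abs(dat$exposure - 1) < 1e-8, ]

# Utilization = periods with a transaction / total exposure
util_full <- sum(full_only$trx_amt != 0) / sum(full_only$exposure)
util_all  <- sum(dat$trx_amt != 0) / sum(dat$exposure)

c(full_exposures_only = util_full, all_exposures = util_all)
```

Including the half-year record raises the measured utilization because the withdrawal counts as a full event while contributing only half a year of exposure, which is the kind of timing distortion described above.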
As noted above, the result of `trx_stats()` is a `trx_df` object. If the `summary()` function is applied to a `trx_df` object, the data will be summarized again and returned as a higher-level `trx_df` object.
If no additional arguments are passed, `summary()` returns a single row of aggregate results for each transaction type.
summary(trx_res)
If additional variable names are passed to the `summary()` function, then the output will group the data by those variables. In our example, if `pol_yr` is passed to `summary()`, the output will contain one row per policy year for each transaction type.
summary(trx_res, pol_yr)
Similarly, if `inc_guar` is passed to `summary()`, the output will contain a row for each transaction type and unique value in `inc_guar`.
summary(trx_res, inc_guar)
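When results are rolled up to a higher level, rates need to be recomputed from summed numerators and denominators rather than averaged across groups. The base R sketch below shows this on invented grouped results. It illustrates the general idea only, and whether `summary()` proceeds exactly this way internally is an assumption.

```r
# Hypothetical grouped results for one transaction type
grouped <- data.frame(
  pol_yr   = c(1, 1, 2, 2),
  inc_guar = c(FALSE, TRUE, FALSE, TRUE),
  trx_flag = c(10, 40, 20, 60),
  exposure = c(100, 100, 200, 100)
)
grouped$trx_util <- grouped$trx_flag / grouped$exposure

# Roll up to policy year: sum the components first, then recompute the rate
by_yr <- aggregate(cbind(trx_flag, exposure) ~ pol_yr,
                   data = grouped, FUN = sum)
by_yr$trx_util <- by_yr$trx_flag / by_yr$exposure

by_yr
```

For policy year 2, the recomputed utilization is 80 / 300, which differs from the simple average of the two group rates (0.1 and 0.6) because the groups carry different exposure.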
As a default, `add_transactions()` assumes the transaction data frame (`trx_data`) uses the following naming conventions:

- `pol_num`
- `trx_date`
- `trx_type`
- `trx_amt`

These default names can be overridden using the `col_pol_num`, `col_trx_date`, `col_trx_type`, and `col_trx_amt` arguments.
For example, if the transaction type column was called `transaction_code` in our data, we could write:
exposed_data |> add_transactions(withdrawals, col_trx_type = "transaction_code")
Similarly, `trx_stats()` assumes the input data uses the name `exposure` for exposures. This default can be overridden using the argument `col_exposure`.
The `trx_stats()` function does not produce any calculations related to the persistence of transactions from exposure period to exposure period.