View source: R/IndividualDataPP.R
IndividualDataPP | R Documentation |
This function pre-processes the data for the application of a ReSurv
model.
IndividualDataPP(
data,
id = NULL,
continuous_features = NULL,
categorical_features = NULL,
accident_period,
calendar_period,
input_time_granularity = "months",
output_time_granularity = "quarters",
years = NULL,
calendar_period_extrapolation = FALSE,
continuous_features_spline = NULL,
degrees_cf = 3,
degrees_of_freedom_cf = 4,
degrees_cp = 3,
degrees_of_freedom_cp = 4
)
data |
|
id |
|
continuous_features |
|
categorical_features |
|
accident_period |
|
calendar_period |
|
input_time_granularity |
Default to "months". |
output_time_granularity |
The output granularity must be bigger than the input granularity.
Also, the output granularity must be consistent with the input granularity, meaning that the time conversion must be possible.
E.g., it is possible to group quarters to years. It is not possible to group quarters to semesters.
Default to "quarters". |
years |
|
calendar_period_extrapolation |
|
continuous_features_spline |
|
degrees_cf |
|
degrees_of_freedom_cf |
|
degrees_cp |
|
degrees_of_freedom_cp |
|
The input accident_period is coded as AP_i. The input development periods are derived as DP_i = calendar_period - accident_period + 1.
The reverse-time development periods are DP_rev_i = DP_max - DP_i, where DP_max is the maximum number of development periods: DP_i = 1, \ldots, DP_max. Given the parameter years, DP_max is derived internally by the package.
The truncation time is TR_i = AP_i - 1.
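The derivations above can be sketched in base R on a toy data set. This is only an illustration, not the package's internal code: the column names AP and RP and the hand-picked DP_max are assumptions (the package derives DP_max from years).

```r
# Toy claims data in input granularity; the column names AP and RP are illustrative.
claims <- data.frame(
  AP = c(1, 1, 2, 3),  # accident period
  RP = c(1, 3, 2, 4)   # calendar (reporting) period
)

# Assumption: DP_max is set by hand here; the package derives it from `years`.
DP_max <- 4

claims$AP_i     <- claims$AP                  # accident period, input granularity
claims$DP_i     <- claims$RP - claims$AP + 1  # forward development period
claims$DP_rev_i <- DP_max - claims$DP_i       # development period in reverse time
claims$TR_i     <- claims$AP_i - 1            # truncation time

print(claims)
```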
AP_i, DP_i, DP_rev_i and TR_i are converted to AP_o, DP_o, DP_rev_o and TR_o (from the input_time_granularity to the output_time_granularity) using a multiplicative conversion factor. E.g., AP_o = AP_i * CF.
The conversion factor is computed as CF = \frac{\nu^i}{\nu^o}, where \nu^i and \nu^o are the fractions of a year corresponding to input_time_granularity and output_time_granularity. \nu^i and \nu^o take the values 1/360, 1/12, 1/4, 1/2, 1 for "days", "months", "quarters", "semesters", "years" respectively. For example, going from months to quarters gives CF = (1/12)/(1/4) = 1/3.
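As a rough check of the conversion, the sketch below computes the factor in base R, assuming it is the ratio of the input to the output year fraction (this reproduces the value 1/3 quoted for months to quarters). The helper conversion_factor is hypothetical and not part of the package. The text only states the multiplication AP_o = AP_i * CF; rounding up to whole output periods with ceiling is an assumption made for this illustration.

```r
# Fraction of a year for each granularity, as listed in the text.
nu <- c(days = 1/360, months = 1/12, quarters = 1/4, semesters = 1/2, years = 1)

# Hypothetical helper (not a ReSurv function): ratio of input to output year fraction.
conversion_factor <- function(input, output) {
  unname(nu[input] / nu[output])
}

cf <- conversion_factor("months", "quarters")
print(cf)

# Twelve monthly accident periods mapped to quarters.
# Assumption: fractional results are rounded up to the next whole output period.
AP_i <- 1:12
AP_o <- ceiling(AP_i * cf)
print(data.frame(AP_i = AP_i, AP_o = AP_o))
```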
We will have RP_o = AP_o + DP_o.
IndividualDataPP object. A list containing:
full.data: data.frame. The input data after pre-processing.
starting.data: data.frame. The input data as provided by the user.
training.data: data.frame. The input data pre-processed for training.
conversion_factor: numeric. The conversion factor for going from input granularity to output granularity. E.g., the conversion factor for going from months to quarters is 1/3.
string_formula_i: character. The survival formula to model the data in input granularity.
string_formula_o: character. The survival formula to model the data in output granularity.
continuous_features: character. The continuous feature names as provided by the user.
categorical_features: character. The categorical feature names as provided by the user.
calendar_period_extrapolation: logical. The value specifying whether a calendar period component is extrapolated.
years: numeric. Total number of development years in the data. Defaults to NULL, in which case it is computed automatically from the data.
accident_period: character. Accident period column name.
calendar_period: character. Calendar period column name.
input_time_granularity: character. Input time granularity.
output_time_granularity: character. Output time granularity.
After pre-processing, we provide a standard encoding for the time components. This applies to the output in training.data and full.data.
In the ReSurv notation:
AP_i: Input granularity accident period.
AP_o: Output granularity accident period.
DP_i: Input granularity development period in forward time.
DP_rev_i: Input granularity development period in reverse time.
DP_rev_o: Output granularity development period in reverse time.
TR_i: Input granularity truncation time.
TR_o: Output granularity truncation time.
I: Event indicator; under this framework it is equal to one for each entry.
Hiabu, M., Hofman, E., & Pittarello, G. (2023). A machine learning approach based on survival analysis for IBNR frequencies in non-life reserving. arXiv preprint arXiv:2312.14549.
library(ReSurv)

# Simulate a small individual claims data set.
input_data_0 <- data_generator(
  random_seed = 1964,
  scenario = "alpha",
  time_unit = 1,
  years = 2,
  period_exposure = 100
)

# Pre-process the simulated data, keeping yearly granularity for input and output.
individual_data <- IndividualDataPP(
  data = input_data_0,
  categorical_features = "claim_type",
  continuous_features = "AP",
  accident_period = "AP",
  calendar_period = "RP",
  input_time_granularity = "years",
  output_time_granularity = "years",
  years = 2
)