setup: Setup your modeling workflow
In realgabon/insuRglm: Tools for GLM Modeling in Insurance Context

setup

R Documentation

Setup your modeling workflow

Description

Creates a setup object that is the basis of any insuRglm modeling workflow. This object is then passed as the main input to most functions in the package.

Usage

setup(
  data_train,
  data_test = NULL,
  target,
  weight = NULL,
  offset = NULL,
  family = c("poisson", "gamma", "tweedie"),
  tweedie_p = NULL,
  simple_factors = NULL,
  keep_cols = NULL,
  glm_backend = c("speedglm", "stats"),
  folder = getwd(),
  load_file_nm = NULL,
  save_file_nm = NULL,
  seed = NULL
)

Arguments

`data_train`	Dataframe. Training data
`data_test`	Dataframe. Test data
`target`	Character scalar. Name of the target variable
`weight`	Character scalar. Name of the weight variable
`family`	Character scalar. Name of distribution family. One of `poisson`, `tweedie` or `gamma`
`tweedie_p`	Numeric scalar. Tweedie variance power, if family `tweedie` is used
`simple_factors`	Character vector. Names of potential predictors. These predictors need to be `factor` class.
`keep_cols`	Character vector. Names of columns that are not potential predictors, but should be kept in data.
`glm_backend`	Character scalar. Either 'speedglm' or 'stats'. Choosing 'speedglm' results in using `speedglm::speedglm` as glm backend, while choosing 'stats' will result in traditional `stats::glm`.
`folder`	Character scalar. Path to an existing folder where setup/model files will be stored.
`load_file_nm`	Character scalar. Filename of an existing setup object created by a previous run of `setup`. Must be within the folder specified by `folder`. The '_setup.rds' suffix may be omitted.
`save_file_nm`	Character scalar. Filename of a setup object saved during this run of the setup function. Will be saved within the folder specified by `folder`.
`seed`	Numeric scalar. Seed for reproducible random number generation, e.g. for creating CV folds.
`offset`	Character scalar. Name of the offset variable, applicable to the `poisson` family
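Because `simple_factors` must name columns of class `factor`, it can help to check and convert candidate predictors before calling `setup`. The following is a minimal base R sketch of such a check, not part of insuRglm; the `policies` data frame and its column names are hypothetical and used only for illustration.

```r
# Hypothetical toy data; the column names are illustrative only.
policies <- data.frame(
  pol_nbr  = 1:6,
  freq     = c(0, 1, 0, 2, 0, 1),
  exposure = c(1, 0.5, 1, 1, 0.25, 1),
  region   = c("north", "south", "north", "east", "south", "east"),
  veh_age  = c(1, 3, 3, 7, 1, 7),
  stringsAsFactors = FALSE
)

# Columns intended to be passed as `simple_factors`.
predictors <- c("region", "veh_age")

# Find candidate predictors that are not yet of class factor.
not_factor <- predictors[!vapply(policies[predictors], is.factor, logical(1))]

# Convert them; a numeric column gets one level per distinct value.
policies[not_factor] <- lapply(policies[not_factor], factor)

# TRUE once every candidate predictor is a factor.
all_factors <- all(vapply(policies[predictors], is.factor, logical(1)))
```

Note that converting a numeric column this way creates one level per distinct value, so continuous variables may need to be banded first.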

Value

List of class `setup`. Contains attributes and objects used by other functions in the package.

Note

A short summary of the train/test datasets is printed to the console.

Examples

require(dplyr) # for the pipe operator

# poisson distribution target
data('freq_train')

setup <- setup(
  data_train = freq_train,
  target = 'freq',
  offset = 'exposure',
  family = 'poisson',
  keep_cols = c('pol_nbr', 'premium')
)

# gamma distribution target
data('sev_train')

setup <- setup(
  data_train = sev_train,
  target = 'sev',
  weight = 'numclaims',
  family = 'gamma',
  keep_cols = c('pol_nbr', 'exposure', 'premium')
)

# tweedie distribution - burning cost
data('bc_train')

setup <- setup(
  data_train = bc_train,
  target = 'bc',
  weight = 'exposure',
  family = 'tweedie',
  tweedie_p = 1.75, # use tweedie::tweedie.profile to determine the best value
  keep_cols = c('pol_nbr', 'premium')
)

# tweedie distribution - loss ratio
data('lr_train')

setup <- setup(
  data_train = lr_train,
  target = 'lr',
  weight = 'premium',
  family = 'tweedie',
  tweedie_p = 1.75, # use tweedie::tweedie.profile to determine the best value
  keep_cols = c('pol_nbr', 'exposure')
)
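# What an offset does in a Poisson GLM

The Poisson example above passes `offset = 'exposure'`. As general background, the sketch below uses plain `stats::glm` on simulated data to show the usual role of an exposure offset in a log-link Poisson model. It is an illustration under stated assumptions, not a description of insuRglm internals: the data, the column names and the true rates are invented, the response here is a claim count, and whether insuRglm takes the logarithm of the offset itself is not stated on this page.

```r
set.seed(1)  # arbitrary seed so the simulated data is reproducible

n <- 5000

# Simulated portfolio: exposure in policy-years and one two-level factor.
sim <- data.frame(
  exposure = runif(n, min = 0.1, max = 1),
  region   = factor(sample(c("a", "b"), n, replace = TRUE))
)

# Assumed true claim rates per unit of exposure: 0.1 in region a, 0.3 in region b.
true_rate  <- ifelse(sim$region == "b", 0.3, 0.1)
sim$claims <- rpois(n, lambda = true_rate * sim$exposure)

# log(exposure) enters the linear predictor with its coefficient fixed at 1,
# so the fitted coefficients describe claims per unit of exposure.
fit <- stats::glm(
  claims ~ region + offset(log(exposure)),
  family = poisson(link = "log"),
  data   = sim
)

# Back-transform to the rate scale.
base_rate    <- unname(exp(coef(fit)["(Intercept)"]))  # estimated rate in region a
relativity_b <- unname(exp(coef(fit)["regionb"]))      # region b relative to region a
```

With enough data, `base_rate` should land near the assumed 0.1 and `relativity_b` near 3, since the offset removes the effect of differing exposure from the estimated coefficients.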

realgabon/insuRglm documentation built on Jan. 2, 2023, 2:51 a.m.