setup: Setup your modeling workflow

View source: R/setup.R

setup    R Documentation

Setup your modeling workflow

Description

Creates a setup object that is the basis of any insuRglm modeling workflow. This object is then passed as the main input to most other functions in the package.

Usage

setup(
  data_train,
  data_test = NULL,
  target,
  weight = NULL,
  offset = NULL,
  family = c("poisson", "gamma", "tweedie"),
  tweedie_p = NULL,
  simple_factors = NULL,
  keep_cols = NULL,
  glm_backend = c("speedglm", "stats"),
  folder = getwd(),
  load_file_nm = NULL,
  save_file_nm = NULL,
  seed = NULL
)

Arguments

data_train

Data frame. Training data.

data_test

Data frame. Test data. Optional; defaults to NULL.

target

Character scalar. Name of the target variable

weight

Character scalar. Name of the weight variable

family

Character scalar. Name of the distribution family. One of 'poisson', 'gamma' or 'tweedie'.
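The Usage section lists family (and glm_backend) with a vector of choices as the default. In R this pattern is commonly resolved with match.arg, which picks the first choice when the argument is omitted. Whether setup does exactly this is not confirmed on this page; the sketch below uses a hypothetical function, pick_family, only to illustrate the convention.

```r
# Hypothetical function mirroring the shape of the `family` argument.
# It is NOT part of insuRglm; it only demonstrates match.arg behaviour.
pick_family <- function(family = c("poisson", "gamma", "tweedie")) {
  # match.arg returns the first choice when `family` is not supplied,
  # and raises an error for values outside the listed choices.
  match.arg(family)
}

default_family <- pick_family()         # argument omitted
chosen_family  <- pick_family("gamma")  # explicit choice
```

Passing the family explicitly, as the Examples do, avoids relying on any default.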

tweedie_p

Numeric scalar. Tweedie variance power. Used only when family is 'tweedie'.

simple_factors

Character vector. Names of potential predictors. Each of these columns must be of class factor.
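Since the predictors have to be factors, character or numeric columns need converting before the data is handed to setup. A minimal base R sketch follows; the data frame policies and its column names are invented for illustration.

```r
# Hypothetical training data; the column names are illustrative only.
policies <- data.frame(
  freq     = c(0, 1, 0, 2),
  exposure = c(1, 0.5, 1, 0.75),
  region   = c("north", "south", "south", "east"),
  veh_age  = c(1, 3, 3, 7),
  stringsAsFactors = FALSE
)

# Columns intended as potential predictors.
predictors <- c("region", "veh_age")

# Convert each candidate predictor to a factor in place.
policies[predictors] <- lapply(policies[predictors], factor)

# Check that every predictor is now a factor.
all_factors <- all(vapply(policies[predictors], is.factor, logical(1)))
```

The predictors vector could then be supplied as simple_factors.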

keep_cols

Character vector. Names of columns that are not potential predictors, but should be kept in data.

glm_backend

Character scalar. Either 'speedglm' or 'stats'. With 'speedglm', models are fitted by speedglm::speedglm; with 'stats', they are fitted by the traditional stats::glm.

folder

Character scalar. Path to an existing folder where setup/model files will be stored.

load_file_nm

Character scalar. Filename of an existing setup object created by an earlier run of setup. The file must be located in the folder given by folder. The '_setup.rds' suffix may be omitted.
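One way to read the optional suffix is that the name is normalised before the file is looked up in folder. The helper below, resolve_setup_path, is hypothetical and only sketches that idea; the package's actual lookup logic is not shown on this page.

```r
# Hypothetical helper, not part of insuRglm: build the full path to a
# setup file, appending the '_setup.rds' suffix only when it is missing.
resolve_setup_path <- function(folder, file_nm) {
  suffix <- "_setup.rds"
  if (!endsWith(file_nm, suffix)) {
    file_nm <- paste0(file_nm, suffix)
  }
  file.path(folder, file_nm)
}

# Both spellings resolve to the same file under this assumption.
short_form <- resolve_setup_path("models", "freq")
long_form  <- resolve_setup_path("models", "freq_setup.rds")
```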

save_file_nm

Character scalar. Filename under which the setup object created by this run of setup is saved. The file is written to the folder given by folder.

seed

Numeric scalar. Seed for reproducible random number generation, e.g. for creating CV folds.
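To see why a seed matters for cross-validation folds, the base R sketch below assigns rows to folds twice with the same seed and gets identical splits. The helper assign_folds is hypothetical and does not reflect how insuRglm builds its folds internally.

```r
# Hypothetical helper, not part of insuRglm: assign each of n_rows rows
# to one of k roughly equal folds, in random order.
assign_folds <- function(n_rows, k, seed) {
  set.seed(seed)  # fix the random number generator state
  sample(rep(seq_len(k), length.out = n_rows))
}

folds_a <- assign_folds(100, k = 5, seed = 42)
folds_b <- assign_folds(100, k = 5, seed = 42)

# The same seed reproduces the same fold assignment.
same_split <- identical(folds_a, folds_b)
```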

offset

Character scalar. Name of the offset variable. Applicable to the poisson family.
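For background, an offset in a Poisson model with a log link lets claim counts scale with exposure. The sketch below shows the general technique with stats::glm on simulated data. It does not use insuRglm, and whether setup expects raw exposure or applies the log itself is not stated on this page.

```r
# Simulated data; all names and values here are illustrative.
set.seed(1)
n        <- 500
exposure <- runif(n, min = 0.1, max = 1)
region   <- factor(sample(c("a", "b"), n, replace = TRUE))
rate     <- ifelse(region == "a", 0.1, 0.2)  # claims per unit exposure
claims   <- rpois(n, lambda = rate * exposure)

# With a log link, log(exposure) enters as an offset: a term whose
# coefficient is fixed at 1 instead of being estimated.
fit <- stats::glm(
  claims ~ region + offset(log(exposure)),
  family = poisson(link = "log")
)

# Exponentiated coefficients are on the per-unit-exposure rate scale.
rate_scale <- exp(coef(fit))
```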

Value

List of class setup. Contains attributes and objects used by other functions in the package.

Note

A short summary of the train/test datasets is written to the console.

Examples

require(dplyr) # for the pipe operator

# poisson distribution target
data('freq_train')

setup <- setup(
  data_train = freq_train,
  target = 'freq',
  offset = 'exposure',
  family = 'poisson',
  keep_cols = c('pol_nbr', 'premium')
)

# gamma distribution target
data('sev_train')

setup <- setup(
  data_train = sev_train,
  target = 'sev',
  weight = 'numclaims',
  family = 'gamma',
  keep_cols = c('pol_nbr', 'exposure', 'premium')
)

# tweedie distribution - burning cost
data('bc_train')

setup <- setup(
  data_train = bc_train,
  target = 'bc',
  weight = 'exposure',
  family = 'tweedie',
  tweedie_p = 1.75, # use tweedie::tweedie.profile to determine the best value
  keep_cols = c('pol_nbr', 'premium')
)

# tweedie distribution - loss ratio
data('lr_train')

setup <- setup(
  data_train = lr_train,
  target = 'lr',
  weight = 'premium',
  family = 'tweedie',
  tweedie_p = 1.75, # use tweedie::tweedie.profile to determine the best value
  keep_cols = c('pol_nbr', 'exposure')
)


realgabon/insuRglm documentation built on Jan. 2, 2023, 2:51 a.m.