setup | R Documentation |
Creates a setup object that is the basis for any insuRglm modeling workflow. This object is subsequently used as a main input in most functions in the package.
setup(
  data_train,
  data_test = NULL,
  target,
  weight = NULL,
  offset = NULL,
  family = c("poisson", "gamma", "tweedie"),
  tweedie_p = NULL,
  simple_factors = NULL,
  keep_cols = NULL,
  glm_backend = c("speedglm", "stats"),
  folder = getwd(),
  load_file_nm = NULL,
  save_file_nm = NULL,
  seed = NULL
)
data_train |
Dataframe. Training data |
data_test |
Dataframe. Test data |
target |
Character scalar. Name of the target variable |
weight |
Character scalar. Name of the weight variable |
family |
Character scalar. Name of the distribution family. One of 'poisson', 'gamma' or 'tweedie'. |
tweedie_p |
Numeric scalar. Tweedie variance power, used if family is 'tweedie'. |
simple_factors |
Character vector. Names of potential predictors. These predictors need to be |
keep_cols |
Character vector. Names of columns that are not potential predictors, but should be kept in data. |
glm_backend |
Character scalar. Either 'speedglm' or 'stats'. Choosing 'speedglm' results in using
|
folder |
Character scalar. Path to an existing folder where setup/model files will be stored. |
load_file_nm |
Character scalar. Filename of an existing setup object created by running setup.
Must be within the folder specified by the folder argument. |
save_file_nm |
Character scalar. Filename of a setup object saved during this run of the setup function.
Will be saved within the folder specified by the folder argument. |
seed |
Numeric scalar. Seed for reproducible random number generation, e.g. for creating CV folds. |
offset |
Character scalar. Name of the offset variable, applicable for |
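To illustrate what an exposure offset generally means in a Poisson frequency model, the following base R sketch fits a GLM on simulated data with the exposure entering the linear predictor on the log scale. It does not use insuRglm, and whether setup applies the log to the offset column internally is not stated on this page; the variable names (exposure, x, claims) and the simulated coefficients are assumptions made purely for illustration.

```r
# Simulated portfolio: all names and values here are hypothetical.
set.seed(1)
n <- 20000
exposure <- runif(n, min = 0.1, max = 1)   # policy-years in force
x <- rbinom(n, size = 1, prob = 0.5)       # a binary rating factor

# True claim frequency per unit of exposure: exp(-1 + 0.5 * x)
claims <- rpois(n, lambda = exposure * exp(-1 + 0.5 * x))

# The offset fixes the coefficient of log(exposure) at 1, so the model
# estimates claim frequency per unit of exposure rather than per policy.
fit <- glm(claims ~ x + offset(log(exposure)), family = poisson(link = "log"))

coef(fit)
```

The estimated coefficients should land close to the simulated values, which shows that the offset scales expected claim counts by exposure without being estimated itself.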
List of class setup. Contains attributes and objects used by other functions in the package.
A short summary of the train/test datasets is written to the console.
require(dplyr) # for the pipe operator

# poisson distribution target
data('freq_train')

setup <- setup(
  data_train = freq_train,
  target = 'freq',
  offset = 'exposure',
  family = 'poisson',
  keep_cols = c('pol_nbr', 'premium')
)

# gamma distribution target
data('sev_train')

setup <- setup(
  data_train = sev_train,
  target = 'sev',
  weight = 'numclaims',
  family = 'gamma',
  keep_cols = c('pol_nbr', 'exposure', 'premium')
)

# tweedie distribution - burning cost
data('bc_train')

setup <- setup(
  data_train = bc_train,
  target = 'bc',
  weight = 'exposure',
  family = 'tweedie',
  tweedie_p = 1.75, # use tweedie::tweedie.profile to determine the best value
  keep_cols = c('pol_nbr', 'premium')
)

# tweedie distribution - loss ratio
data('lr_train')

setup <- setup(
  data_train = lr_train,
  target = 'lr',
  weight = 'premium',
  family = 'tweedie',
  tweedie_p = 1.75, # use tweedie::tweedie.profile to determine the best value
  keep_cols = c('pol_nbr', 'exposure')
)
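The examples above pass only data_train. As a minimal sketch of how the data_test and seed arguments from the usage section might be supplied, the code below holds out part of freq_train with a small base R helper. The helper split_train_test, the 80/20 split and the name setup_obj are hypothetical and not part of insuRglm; the setup call is guarded so that the helper can be tried even when the package is not installed.

```r
# Hypothetical helper (not part of insuRglm): random hold-out split.
split_train_test <- function(df, test_frac = 0.2, seed = 42) {
  n_test <- floor(test_frac * nrow(df))
  stopifnot(n_test >= 1, n_test < nrow(df))
  set.seed(seed)
  test_idx <- sample(nrow(df), size = n_test)
  list(
    train = df[-test_idx, , drop = FALSE],
    test  = df[test_idx, , drop = FALSE]
  )
}

if (requireNamespace("insuRglm", quietly = TRUE)) {
  library(insuRglm)
  data("freq_train")

  parts <- split_train_test(freq_train, test_frac = 0.2, seed = 42)

  # Assumption: a random 80/20 split suits this data; the setup
  # arguments used here are taken from the usage section above.
  setup_obj <- setup(
    data_train = parts$train,
    data_test = parts$test,
    target = "freq",
    offset = "exposure",
    family = "poisson",
    keep_cols = c("pol_nbr", "premium"),
    seed = 42
  )
}
```

Storing the result under a name other than setup (here setup_obj) avoids shadowing the function name in the workspace; this is a style choice, not a requirement of the package.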