training_model: Training model
In creditmodel: Toolkit for Credit Modeling, Analysis and Visualization


`training_model` is a model builder.

training_model(
  model_name = "mymodel",
  dat,
  dat_test = NULL,
  target = NULL,
  occur_time = NULL,
  obs_id = NULL,
  x_list = NULL,
  ex_cols = NULL,
  pos_flag = NULL,
  prop = 0.7,
  split_type = if (!is.null(occur_time)) "OOT" else "Random",
  preproc = TRUE,
  low_var = 0.99,
  missing_rate = 0.98,
  merge_cat = 30,
  remove_dup = TRUE,
  outlier_proc = TRUE,
  missing_proc = "median",
  default_miss = list(-1, "missing"),
  miss_values = NULL,
  one_hot = FALSE,
  trans_log = FALSE,
  feature_filter = list(filter = c("IV", "PSI", "COR", "XGB"), iv_cp = 0.02, psi_cp =
    0.1, xgb_cp = 0, cv_folds = 1, hopper = FALSE),
  algorithm = list("LR", "XGB", "GBM", "RF"),
  LR.params = lr_params(),
  XGB.params = xgb_params(),
  GBM.params = gbm_params(),
  RF.params = rf_params(),
  breaks_list = NULL,
  parallel = FALSE,
  cores_num = NULL,
  save_pmml = FALSE,
  plot_show = FALSE,
  vars_plot = TRUE,
  model_path = tempdir(),
  seed = 46,
  ...
)
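
The `split_type` default in the usage above is an expression, not a constant, so its value depends on whether `occur_time` is supplied. The following is a minimal base-R sketch of that behavior. `resolve_split_type` is a hypothetical stand-in that copies only this one default from the signature, and `"apply_date"` is an assumed column name; neither is part of creditmodel.

```r
# Hypothetical stand-in: reproduces only the `split_type` default shown in
# the usage above. It is not the creditmodel implementation.
resolve_split_type <- function(occur_time = NULL,
                               split_type = if (!is.null(occur_time)) "OOT" else "Random") {
  # R evaluates default arguments lazily, inside the function, so the
  # default can see the value that was passed for `occur_time`.
  split_type
}

# No time variable supplied: the default falls back to a random split.
resolve_split_type()

# Time variable supplied ("apply_date" is an assumed column name):
# the default switches to an out-of-time split.
resolve_split_type(occur_time = "apply_date")

# An explicit `split_type` always overrides the default.
resolve_split_type(occur_time = "apply_date", split_type = "Random")
```

In practice this means that passing `occur_time` changes how the data are partitioned unless `split_type` is also set explicitly.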

`model_name`	A string, name of the project. Default is "mymodel"
`dat`	A data.frame with independent variables and target variable.
`dat_test`	A data.frame of test data. Default is NULL.
`target`	The name of target variable.
`occur_time`	The name of the variable that represents the time at which each observation takes place. Default is NULL.
`obs_id`	The name of ID of observations or key variable of data. Default is NULL.
`x_list`	Names of independent variables. Default is NULL.
`ex_cols`	Names of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.
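`pos_flag`	The value of the positive class of the target variable. Default: "1".
`prop`	Proportion of the data assigned to the training set after the partition. Default is 0.7.
`split_type`	Method for partitioning the data. Defaults to "OOT" when `occur_time` is supplied, otherwise "Random". See `train_test_split` for details.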
`preproc`	Logical. Preprocess data. Default is TRUE.
`low_var`	Threshold for deleting low-variance variables. Default is 0.99.
`missing_rate`	The maximum percent of missing values for recoding values to missing and non_missing. Default is 0.98.
`merge_cat`	Merge the categories of character variables that have more than `merge_cat` categories. Default is 30.
`remove_dup`	Logical. If TRUE, remove duplicated observations. Default is TRUE.
`outlier_proc`	Logical, process outliers or not. Default is TRUE.
`missing_proc`	If logical, process missing values or not. If "median", NAs are imputed with the median of the k nearest neighbors. If "avg_dist", NAs are imputed with the distance-weighted average of the k nearest neighbors. If "default", missing values are set to -1 or "missing". Otherwise, missing values are processed according to the results of the missing-value analysis. Default is "median".
`default_miss`	Default values used for missing data imputation. Default is list(-1, "missing").
`miss_values`	Other extreme values that might be used to represent missing values, e.g. -9999, -9998. These values are encoded as -1 or "missing".
`one_hot`	Logical. If TRUE, one-hot encode the categorical variables. Default is FALSE.
`trans_log`	Logical, Logarithmic transformation. Default is FALSE.
`feature_filter`	Parameters for selecting important and stable features. See `feature_selector` for details.
`algorithm`	Algorithms for training a model. list("LR", "XGB", "GBM", "RF") are available.
`LR.params`	Parameters of logistic regression and scorecard. See `lr_params` for details.
`XGB.params`	Parameters of xgboost. See `xgb_params` for details.
`GBM.params`	Parameters of GBM. See `gbm_params` for details.
`RF.params`	Parameters of Random Forest. See `rf_params` for details.
`breaks_list`	A table containing a list of splitting points for each independent variable. Default is NULL.
`parallel`	Logical, use parallel computing or not. Default is FALSE.
`cores_num`	The number of CPU cores to use. Default is NULL.
`save_pmml`	Logical, save model in PMML format. Default is FALSE.
`plot_show`	Logical, show model performance in current graphic device. Default is FALSE.
`vars_plot`	Logical. If TRUE, plot the distribution, correlation, or partial dependence of the model input variables. Default is TRUE.
`model_path`	The path for periodically saved data file. Default is `tempdir()`.
`seed`	Random number seed. Default is 46.
`...`	Other parameters.
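
To make the interplay of `miss_values` and `default_miss` concrete, here is a minimal base-R sketch of "default"-style recoding: sentinel codes and NAs become -1 in numeric columns and "missing" in character columns. `recode_missing` and the `toy` data frame are hypothetical and written only for illustration; creditmodel's own recoding may differ in detail.

```r
# Hypothetical helper illustrating the idea behind `miss_values` and
# `default_miss`. It is not the creditmodel implementation.
recode_missing <- function(dat, miss_values = NULL,
                           default_miss = list(-1, "missing")) {
  for (col in names(dat)) {
    x <- dat[[col]]
    if (is.numeric(x)) {
      # Numeric columns: NAs and sentinel codes take the numeric default.
      x[is.na(x) | x %in% miss_values] <- default_miss[[1]]
    } else if (is.character(x)) {
      # Character columns: compare against the codes as strings and
      # substitute the character default.
      x[is.na(x) | x %in% as.character(miss_values)] <- default_miss[[2]]
    }
    dat[[col]] <- x
  }
  dat
}

# Toy data (assumed for illustration): -9999 and "-9998" stand for "unknown".
toy <- data.frame(
  income = c(5200, -9999, NA, 4100),
  city   = c("A", NA, "B", "-9998"),
  stringsAsFactors = FALSE
)

recoded <- recode_missing(toy, miss_values = c(-9999, -9998))
print(recoded)
```

Listing the sentinel codes explicitly keeps them from being treated as real extreme values by later steps such as outlier processing or binning.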

A list containing Model Objects.

train_test_split, data_cleansing, feature_selector, lr_params, xgb_params, gbm_params, rf_params, fast_high_cor_filter, get_breaks_all, lasso_filter, woe_trans_all, get_logistic_coef, score_transfer, get_score_card, model_key_index, ks_psi_plot, ks_table_plot

library(creditmodel)

# Use the first of 30 cross-validation folds as a small sample
sub = cv_split(UCICreditCard, k = 30)[[1]]
dat = UCICreditCard[sub, ]
x_list = c("LIMIT_BAL")

# Fit a plain logistic regression with preprocessing and feature filtering off
B_model = training_model(dat = dat,
                         model_name = "UCICreditCard",
                         target = "default.payment.next.month",
                         x_list = x_list,
                         occur_time = NULL,
                         obs_id = NULL,
                         dat_test = NULL,
                         preproc = FALSE,
                         outlier_proc = FALSE,
                         missing_proc = FALSE,
                         feature_filter = NULL,
                         algorithm = list("LR"),
                         LR.params = lr_params(lasso = FALSE,
                                               step_wise = FALSE,
                                               score_card = FALSE),
                         breaks_list = NULL,
                         parallel = FALSE,
                         cores_num = NULL,
                         save_pmml = FALSE,
                         plot_show = FALSE,
                         vars_plot = FALSE,
                         model_path = tempdir(),
                         seed = 46)