xgb_filter: Select Features using XGB


View source: R/variable_selection.R

Description

xgb_filter is for selecting important features using xgboost.

Usage

xgb_filter(
  dat_train,
  dat_test = NULL,
  target = NULL,
  pos_flag = NULL,
  x_list = NULL,
  occur_time = NULL,
  ex_cols = NULL,
  xgb_params = list(nrounds = 100, max_depth = 6, eta = 0.1, min_child_weight = 1,
    subsample = 1, colsample_bytree = 1, gamma = 0, scale_pos_weight = 1,
    early_stopping_rounds = 10, objective = "binary:logistic"),
  f_eval = "auc",
  cv_folds = 1,
  cp = NULL,
  seed = 46,
  vars_name = TRUE,
  note = TRUE,
  save_data = FALSE,
  file_name = NULL,
  dir_path = tempdir(),
  ...
)

Arguments

dat_train

A data.frame with independent variables and target variable.

dat_test

A data.frame of test data. Default is NULL.

target

The name of target variable.

pos_flag

The value of positive class of target variable, default: "1".

x_list

Names of independent variables.

occur_time

The name of the variable that represents the time at which each observation takes place.

ex_cols

A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.
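
As a rough illustration of how a regular expression in ex_cols can match variable names, the sketch below applies the pattern from the Examples section to a hypothetical vector of column names using base R's grepl. The column names are assumptions for illustration, and this is not necessarily how xgb_filter applies the pattern internally.

```r
# Hypothetical column names (illustrative; not read from the actual data set)
col_names <- c("ID", "apply_date", "LIMIT_BAL", "AGE",
               "default.payment.next.month")

# Pattern taken from the Examples section of this page
ex_cols <- "ID$|date$|default.payment.next.month$"

# Keep only the names that do NOT match the exclusion pattern
kept <- col_names[!grepl(ex_cols, col_names)]
print(kept)
```

Each alternative is anchored with `$`, so a name is excluded only when it ends with that pattern.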

xgb_params

Parameters of xgboost. The complete list of parameters is available at: http://xgboost.readthedocs.io/en/latest/parameter.html.

f_eval

Customized evaluation function; "ks" and "auc" are available.
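
For readers unfamiliar with the two metrics, the following base R sketch computes AUC and KS on a toy set of labels and scores. The helper names auc_score and ks_score are hypothetical and are not part of creditmodel; the package's own evaluation functions may be implemented differently.

```r
# Toy labels and predicted probabilities (illustrative values only)
y <- c(0, 0, 1, 1)
p <- c(0.1, 0.4, 0.35, 0.8)

# AUC via the rank-sum (Mann-Whitney) identity
auc_score <- function(y, p) {
  r <- rank(p)
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# KS: largest gap between the empirical score distributions of the two classes
ks_score <- function(y, p) {
  cdf_pos <- ecdf(p[y == 1])
  cdf_neg <- ecdf(p[y == 0])
  thresholds <- sort(unique(p))
  max(abs(cdf_pos(thresholds) - cdf_neg(thresholds)))
}

auc_value <- auc_score(y, p)  # 0.75
ks_value <- ks_score(y, p)    # 0.5
```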

cv_folds

Number of cross-validation folds. Default is 1.

cp

Threshold of XGB feature's Gain. Default is 1/number of independent variables.
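
A minimal sketch of the thresholding idea, assuming a feature is kept when its Gain exceeds cp (whether the comparison is strict is an assumption here). The importance table below is made up for illustration; in practice it would come from a fitted xgboost model.

```r
# Hypothetical importance table (Gain values sum to 1, as xgboost reports them)
importance <- data.frame(
  Feature = c("x1", "x2", "x3", "x4"),
  Gain = c(0.55, 0.30, 0.10, 0.05),
  stringsAsFactors = FALSE
)

# Default threshold described above: 1 / number of independent variables
cp <- 1 / nrow(importance)  # 0.25

# Keep the features whose Gain exceeds the threshold
selected <- importance$Feature[importance$Gain > cp]
print(selected)
```

With four variables the default threshold is 0.25, so only features that contribute more than an equal share of the total Gain are kept.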

seed

Random number seed. Default is 46.

vars_name

Logical. If TRUE, output a list of filtered variable names; if FALSE, output a table with detailed IV and PSI value of each variable. Default is TRUE.

note

Logical. Whether to output progress information. Default is TRUE.

save_data

Logical. Whether to save the results in the locally specified folder. Default is FALSE.

file_name

The name for periodically saved results files. Default is "Feature_importance_XGB".

dir_path

The directory path for saved results files. Default is tempdir().

...

Other parameters to pass to xgb_params.
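
One plausible way to think about overriding individual xgboost parameters is a named-list merge, sketched below with utils::modifyList. This only illustrates the idea; whether xgb_filter merges extra arguments into xgb_params in exactly this way is an assumption.

```r
# Defaults copied from the Usage section
default_params <- list(nrounds = 100, max_depth = 6, eta = 0.1,
                       min_child_weight = 1, subsample = 1,
                       colsample_bytree = 1, gamma = 0, scale_pos_weight = 1,
                       early_stopping_rounds = 10,
                       objective = "binary:logistic")

# Hypothetical overrides a caller might supply
overrides <- list(max_depth = 3, eta = 0.05)

# Entries in overrides replace the matching defaults; the rest are kept
params <- utils::modifyList(default_params, overrides)
str(params[c("nrounds", "max_depth", "eta")])
```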

Value

Selected variables.

See Also

psi_iv_filter, gbm_filter, feature_selector

Examples

library(creditmodel)

# Subset of the UCICreditCard data set
dat = UCICreditCard[1:1000, c(2, 4, 8:9, 26)]
# xgboost parameters (same values as the defaults shown in Usage)
xgb_params = list(nrounds = 100, max_depth = 6, eta = 0.1,
                  min_child_weight = 1, subsample = 1,
                  colsample_bytree = 1, gamma = 0, scale_pos_weight = 1,
                  early_stopping_rounds = 10,
                  objective = "binary:logistic")
## Not run:
# Select features with xgboost, using KS as the evaluation metric
xgb_features = xgb_filter(dat_train = dat, dat_test = NULL,
                          target = "default.payment.next.month",
                          occur_time = "apply_date", f_eval = "ks",
                          xgb_params = xgb_params, cv_folds = 1,
                          ex_cols = "ID$|date$|default.payment.next.month$",
                          vars_name = FALSE)

## End(Not run)

creditmodel documentation built on Jan. 7, 2022, 5:06 p.m.