View source: R/variable_selection.R
psi_iv_filter selects important and stable features using IV (information value) and PSI (population stability index).
psi_iv_filter(
  dat,
  dat_test = NULL,
  target,
  x_list = NULL,
  breaks_list = NULL,
  pos_flag = NULL,
  ex_cols = NULL,
  occur_time = NULL,
  best = FALSE,
  equal_bins = TRUE,
  g = 10,
  sp_values = NULL,
  tree_control = list(p = 0.05, cp = 1e-06, xval = 5, maxdepth = 10),
  bins_control = list(bins_num = 10, bins_pct = 0.05, b_chi = 0.05, b_odds = 0.1,
    b_psi = 0.05, b_or = 0.15, mono = 0.3, odds_psi = 0.2, kc = 1),
  oot_pct = 0.7,
  psi_i = 0.1,
  iv_i = 0.01,
  cos_i = 0.7,
  vars_name = FALSE,
  note = TRUE,
  parallel = FALSE,
  save_data = FALSE,
  file_name = NULL,
  dir_path = tempdir(),
  ...
)
| dat | A data.frame with independent variables and target variable. | 
| dat_test | A data.frame of test data. Default is NULL. | 
| target | The name of target variable. | 
| x_list | Names of independent variables. | 
| breaks_list | A table containing a list of splitting points for each independent variable. Default is NULL. | 
| pos_flag | The value of positive class of target variable, default: "1". | 
| ex_cols | A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL. | 
| occur_time | The name of the variable that represents the time at which each observation takes place. | 
| best | Logical, if TRUE, merge initial breaks to get optimal breaks for binning. | 
| equal_bins | Logical. If TRUE, initial breaks are generated with equal sample sizes. If FALSE, initial breaks are generated using a decision tree. | 
| g | Integer, number of initial bins for equal_bins. | 
| sp_values | A list of missing values. | 
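| tree_control | A list of tree parameters (p, cp, xval, maxdepth). | 
| bins_control | A list of binning parameters (bins_num, bins_pct, b_chi, b_odds, b_psi, b_or, mono, odds_psi, kc). | 
| oot_pct | Percentage of observations retained for the over-time test (especially to calculate PSI). Default is 0.7. | 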
| psi_i | The maximum threshold of PSI. 0 <= psi_i <=1; 0.05 to 0.2 usually work. Default: 0.1 | 
| iv_i | The minimum threshold of IV. 0 < iv_i ; 0.01 to 0.1 usually work. Default: 0.01 | 
| cos_i | Cosine similarity (cos_similarity) of the positive rate of train and test. 0.7 to 0.9 usually work. Default: 0.7. | 
| vars_name | Logical; whether to output a list of the filtered variables or a table with the detailed IV and PSI values of each variable. Default is FALSE. | 
| note | Logical, outputs info. Default is TRUE. | 
| parallel | Logical, parallel computing. Default is FALSE. | 
| save_data | Logical, save results in locally specified folder. Default is FALSE. | 
| file_name | The name for periodically saved results files. Default is "Feature_importance_IV_PSI". | 
| dir_path | The path for periodically saved results files. Default is tempdir(). | 
| ... | Other parameters. | 
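To make the psi_i and iv_i thresholds concrete, here is a minimal base-R sketch of how IV and PSI can be computed for a single binned feature. It is not the package's internal implementation: the helpers iv_sketch and psi_sketch, the eps smoothing constant, and the toy data are all assumptions made for illustration, and the exact comparison operators applied by psi_iv_filter are not stated above.

```r
# Illustrative helpers only; NOT creditmodel's internal implementation.

# Information value of a binned feature against a binary target (1 = positive).
iv_sketch <- function(bin, y, eps = 1e-6) {
  tab  <- table(bin, y)                              # rows: bins, columns: "0" and "1"
  good <- pmax(tab[, "0"] / sum(tab[, "0"]), eps)    # share of negatives per bin
  bad  <- pmax(tab[, "1"] / sum(tab[, "1"]), eps)    # share of positives per bin
  sum((bad - good) * log(bad / good))
}

# Population stability index between expected (train) and actual (test) bin labels.
psi_sketch <- function(expected, actual, eps = 1e-6) {
  lev <- union(unique(expected), unique(actual))
  e <- pmax(as.numeric(table(factor(expected, levels = lev))) / length(expected), eps)
  a <- pmax(as.numeric(table(factor(actual, levels = lev))) / length(actual), eps)
  sum((a - e) * log(a / e))
}

# Toy data (hypothetical): three bins with rising positive rates.
bin_train <- rep(c("low", "mid", "high"), times = c(40, 40, 20))
y_train   <- c(rep(c(1, 0), times = c(4, 36)),    # low:  10% positive
               rep(c(1, 0), times = c(8, 32)),    # mid:  20% positive
               rep(c(1, 0), times = c(10, 10)))   # high: 50% positive
bin_test  <- rep(c("low", "mid", "high"), times = c(35, 40, 25))

iv_value  <- iv_sketch(bin_train, y_train)
psi_value <- psi_sketch(bin_train, bin_test)

# Assumed filter rule: keep a feature when IV exceeds iv_i and PSI stays within psi_i.
keep <- iv_value > 0.01 && psi_value <= 0.1
```

In this sketch a feature survives only if it is both predictive (IV above the minimum) and stable between the two samples (PSI below the maximum), which is the trade-off the two thresholds control.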
A list with the following elements:
| Feature | Selected variables. |
| IV | IV of variables. |
| PSI | PSI of variables. |
| COS | Cosine similarity (cos_similarity) of the positive rate of train and test. |
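The COS element can be read as an ordinary cosine similarity between two vectors of positive rates. The sketch below assumes those vectors hold one positive rate per bin for the train and test samples; that layout, the helper name cos_sim_sketch, and the numbers are illustrative assumptions, not taken from the package.

```r
# Illustrative helper only; NOT creditmodel's internal implementation.
cos_sim_sketch <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical per-bin positive rates for one feature.
pos_rate_train <- c(0.10, 0.20, 0.50)
pos_rate_test  <- c(0.12, 0.18, 0.45)

cos_value <- cos_sim_sketch(pos_rate_train, pos_rate_test)

# Assumed reading of cos_i: the two rate profiles should point in a similar direction.
similar_enough <- cos_value >= 0.7
```

A value near 1 means the positive rate rises and falls across bins in the same pattern in both samples, even if the overall level differs.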
See also: xgb_filter, gbm_filter, feature_selector
psi_iv_filter(dat = UCICreditCard[1:1000, c(2, 4, 8:9, 26)],
              target = "default.payment.next.month",
              occur_time = "apply_date",
              parallel = FALSE)