stepKS: stepKS

View source: R/stepKS.R

stepKS

Description

This function selects the best set of variables for a logistic regression model, based on the KS (Kolmogorov-Smirnov) metric and on the significance of the coefficients.
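To illustrate the KS metric that drives the selection, the sketch below computes the two-sample Kolmogorov-Smirnov statistic. It is the largest gap between the empirical cumulative distributions of the scores of "bad" and "good" observations. The helper `ks_metric` is hypothetical, and stepKS may compute KS differently internally.

```r
# Hypothetical helper (not part of stepKS): KS statistic for a binary target,
# i.e. the maximum gap between the empirical CDFs of the scores of bads and goods.
ks_metric <- function(score, target, flag_bad = 1) {
  is_bad   <- target == flag_bad
  cdf_bad  <- ecdf(score[is_bad])
  cdf_good <- ecdf(score[!is_bad])
  # Evaluate both CDFs at every observed score and keep the largest gap
  grid <- sort(unique(score))
  max(abs(cdf_bad(grid) - cdf_good(grid)))
}

# Usage: score a logistic regression fitted on simulated data
set.seed(42)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(-1 + 1.5 * x))
fit <- glm(y ~ x, family = binomial(link = "logit"))
ks_metric(predict(fit, type = "response"), y)
```

On continuous scores this returns the same value as `ks.test(score[bad], score[!bad])$statistic` from base R.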

Usage

stepKS(
  data,
  y,
  sample_col,
  val_sample,
  train_sample = "DES",
  exclude = NULL,
  include = NULL,
  start = NULL,
  force_in_model = NULL,
  return_type = "none",
  sig_mode = "all",
  direction = "both",
  link = "logit",
  vars_enable_both = 5,
  near_sample = NULL,
  pct_ks_dif = 0.05,
  flag_bad = 1,
  steps_ahead = 5,
  max_cat = 10,
  ks_precision = FALSE,
  progress_bar = TRUE,
  show_time_elapsed = TRUE,
  ignore_intercept_sig = TRUE,
  p_value = 0.05,
  trim_chars = 2,
  subsample_columns = 1,
  subsample_rows = 1,
  debug = FALSE,
  debug2 = FALSE
)
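A minimal calling sketch follows. It assumes that the package is installed under the name `stepKS` (for example from the `jrgazola/stepKS` GitHub repository) and that it exports `stepKS()` with the signature shown above. The simulated data frame, its column names (`id`, `x1`, `x2`, `x3`, `bad`, `smp`) and the validation sample name `"VAL"` are all illustrative.

```r
# Simulated data: a key column, candidate variables, a 0/1 target and a sample column
set.seed(1)
n <- 1000
df <- data.frame(
  id  = seq_len(n),                                  # key variable, not a predictor
  x1  = rnorm(n),
  x2  = rnorm(n),
  x3  = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  smp = sample(c("DES", "VAL"), n, replace = TRUE, prob = c(0.7, 0.3))
)
df$bad <- rbinom(n, 1, plogis(-1 + df$x1 - 0.5 * df$x2))

# Run the selection only if the package is available (assumed package name)
if (requireNamespace("stepKS", quietly = TRUE)) {
  model <- stepKS::stepKS(
    data = df, y = "bad", sample_col = "smp",
    val_sample = "VAL", train_sample = "DES",
    exclude = c("id"), return_type = "model"
  )
}
```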

Arguments

data

R data frame object.

y

Name of your target column.

sample_col

Name of your sample column.

val_sample

Name of the sample (a value of sample_col) on which the function will validate the model.

train_sample

Name of the sample (a value of sample_col) on which the function will train the model. Default is "DES".

exclude

If you do not want to pass the names of the data frame variables to be analyzed (see include), you can instead pass the list of variables that should not be analyzed, for example key variables. All other variables will then be analyzed.

include

List with the names of the variables to be analyzed by the algorithm.
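The interplay between include and exclude can be sketched as follows. The helper `candidate_vars` is hypothetical. It assumes that include takes precedence when both are given, and that the target and sample columns are never candidates. Neither assumption is confirmed by this page.

```r
# Hypothetical helper: which columns become candidate variables.
# Assumption: include wins over exclude; y and sample_col are always dropped.
candidate_vars <- function(data, y, sample_col, include = NULL, exclude = NULL) {
  vars <- if (!is.null(include)) include else setdiff(names(data), exclude)
  setdiff(vars, c(y, sample_col))
}

df <- data.frame(id = 1:3, x1 = 1:3, x2 = 4:6, bad = c(0, 1, 0), smp = "DES")
candidate_vars(df, "bad", "smp", exclude = "id")   # "x1" "x2"
candidate_vars(df, "bad", "smp", include = "x2")   # "x2"
```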

start

List of variables that the model will start with.

force_in_model

List of variables that the model will start with and that are forced to stay in the model.
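The difference between start and force_in_model can be pictured as follows. Both sets are in the initial model, but only the variables from start can be dropped by a later removal step. The helper `removable_vars` is a hypothetical illustration of that reading, not the package's code.

```r
# Hypothetical helper: variables a removal step is allowed to drop.
# Variables in force_in_model are protected; those only in start are not.
removable_vars <- function(current_vars, force_in_model = NULL) {
  setdiff(current_vars, force_in_model)
}

start          <- c("x1", "x2")
force_in_model <- c("x3")
current_vars   <- union(start, force_in_model)   # the model begins with both sets
removable_vars(current_vars, force_in_model)     # "x1" "x2"
```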

return_type

One of "none", "vars", "model", "formula", "scored_data", "ahead_vars" or "ahead_model".

sig_mode

One of "off", "one_cat" or "all".

direction

One of "both", "forward" or "both_sophisticated".

link

Link function of the logistic regression. Default is "logit".

vars_enable_both

Number of variables the model must contain before the "both" method is enabled.
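One reading of how direction and vars_enable_both combine is sketched below. A step may also try removing variables only when the direction is not "forward" and the model already holds at least vars_enable_both variables. The helper name and the `>=` comparison are assumptions.

```r
# Hypothetical helper: may this step also consider removing variables?
removal_allowed <- function(direction, n_vars_in_model, vars_enable_both = 5) {
  direction %in% c("both", "both_sophisticated") &&
    n_vars_in_model >= vars_enable_both
}

removal_allowed("forward", 8)  # FALSE: forward only adds variables
removal_allowed("both", 3)     # FALSE: fewer than vars_enable_both variables
removal_allowed("both", 5)     # TRUE
```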

near_sample

Defines a second sample whose results the algorithm will always keep close to those of the main sample. This sample must be present in the data frame.

pct_ks_dif

Defines how close the KS on the comparison sample (ks_comp) needs to be to the KS on the focus sample (ks_focus).
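A sketch of such a closeness check is shown below. It assumes that pct_ks_dif is applied as a relative difference with respect to ks_focus. The page does not say whether the difference is relative or absolute, and `ks_close_enough` is a hypothetical name.

```r
# Hypothetical check: is ks_comp within pct_ks_dif of ks_focus?
# Assumption: the tolerance is relative to ks_focus.
ks_close_enough <- function(ks_focus, ks_comp, pct_ks_dif = 0.05) {
  abs(ks_comp - ks_focus) / ks_focus <= pct_ks_dif
}

ks_close_enough(ks_focus = 0.40, ks_comp = 0.39)  # TRUE  (2.5% gap)
ks_close_enough(ks_focus = 0.40, ks_comp = 0.35)  # FALSE (12.5% gap)
```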

flag_bad

Value of the target that identifies a "bad" observation. Default is 1.

steps_ahead

If adding the best variable in the current step does not yield a KS higher than that of the previous step, this parameter sets how many steps the algorithm will continue past the step with the best KS, looking for a variable that increases the KS.
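The look-ahead rule described above can be sketched on a precomputed sequence of KS values, one per step. This is a hypothetical illustration of the stopping logic only. The real function evaluates candidate variables at each step rather than reading a fixed vector.

```r
# Hypothetical illustration of a look-ahead stopping rule.
# ks_path[i] is the KS of the model after step i (assumed precomputed).
select_step <- function(ks_path, steps_ahead = 5) {
  best_step <- 1
  for (i in seq_along(ks_path)) {
    if (ks_path[i] > ks_path[best_step]) {
      best_step <- i                      # new best KS: restart the look-ahead
    } else if (i - best_step >= steps_ahead) {
      break                               # steps_ahead steps without improvement
    }
  }
  best_step                               # keep the step with the best KS
}

ks_path <- c(0.20, 0.28, 0.31, 0.30, 0.32, 0.31, 0.30, 0.29)
select_step(ks_path, steps_ahead = 1)  # 3
select_step(ks_path, steps_ahead = 2)  # 5
```

With steps_ahead = 1 the walk stops at step 4 and keeps step 3. With steps_ahead = 2 it gets past the dip and reaches the higher KS at step 5.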

max_cat

Maximum number of categories (factor levels) allowed for a variable.
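A minimal sketch of how a max_cat limit could be applied is shown below. It assumes that categorical variables with more than max_cat distinct values are excluded from the candidates. The helper `too_many_levels` is hypothetical.

```r
# Hypothetical filter: names of categorical columns with more than max_cat levels.
too_many_levels <- function(data, max_cat = 10) {
  names(Filter(function(x) (is.factor(x) || is.character(x)) &&
                 length(unique(x)) > max_cat, data))
}

df <- data.frame(
  x_num  = rnorm(30),                           # numeric: never flagged
  x_few  = factor(rep(c("a", "b", "c"), 10)),   # 3 levels
  x_many = factor(sprintf("id%02d", 1:30))      # 30 levels
)
too_many_levels(df, max_cat = 10)  # "x_many"
```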

ks_precision

If TRUE, the function will score your dataset before computing the KS for every model tested.

progress_bar

If TRUE, the function will show a progress bar at every step.

show_time_elapsed

If TRUE, the function will show the elapsed time at every step.

ignore_intercept_sig

If TRUE, the function will ignore the significance of the intercept.

p_value

P-value threshold used to assess the significance of the coefficients. Default is 0.05.
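The significance test controlled by p_value and ignore_intercept_sig can be sketched with base R's `summary.glm`. The sketch assumes that a sig_mode of "all" means every coefficient must be significant. The helper `all_coefs_significant` is hypothetical, and the "one_cat" mode is not covered.

```r
# Hypothetical check in the spirit of sig_mode = "all": every coefficient
# must have a p-value below p_value, optionally ignoring the intercept.
all_coefs_significant <- function(fit, p_value = 0.05, ignore_intercept_sig = TRUE) {
  coefs <- summary(fit)$coefficients
  p <- coefs[, "Pr(>|z|)"]                 # Wald p-values of a binomial glm
  if (ignore_intercept_sig) p <- p[rownames(coefs) != "(Intercept)"]
  all(p < p_value)
}

set.seed(7)
n <- 2000
x1 <- rnorm(n)
noise <- rnorm(n)                          # unrelated to the target
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x1))
all_coefs_significant(glm(y ~ x1, family = binomial))
all_coefs_significant(glm(y ~ x1 + noise, family = binomial))
```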

trim_chars

Number of characters of the variable names that the function will trim.

subsample_columns

Proportion of the columns of your data that the function will use.

subsample_rows

Proportion of the rows of your data that the function will use.
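A minimal sketch of row and column subsampling by proportion follows. It assumes simple random sampling without replacement, applied to the rows and to the candidate variables. The helper `subsample` is hypothetical, and the package may sample differently (for example, within each sample).

```r
# Hypothetical sketch: keep a random proportion of rows and of candidate columns.
subsample <- function(data, candidates, subsample_rows = 1, subsample_columns = 1) {
  n_rows <- max(1, round(subsample_rows * nrow(data)))
  n_cols <- max(1, round(subsample_columns * length(candidates)))
  rows <- sample.int(nrow(data), n_rows)
  cols <- candidates[sample.int(length(candidates), n_cols)]
  list(data = data[rows, , drop = FALSE], candidates = cols)
}

set.seed(3)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100), x4 = rnorm(100))
res <- subsample(df, names(df), subsample_rows = 0.5, subsample_columns = 0.5)
nrow(res$data)           # 50
length(res$candidates)   # 2
```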

debug

Debug.

debug2

Debug2.

Value

A 'glm' model with the best set of variables. See return_type for the other available outputs.
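Because the result is a regular 'glm' object, it can be used with the usual base R tools. The sketch below fits a plain `glm` on simulated data as a stand-in for the returned model. It then scores the validation sample with `predict()` and measures the KS with `ks.test()`. The column names and the "VAL" sample name are illustrative.

```r
# Stand-in for the glm returned by stepKS, fitted directly on simulated data
set.seed(11)
n <- 1000
df <- data.frame(x1 = rnorm(n), smp = sample(c("DES", "VAL"), n, replace = TRUE))
df$bad <- rbinom(n, 1, plogis(-1 + 1.5 * df$x1))
model <- glm(bad ~ x1, family = binomial(link = "logit"), data = df[df$smp == "DES", ])

# Score the validation sample and compute its KS between bads and goods
val <- df[df$smp == "VAL", ]
val$score <- predict(model, newdata = val, type = "response")
ks_val <- unname(ks.test(val$score[val$bad == 1], val$score[val$bad == 0])$statistic)
ks_val
```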


jrgazola/stepKS documentation built on March 22, 2022, 12:06 a.m.