stepKS | R Documentation
This function selects the best set of variables for a logistic regression model, based on the Kolmogorov-Smirnov (KS) metric and on the statistical significance of the coefficients.
stepKS( data, y, sample_col, val_sample, train_sample = "DES", exclude = NULL, include = NULL, start = NULL, force_in_model = NULL, return_type = "none", sig_mode = "all", direction = "both", link = "logit", vars_enable_both = 5, near_sample = NULL, pct_ks_dif = 0.05, flag_bad = 1, steps_ahead = 5, max_cat = 10, ks_precision = FALSE, progress_bar = TRUE, show_time_elapsed = TRUE, ignore_intercept_sig = TRUE, p_value = 0.05, trim_chars = 2, subsample_columns = 1, subsample_rows = 1, debug = FALSE, debug2 = FALSE )
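In credit scoring, the KS metric in the description above usually means the two-sample Kolmogorov-Smirnov distance between the score distributions of bad and good observations. The sketch below assumes that definition, although the internal computation in stepKS may differ. It fits a logistic model on a training sample, scores a validation sample and computes KS in base R. The helper ks_score, the toy data and the column names bad, x1, x2 and sample are hypothetical. Only glm, predict and ecdf are standard R.

```r
# Hypothetical helper (not part of stepKS): KS statistic of a score,
# i.e. the maximum distance between the empirical CDFs of the score
# among bads (target == flag_bad) and among goods.
ks_score <- function(score, target, flag_bad = 1) {
  bad  <- score[target == flag_bad]
  good <- score[target != flag_bad]
  cuts <- sort(unique(score))  # evaluate both ECDFs at every observed score
  max(abs(ecdf(bad)(cuts) - ecdf(good)(cuts)))
}

# Toy data: "DES" rows train the model, "VAL" rows validate it.
set.seed(1)
n  <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                 sample = sample(c("DES", "VAL"), n, replace = TRUE))
df$bad <- rbinom(n, 1, plogis(-1 + 1.5 * df$x1))

# Fit a logit model on the training sample, then measure KS on validation.
fit <- glm(bad ~ x1 + x2, data = df[df$sample == "DES", ],
           family = binomial(link = "logit"))
val <- df[df$sample == "VAL", ]
ks  <- ks_score(predict(fit, newdata = val, type = "response"), val$bad)
print(ks)
```

KS depends only on how the observations are ranked, and the logit transform is strictly increasing. Scoring with the linear predictor (type = "link") therefore gives the same value.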
data | R data frame containing the target column, the sample column and the candidate variables.
y | Name of the target column.
sample_col | Name of the column that identifies the sample each row belongs to.
val_sample | Name of the sample (a value of sample_col) on which the function validates the model.
train_sample | Name of the sample (a value of sample_col) on which the function trains the model. Default is "DES".
exclude | Alternative to include. Instead of listing the variables to be analyzed, pass the names of the data frame variables that must not be analyzed (for example, key variables). Every other variable is analyzed.
include | List with the names of the variables to be analyzed by the algorithm.
start | List of variables that are already in the model at the first step.
force_in_model | List of variables that start in the model and are forced to stay in it.
return_type | One of "none", "vars", "model", "formula", "scored_data", "ahead_vars" or "ahead_model". Default is "none".
sig_mode | One of 'off', 'one_cat' or 'all'. Default is 'all'.
direction | One of "both", "forward" or "both_sophisticated". Default is "both".
link | Link function of the binomial model. Default is "logit".
vars_enable_both | Number of variables the model must contain before the "both" method is enabled. Default is 5.
near_sample | Defines a second sample that the algorithm always keeps close to the main sample. This sample must be present in the data frame.
pct_ks_dif | Defines how close the comparison KS (ks_comp) must be to the focus KS (ks_focus). Default is 0.05.
flag_bad | Value of y that identifies a bad observation. Default is 1.
steps_ahead | Suppose that, at the current step, adding the best variable does not produce a KS higher than the KS of the previous step. This parameter then sets how many steps the algorithm keeps walking forward from the step with the best KS, looking for a variable that increases the KS. Default is 5.
max_cat | Maximum number of categories (factor levels) per variable. Default is 10.
ks_precision | If TRUE, the function scores the dataset before computing the KS for every model tested.
progress_bar | If TRUE, the function shows a progress bar at every step.
show_time_elapsed | If TRUE, the function shows the elapsed time at every step.
ignore_intercept_sig | If TRUE, the function ignores the significance of the intercept.
p_value | Significance level used to test the coefficients. Default is 0.05.
trim_chars | Number of characters of the variable names that the function trims. Default is 2.
subsample_columns | Proportion of the columns (candidate variables) the function uses. Default is 1.
subsample_rows | Proportion of the rows the function uses. Default is 1.
debug | Debug flag. Default is FALSE.
debug2 | Second debug flag. Default is FALSE.
A 'glm' model fitted with the best set of variables.
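The include and exclude arguments give two ways to define the candidate variables. The sketch below shows one plausible way to resolve them. It assumes that include takes precedence when given and that y and sample_col are never candidates. The helper candidate_vars and the toy data are hypothetical and do not reflect the stepKS internals.

```r
# Hypothetical helper: resolve the candidate variable list.
# Assumption: include wins if given; otherwise every column except
# y, sample_col and the excluded ones is a candidate.
candidate_vars <- function(data, y, sample_col, include = NULL, exclude = NULL) {
  if (!is.null(include)) {
    return(setdiff(include, c(y, sample_col)))
  }
  setdiff(names(data), c(y, sample_col, exclude))
}

df <- data.frame(id = 1:3, x1 = c(1, 2, 3), x2 = c("a", "b", "a"),
                 bad = c(0, 1, 0), sample = c("DES", "DES", "VAL"))
candidate_vars(df, "bad", "sample", exclude = "id")
# [1] "x1" "x2"
candidate_vars(df, "bad", "sample", include = "x2")
# [1] "x2"
```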
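The pct_ks_dif argument controls how close the KS on near_sample (ks_comp) must stay to the KS on the main sample (ks_focus). The exact rule is not documented. The sketch below assumes a relative tolerance, |ks_comp - ks_focus| / ks_focus <= pct_ks_dif. An absolute difference would be an equally plausible reading. The function ks_is_close is hypothetical.

```r
# Hypothetical check. Assumption: pct_ks_dif is a relative tolerance
# on the gap between the comparison KS and the focus KS.
ks_is_close <- function(ks_comp, ks_focus, pct_ks_dif = 0.05) {
  abs(ks_comp - ks_focus) / ks_focus <= pct_ks_dif
}

ks_is_close(0.40, 0.41)  # relative gap ~ 0.024
# [1] TRUE
ks_is_close(0.35, 0.41)  # relative gap ~ 0.146
# [1] FALSE
```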
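The steps_ahead argument describes a forward search with look-ahead. The search does not stop at the first step that fails to improve the KS. It keeps going for a few more steps and then falls back to the best step. Below is a minimal sketch of that stopping rule over a vector of KS values, one per step. It assumes that a tie does not count as an improvement ("higher than"). The function lookahead_stop and the KS values are illustrative only.

```r
# Hypothetical stopping rule for a forward search with look-ahead.
# ks_history[i] is the KS after step i. The search stops once
# steps_ahead steps have passed since the best KS without beating it;
# best_step is the step the model would be rolled back to.
lookahead_stop <- function(ks_history, steps_ahead = 5) {
  best <- which.max(ks_history)  # first step that reaches the maximum KS
  list(stop = length(ks_history) - best >= steps_ahead,
       best_step = best)
}

ks_history <- c(0.30, 0.35, 0.37, 0.36, 0.37, 0.365, 0.368, 0.369)
res <- lookahead_stop(ks_history, steps_ahead = 5)
str(res)
```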
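For max_cat, a natural reading is that categorical variables with too many levels are dropped from the candidates. The sketch below assumes exactly that and leaves numeric variables untouched. The function within_max_cat and the toy data are hypothetical.

```r
# Hypothetical filter. Assumption: factor or character variables with
# more than max_cat distinct values are removed from the candidates.
within_max_cat <- function(data, vars, max_cat = 10) {
  keep <- vapply(vars, function(v) {
    x <- data[[v]]
    !(is.factor(x) || is.character(x)) || length(unique(x)) <= max_cat
  }, logical(1))
  vars[keep]
}

df <- data.frame(x_num  = rnorm(20),
                 x_few  = rep(c("a", "b"), 10),
                 x_many = as.character(1:20))
within_max_cat(df, c("x_num", "x_few", "x_many"), max_cat = 10)
# [1] "x_num" "x_few"
```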
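The p_value and ignore_intercept_sig arguments relate to the significance check on the coefficients. The sketch below reads the Wald p-values from summary.glm and requires all of them to be below p_value, optionally skipping the intercept. Treating this as the behaviour of sig_mode = 'all' is an assumption. The function all_coefs_significant and the toy data are hypothetical.

```r
# Hypothetical check of coefficient significance for a fitted binomial glm.
# Assumption: every coefficient (optionally except the intercept) must
# have a p-value below p_value.
all_coefs_significant <- function(fit, p_value = 0.05, ignore_intercept_sig = TRUE) {
  p <- summary(fit)$coefficients[, "Pr(>|z|)"]
  if (ignore_intercept_sig) p <- p[names(p) != "(Intercept)"]
  all(p < p_value)
}

set.seed(42)
n  <- 2000
df <- data.frame(x1 = rnorm(n), noise = rnorm(n))
df$bad <- rbinom(n, 1, plogis(-1 + 2 * df$x1))

fit1 <- glm(bad ~ x1, data = df, family = binomial)
fit2 <- glm(bad ~ x1 + noise, data = df, family = binomial)
all_coefs_significant(fit1)
all_coefs_significant(fit2)  # usually FALSE, since noise is unrelated to bad
```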
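The subsample_columns and subsample_rows arguments are proportions. The sketch below assumes they are applied by simple random sampling without replacement, keeping at least one element. The actual sampling scheme in stepKS is not documented here. The function subsample_data and the toy data are hypothetical.

```r
# Hypothetical subsampling. Assumption: proportions are applied by simple
# random sampling without replacement, keeping at least one element.
subsample_data <- function(data, vars, subsample_columns = 1, subsample_rows = 1) {
  n_vars <- max(1, round(length(vars) * subsample_columns))
  n_rows <- max(1, round(nrow(data) * subsample_rows))
  list(vars = sample(vars, n_vars),
       data = data[sample(nrow(data), n_rows), , drop = FALSE])
}

set.seed(7)
df  <- data.frame(a = 1:100, b = rnorm(100), c = rnorm(100), d = rnorm(100))
sub <- subsample_data(df, c("b", "c", "d"),
                      subsample_columns = 2/3, subsample_rows = 0.5)
length(sub$vars)
# [1] 2
nrow(sub$data)
# [1] 50
```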