Description Usage Arguments See Also Examples
View source: R/data_anaylsis.R
get_psi_iv
calculates the Information Value (IV) and Population Stability Index (PSI) of an independent variable.
get_psi_iv_all
computes IV and PSI for every specified independent variable.
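For orientation, IV and PSI are commonly defined as sums over bins, as written below. This is the usual textbook form, given here as an assumption about what the functions report rather than something taken from the package source. The symbols are introduced for illustration only: $E_i$ and $A_i$ are the share of the expected and actual samples that falls in bin $i$, and $P_i$ and $N_i$ are the share of all positive and all negative observations that falls in bin $i$.

```latex
\mathrm{PSI} = \sum_i (A_i - E_i)\,\ln\frac{A_i}{E_i},
\qquad
\mathrm{IV} = \sum_i (P_i - N_i)\,\ln\frac{P_i}{N_i}
```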
get_psi_iv_all(
dat,
dat_test = NULL,
x_list = NULL,
target,
ex_cols = NULL,
pos_flag = NULL,
breaks_list = NULL,
occur_time = NULL,
oot_pct = 0.7,
equal_bins = FALSE,
cut_bin = "equal_depth",
tree_control = NULL,
bins_control = NULL,
bins_total = FALSE,
best = TRUE,
g = 10,
as_table = TRUE,
note = FALSE,
parallel = FALSE,
bins_no = TRUE
)
get_psi_iv(
dat,
dat_test = NULL,
x,
target,
pos_flag = NULL,
breaks = NULL,
breaks_list = NULL,
occur_time = NULL,
oot_pct = 0.7,
equal_bins = FALSE,
cut_bin = "equal_depth",
tree_control = NULL,
bins_control = NULL,
bins_total = FALSE,
best = TRUE,
g = 10,
as_table = TRUE,
note = FALSE,
bins_no = TRUE
)
dat | A data.frame with independent variables and target variable.
dat_test | A data.frame of test data. Default is NULL.
x_list | Names of independent variables.
target | The name of the target variable.
ex_cols | A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.
pos_flag | The value of the positive class of the target variable, default: "1".
breaks_list | A table containing a list of splitting points for each independent variable. Default is NULL.
occur_time | The name of the variable that represents the time at which each observation takes place.
oot_pct | Percentage of observations retained for overtime test (especially to calculate PSI). Default is 0.7.
equal_bins | Logical, generate initial breaks by equal-frequency or equal-width binning.
cut_bin | A string, used if equal_bins is TRUE: 'equal_depth' or 'equal_width'. Default is 'equal_depth'.
tree_control | Parameters for using a decision tree to segment initial breaks. See details:
bins_control | Parameters used to control binning. See details:
bins_total | Logical, add a total row for each variable.
best | Logical, merge initial breaks to get optimal breaks for binning.
g | Number of initial breakpoints for equal frequency binning.
as_table | Logical, output results in a table. Default is TRUE.
note | Logical, output info. Default is FALSE.
parallel | Logical, parallel computing. Default is FALSE.
bins_no | Logical, add serial numbers to bins. Default is TRUE.
x | The name of an independent variable.
breaks | Splitting points for an independent variable. Default is NULL.
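The equal_bins and cut_bin arguments refer to equal-frequency ('equal_depth') and equal-width binning. The sketch below shows the general difference between the two schemes in base R on toy data. It is not the package's implementation, and `n_bins` is a hypothetical name used only here, not a mapping of the g argument.

```r
# Toy data with one outlier, so the two schemes differ visibly.
x <- c(1:9, 100)
n_bins <- 2  # hypothetical bin count for this sketch

# Equal-depth: break points at evenly spaced quantiles.
depth_breaks <- unname(quantile(x, probs = seq(0, 1, length.out = n_bins + 1)))

# Equal-width: break points evenly spaced between min and max.
width_breaks <- seq(min(x), max(x), length.out = n_bins + 1)

depth_breaks  # 1.0 5.5 100.0
width_breaks  # 1.0 50.5 100.0

# Observations per bin under each scheme.
depth_counts <- as.vector(table(cut(x, depth_breaks, include.lowest = TRUE)))
width_counts <- as.vector(table(cut(x, width_breaks, include.lowest = TRUE)))
depth_counts  # 5 5
width_counts  # 9 1
```

Equal-depth bins keep roughly the same number of observations in each bin, while equal-width bins can leave some bins nearly empty when the variable is skewed.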
get_iv, get_iv_all, get_psi, get_psi_all
library(creditmodel)
# IV and PSI for three variables, using the first 1000 rows
iv_list = get_psi_iv_all(dat = UCICreditCard[1:1000, ],
                         x_list = names(UCICreditCard)[3:5], equal_bins = TRUE,
                         target = "default.payment.next.month", ex_cols = "ID|apply_date")
# IV and PSI for a single variable, with a Total row
get_psi_iv(UCICreditCard, x = "PAY_3",
           target = "default.payment.next.month", bins_total = TRUE)
Package 'creditmodel' version 1.2.7
Feature bins cuts #total #expected expected_0 expected_1 #actual
1 PAY_3 00.NA -1 5938 4089 3449 640 1849
2 PAY_3 01.(-Inf,-1] 1 4085 2859 2324 535 1226
3 PAY_3 02.(-1,1] Inf 15768 11077 9167 1910 4691
4 PAY_3 03.(1, Inf] <NA> 4209 2975 1410 1565 1234
5 Total -- -- 30000 21000 16350 4650 9000
actual_0 actual_1 %total %expected %actual %total_1 %expected_1 %actual_1
1 1563 286 0.2 0.19 0.21 0.16 0.16 0.15
2 1004 222 0.14 0.14 0.14 0.19 0.19 0.18
3 3849 842 0.53 0.53 0.52 0.17 0.17 0.18
4 598 636 0.14 0.14 0.14 0.52 0.53 0.52
5 7014 1986 1 1 1 0.22 0.22 0.22
odds_ratio odds_ratio_s PSIi IVi
1 1.537 0 0.001 0.032
2 1.249 0.002 0 0.006
3 1.343 0.004 0 0.042
4 0.259 0 0 0.332
5 1 0.006 0.001 0.412
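The PSIi and IVi columns above can be cross-checked by hand from the printed counts. The sketch below applies the standard per-bin formulas in base R to the four PAY_3 bins. Pooling the expected and actual samples for IV is an assumption inferred from the printed values, not from the package source.

```r
# Counts copied from rows 1-4 of the printed table (PAY_3 bins).
expected_0 <- c(3449, 2324, 9167, 1410)
expected_1 <- c(640, 535, 1910, 1565)
actual_0   <- c(1563, 1004, 3849, 598)
actual_1   <- c(286, 222, 842, 636)

# PSI per bin: compare each bin's share of the expected and actual samples.
pct_expected <- (expected_0 + expected_1) / sum(expected_0 + expected_1)
pct_actual   <- (actual_0 + actual_1) / sum(actual_0 + actual_1)
psi_i <- (pct_actual - pct_expected) * log(pct_actual / pct_expected)

# IV per bin: compare each bin's share of positives and of negatives,
# pooled over both samples (assumption, see the note above).
total_0 <- expected_0 + actual_0
total_1 <- expected_1 + actual_1
dist_1 <- total_1 / sum(total_1)
dist_0 <- total_0 / sum(total_0)
iv_i <- (dist_1 - dist_0) * log(dist_1 / dist_0)

round(psi_i, 3)       # 0.001 0.000 0.000 0.000
round(sum(psi_i), 3)  # 0.001
round(iv_i, 3)        # 0.032 0.006 0.042 0.332
round(sum(iv_i), 3)   # 0.412
```

After rounding to three decimals, these values agree with the PSIi and IVi columns, including the Total row.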