get_psi_all: Calculate Population Stability Index (PSI)


View source: R/data_anaylsis.R

Description

Calculate the Population Stability Index (PSI). get_psi calculates the PSI of a single independent variable. get_psi_all loops over all specified independent variables and calculates the PSI of each.

Usage

get_psi_all(
  dat,
  x_list = NULL,
  target = NULL,
  dat_test = NULL,
  breaks_list = NULL,
  occur_time = NULL,
  start_date = NULL,
  cut_date = NULL,
  oot_pct = 0.7,
  pos_flag = NULL,
  parallel = FALSE,
  ex_cols = NULL,
  as_table = FALSE,
  g = 10,
  bins_no = TRUE,
  note = FALSE
)

get_psi(
  dat,
  x,
  target = NULL,
  dat_test = NULL,
  occur_time = NULL,
  start_date = NULL,
  cut_date = NULL,
  pos_flag = NULL,
  breaks = NULL,
  breaks_list = NULL,
  oot_pct = 0.7,
  g = 10,
  as_table = TRUE,
  note = FALSE,
  bins_no = TRUE
)

Arguments

dat

A data.frame with independent variables and target variable.

x_list

Names of independent variables.

target

The name of target variable.

dat_test

A data.frame of test data. Default is NULL.

breaks_list

A table containing a list of splitting points for each independent variable. Default is NULL.

occur_time

The name of the variable that represents the time at which each observation takes place.

start_date

The earliest occurrence time of observations.

cut_date

Time point for splitting the data set, e.g. splitting it into Actual and Expected data sets.

oot_pct

Percentage of observations retained for overtime test (especially to calculate PSI). Default is 0.7.

pos_flag

Value of the positive class. Default is "1".

parallel

Logical, parallel computing. Default is FALSE.

ex_cols

Names of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.

as_table

Logical, output results in a table. Default is TRUE for get_psi and FALSE for get_psi_all.

g

Number of initial breakpoints for equal frequency binning. Default is 10.

bins_no

Logical, add serial numbers to bins. Default is TRUE.

note

Logical, outputs info. Default is FALSE.

x

The name of an independent variable.

breaks

Splitting points for an independent variable. Default is NULL.
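The g argument above refers to equal frequency binning. As a rough illustration of that idea only, the base R sketch below derives quantile-based cut points for a simulated numeric vector. It is not the package's internal binning code; treating g as the number of bins, and the variable names x, probs, breaks and bins, are assumptions made for this example.

```r
# Equal frequency binning with base R only (illustrative sketch).
set.seed(1)
x <- rnorm(1000)   # simulated numeric predictor
g <- 10            # assumed here to mean the number of bins

# g bins need g + 1 boundaries, taken at evenly spaced quantiles.
probs  <- seq(0, 1, length.out = g + 1)
breaks <- unique(quantile(x, probs = probs, na.rm = TRUE, names = FALSE))

# Assign every observation to a bin; include.lowest keeps the minimum value.
bins <- cut(x, breaks = breaks, include.lowest = TRUE)
table(bins)
```

Each bin should hold roughly the same number of observations. unique() guards against duplicated cut points when the data contain many ties.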

Details

PSI rules for evaluating the stability of a predictor:
Less than 0.02: Very stable
0.02 to 0.1: Stable
0.1 to 0.2: Unstable
0.2 to 0.5: Change
More than 0.5: Great change
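The rules above can be applied to a PSI value with a small helper. The sketch below uses base R's cut(); psi_label is a hypothetical name, not a creditmodel function, and the treatment of values that fall exactly on a threshold (left-closed intervals here) is an assumption, since the rules do not say which side a boundary belongs to.

```r
# Map PSI values to the stability labels listed in the rules.
# psi_label is a hypothetical helper, not part of creditmodel.
psi_label <- function(psi) {
  as.character(cut(
    psi,
    breaks = c(-Inf, 0.02, 0.1, 0.2, 0.5, Inf),
    labels = c("Very stable", "Stable", "Unstable", "Change", "Great change"),
    right  = FALSE   # assumption: intervals are closed on the left
  ))
}

psi_label(c(0.01, 0.05, 0.15, 0.3, 0.7))
```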

See Also

get_iv, get_iv_all, get_psi, get_psi_all

Examples

library(creditmodel)

# dat_test is NULL
get_psi(dat = UCICreditCard, x = "PAY_3", occur_time = "apply_date")

# dat_test is not NULL
# train_test split
train_test = train_test_split(dat = UCICreditCard, prop = 0.7, split_type = "OOT",
                              occur_time = "apply_date", start_date = NULL, cut_date = NULL,
                              save_data = FALSE, note = FALSE)
dat_ex = train_test$train
dat_ac = train_test$test
# generate psi table
get_psi(dat = dat_ex, dat_test = dat_ac, x = "PAY_3",
        occur_time = "apply_date", bins_no = TRUE)

Example output

Package 'creditmodel' version 1.2.7
  Feature         Bins actual expected Ac_pct Ex_pct PSI_i PSI
1   PAY_3        00.NA   1825     4113  19.8%  19.8%     0   0
2   PAY_3 01.(-Inf,-2]   1238     2847  13.4%  13.7%     0   0
3   PAY_3    02.(-2,0]   4849    10915  52.6%  52.5%     0   0
4   PAY_3     03.(0,2]   1190     2633  12.9%  12.7%     0   0
5   PAY_3  04.(2, Inf]    116      274   1.3%   1.3%     0   0
  Feature         Bins actual expected Ac_pct Ex_pct PSI_i PSI
1   PAY_3        00.NA   1825     4113  19.8%  19.8%     0   0
2   PAY_3 01.(-Inf,-2]   1238     2847  13.4%  13.7%     0   0
3   PAY_3    02.(-2,0]   4849    10915  52.6%  52.5%     0   0
4   PAY_3     03.(0,2]   1190     2633  12.9%  12.7%     0   0
5   PAY_3  04.(2, Inf]    116      274   1.3%   1.3%     0   0
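To see why every PSI entry in the table prints as 0, the counts can be plugged into the commonly used PSI definition, the sum over bins of (Ac_pct - Ex_pct) * ln(Ac_pct / Ex_pct). The sketch below uses base R only. Whether get_psi applies exactly this formula, including its rounding and any handling of empty bins, is an assumption; the counts are copied from the output above.

```r
# Bin counts copied from the example output for PAY_3.
actual   <- c(1825, 1238, 4849, 1190, 116)
expected <- c(4113, 2847, 10915, 2633, 274)

# Share of observations per bin in each data set.
ac_pct <- actual / sum(actual)
ex_pct <- expected / sum(expected)

# Per-bin contribution and total PSI (standard definition, assumed here).
psi_i <- (ac_pct - ex_pct) * log(ac_pct / ex_pct)
psi   <- sum(psi_i)

round(100 * ac_pct, 1)   # percentages comparable to the Ac_pct column
round(100 * ex_pct, 1)   # percentages comparable to the Ex_pct column
psi                      # a small positive value, far below 0.02
```

Under this definition the total is well below the 0.02 threshold, so PAY_3 would fall under "Very stable", and rounding to a few decimals gives the zeros shown in the table.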

creditmodel documentation built on Jan. 7, 2022, 5:06 p.m.