perf_psi: PSI

perf_psiR Documentation

PSI

Description

perf_psi calculates population stability index (PSI) for total credit score and Characteristic Stability Index (CSI) for variables. It can also creates graphics to display score distribution and positive rate trends.

Usage

perf_psi(score, label = NULL, title = NULL, show_plot = TRUE,
  positive = "bad|1", threshold_variable = 20, var_skip = NULL, ...)

Arguments

score

A list of credit score for actual and expected data samples. For example, score = list(expect = scoreE, actual = scoreA).

label

A list of label value for actual and expected data samples. For example, label = list(expect = labelE, actual = labelA). Defaults to NULL.

title

Title of plot, Defaults to NULL.

show_plot

Logical. Defaults to TRUE.

positive

Value of positive class, Defaults to "bad|1".

threshold_variable

Integer. Defaults to 20. If the number of unique values > threshold_variable, the provided score will be counted as total credit score, otherwise, it is variable score.

var_skip

Name of variables that are not score, such as id column. It should be the same with the var_kp in scorecard_ply function. Defaults to NULL.

...

Additional parameters.

Details

The population stability index (PSI) formula is displayed below:

PSI = \sum((Actual\% - Expected\%)*(\ln(\frac{Actual\%}{Expected\%}))).

The rule of thumb for the PSI is as follows: Less than 0.1 inference insignificant change, no action required; 0.1 - 0.25 inference some minor change, check other scorecard monitoring metrics; Greater than 0.25 inference major shift in population, need to delve deeper.

Characteristic Stability Index (CSI) formula is displayed below:

CSI = \sum((Actual\% - Expected\%)*score).

Value

A data frame of psi and graphics of credit score distribution

See Also

perf_eva gains_table

Examples


# load germancredit data
data("germancredit")
# filter variable via missing rate, iv, identical value rate
dtvf = var_filter(germancredit, "creditability")

# breaking dt into train and test
dt_list = split_df(dtvf, "creditability")
label_list = lapply(dt_list, function(x) x$creditability)

# binning
bins = woebin(dt_list$train, "creditability")
# scorecard
card = scorecard2(bins, dt = dt_list$train, y = 'creditability')

# credit score
score_list = lapply(dt_list, function(x) scorecard_ply(x, card))
# credit score, only_total_score = FALSE
score_list2 = lapply(dt_list, function(x) scorecard_ply(x, card,
  only_total_score=FALSE))


###### perf_psi examples ######
# Example I # only total psi
psi1 = perf_psi(score = score_list, label = label_list)
psi1$psi  # psi data frame
psi1$pic  # pic of score distribution
# modify colors
# perf_psi(score = score_list, label = label_list,
#          line_color='#FC8D59', bar_color=c('#FFFFBF', '#99D594'))

# Example II # both total and variable psi
psi2 = perf_psi(score = score_list2, label = label_list)
# psi2$psi  # psi data frame
# psi2$pic  # pic of score distribution



ShichenXie/scorecard documentation built on April 17, 2024, 8:55 p.m.