iv: Information Value


View source: R/info_value.R

Description

This function calculates information value (IV) for multiple x variables.

Usage

iv(dt, y, x = NULL, positive = "bad|1", order = TRUE)

Arguments

dt

A data frame with both x (predictor/feature) and y (response/label) variables.

y

Name of y variable.

x

Name of x variables. Default is NULL. If x is NULL, then all variables except y are counted as x variables.

positive

Value of positive class, default is "bad|1".

order

Logical, default is TRUE. If TRUE, the output is sorted in descending order of IV.

Details

IV is a very useful concept for variable selection while developing credit scorecards. The formula for information value is shown below:

IV = \sum_{i} (DistributionBad_{i} - DistributionGood_{i}) \cdot \ln\left(\frac{DistributionBad_{i}}{DistributionGood_{i}}\right).

The log component in information value is defined as weight of evidence (WOE), which is shown as

WeightofEvidence_{i} = \ln\left(\frac{DistributionBad_{i}}{DistributionGood_{i}}\right).
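
As a minimal sketch of the two formulas above, the following base R code computes WOE and IV by hand for a single predictor that has already been split into bins. The counts and the names `good`, `bad`, `dist_good`, `dist_bad`, `woe` and `iv_manual` are made up for illustration and are not part of the scorecard package; `iv()` itself may group values and handle edge cases (such as bins with zero counts) differently.

```r
# Hypothetical counts of good and bad cases in three bins of one predictor
good <- c(400, 300, 100)
bad  <- c(20, 60, 120)

# Share of all goods / all bads that falls into each bin
dist_good <- good / sum(good)
dist_bad  <- bad / sum(bad)

# Weight of evidence per bin: ln(DistributionBad_i / DistributionGood_i)
woe <- log(dist_bad / dist_good)

# Information value: sum over bins of (difference in distributions) * WOE
iv_manual <- sum((dist_bad - dist_good) * woe)

print(round(woe, 4))
print(round(iv_manual, 4))
```

Each term of the sum is non-negative, because the difference and the logarithm always share the same sign, so the total cannot be negative.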

The relationship between information value and predictive power is as follows:

Information Value    Predictive Power
-----------------    ----------------
< 0.02               Useless for prediction
0.02 to 0.1          Weak predictor
0.1 to 0.3           Medium predictor
> 0.3                Strong predictor
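
The thresholds in the table can be applied to a vector of IV values with base R's `cut()`. This is only a sketch: `iv_strength` is a hypothetical helper, not a function of the scorecard package. The table does not say which band the boundary values 0.02, 0.1 and 0.3 belong to, so the code assumes each boundary falls into the higher band.

```r
# Hypothetical helper: label IV values using the thresholds in the table.
# Assumption: intervals are closed on the left, so 0.02, 0.1 and 0.3
# are each assigned to the higher band.
iv_strength <- function(iv) {
  cut(iv,
      breaks = c(-Inf, 0.02, 0.1, 0.3, Inf),
      labels = c("Useless for prediction", "Weak predictor",
                 "Medium predictor", "Strong predictor"),
      right = FALSE)
}

# Example IV values, one per band
labels <- iv_strength(c(0.01, 0.05, 0.2, 0.67))
print(labels)
```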

Value

The information value of each x variable.

Examples

# Load the package and the German credit data
library(scorecard)
data(germancredit)

# information values
info_value = iv(germancredit, y = "creditability")

str(info_value)

scorecard documentation built on Sept. 11, 2018, 9:03 a.m.