niv: Adjusted Net Information Value

Description

This function produces an adjusted net information value for each variable specified on the right-hand side of the formula. It can be a helpful exploratory tool for a preliminary assessment of the predictive power of each variable for uplift.

Usage

niv(formula, data, subset, na.action = na.pass, B = 10, direction = 1,
    nbins = 10, continuous = 4, plotit = TRUE, ...)

Arguments

formula

a formula expression of the form response ~ predictors. A special term of the form trt() must be used in the model equation to identify the binary treatment variable. For example, if the treatment is represented by a variable named treat, then the right hand side of the formula must include the term +trt(treat).

data

a data.frame in which to interpret the variables named in the formula.

subset

expression indicating which subset of the rows of data should be included. All observations are included by default.

na.action

a missing-data filter function. This is applied to the model.frame after any subset argument has been used. Default is na.action = na.pass.

B

the number of bootstrap samples used to compute the adjusted net information value.

direction

if set to 1 (default), the net weight of evidence is computed as the weight of evidence of the treatment group minus that of the control group; if set to 2, it is computed as the weight of evidence of the control group minus that of the treatment group. This does not change the adjusted net information value, only the sign of the net weight of evidence values.

nbins

the number of bins created from numeric predictors. The bins are created based on quantiles, with a default value of 10 (deciles).

continuous

the minimum number of unique values a numeric variable must have to be considered continuous. The default is 4. Factor variables are always considered categorical, no matter how many levels they have.

plotit

plot the adjusted net information value for each variable?

...

additional arguments passed to barplot.
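The binning rules described by nbins and continuous can be sketched in base R as follows. This is a minimal sketch, not the package's internal code: the helper name bin_predictor is hypothetical, and the use of quantile() breaks with cut(include.lowest = TRUE) is an assumption, chosen because it yields interval labels of the same form as those shown in the Example output.

```r
# Hypothetical helper (not part of uplift): bin one predictor following the
# rules described for the nbins and continuous arguments.
# Assumption: quantile breaks via cut(); the package internals may differ.
bin_predictor <- function(x, nbins = 10, continuous = 4) {
  # Factors are always treated as categorical, whatever their number of levels
  if (is.factor(x)) return(x)
  # Numeric variables with fewer than `continuous` unique values are categorical
  if (length(unique(x)) < continuous) return(factor(x))
  # Otherwise cut at the sample quantiles (deciles when nbins = 10);
  # unique() drops duplicated breaks caused by ties
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = nbins + 1),
                            na.rm = TRUE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

set.seed(1)
b <- bin_predictor(rnorm(1000), nbins = 10)
nlevels(b)                              # 10 quantile bins
table(b)                                # roughly 100 observations per bin
nlevels(bin_predictor(c(0, 1, 1, 0)))   # 2: fewer than 4 unique values
```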

Details

The ordinary information value (commonly used in credit scoring applications) is given by

IV = \sum_{i=1}^{G} \left( P(x=i \mid y=1) - P(x=i \mid y=0) \right) \times WOE_i

where G is the number of groups created from a numeric predictor (or the number of categories of a categorical predictor), and WOE_i = \ln\left( \frac{P(x=i \mid y=1)}{P(x=i \mid y=0)} \right).
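As a worked illustration of the two formulas above, the following base-R sketch computes WOE_i and IV for an already-binned predictor. The function name info_value is hypothetical, and it assumes every group contains both responders (y = 1) and non-responders (y = 0), so that no WOE_i is infinite.

```r
# Hypothetical helper: ordinary information value of a binned predictor.
# bins: factor of groups (i = 1, ..., G); y: binary response coded 0/1.
# Assumes no empty cells, so every WOE_i is finite.
info_value <- function(bins, y) {
  p1  <- table(bins[y == 1]) / sum(y == 1)   # P(x = i | y = 1)
  p0  <- table(bins[y == 0]) / sum(y == 0)   # P(x = i | y = 0)
  woe <- log(p1 / p0)                        # WOE_i (natural log)
  as.numeric(sum((p1 - p0) * woe))           # IV
}

# Toy data: group "a" holds 2/3 of the responders and 1/3 of the non-responders
bins <- factor(c("a", "a", "b", "b", "b", "a"))
y    <- c(1, 1, 0, 0, 1, 0)
info_value(bins, y)   # (2/3) * log(2), about 0.462
```

Because P(x=i|y=1) - P(x=i|y=0) and WOE_i always have the same sign, every term of the sum is non-negative, so IV >= 0, with IV = 0 when the predictor is distributed identically among responders and non-responders.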

The net information value is the natural extension of the IV for the case of uplift. It is computed as

NIV = 100 \times \sum_{i=1}^{G} \left( P(x=i \mid y=1)^{T} \times P(x=i \mid y=0)^{C} - P(x=i \mid y=0)^{T} \times P(x=i \mid y=1)^{C} \right) \times NWOE_i

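where NWOE_i = WOE_i^{T} - WOE_i^{C}, and the superscripts T and C denote quantities computed on the treatment and control groups, respectively.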
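The NIV formula can be sketched in the same way. The function net_info_value below is hypothetical, not the package's code; it assumes a 0/1 treatment indicator and that every bin contains both responders and non-responders in both groups. The usage lines also illustrate the remark made for the direction argument: swapping the roles of treatment and control flips the sign of every NWOE_i and of every bracketed difference, so the NIV is unchanged.

```r
# Hypothetical helper: net information value of a binned predictor.
# bins: factor of groups; y: 0/1 response; treat: 0/1 (1 = treatment group).
# Assumes no empty cells (y = 1 and y = 0 present in every bin of both groups).
net_info_value <- function(bins, y, treat) {
  # Distribution of the bins among observations with treat == g and y == r
  dist <- function(g, r) {
    tb <- table(bins[treat == g & y == r])
    tb / sum(tb)
  }
  t1 <- dist(1, 1); t0 <- dist(1, 0)    # P(x=i|y=1)^T, P(x=i|y=0)^T
  c1 <- dist(0, 1); c0 <- dist(0, 0)    # P(x=i|y=1)^C, P(x=i|y=0)^C
  nwoe <- log(t1 / t0) - log(c1 / c0)   # NWOE_i = WOE_i^T - WOE_i^C
  as.numeric(100 * sum((t1 * c0 - t0 * c1) * nwoe))
}

set.seed(2019)
n     <- 2000
x     <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
y     <- rbinom(n, 1, plogis(-0.5 + x * treat))   # x modifies the treatment effect
bins  <- cut(x, quantile(x, seq(0, 1, 0.2)), include.lowest = TRUE)

net_info_value(bins, y, treat)       # NIV with the treated group as T
net_info_value(bins, y, 1 - treat)   # same value with the groups swapped
```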

The adjusted net information value is computed as follows:

1. Take B bootstrap samples and compute the NIV for each variable on each sample.

2. Compute the mean (NIV_{mean}) and standard deviation (NIV_{sd}) of the NIV for each variable over the B bootstrap samples.

3. The adjusted NIV for a given variable is obtained by subtracting a penalty term from the mean NIV: NIV_{mean} - \frac{NIV_{sd}}{\sqrt{B}}.
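The three steps can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation: adjusted_niv and niv_one are hypothetical names, the predictors are binned once on the full data and those bins are reused in every bootstrap sample (an assumption; the package may re-bin each sample), and the output simply mimics the niv, penalty and adj_niv columns described under Value.

```r
# Hypothetical helper: NIV of one binned variable (formula from Details).
# Assumes no empty cells in any bootstrap sample.
niv_one <- function(bins, y, treat) {
  d <- function(g, r) { tb <- table(bins[treat == g & y == r]); tb / sum(tb) }
  t1 <- d(1, 1); t0 <- d(1, 0); c1 <- d(0, 1); c0 <- d(0, 0)
  100 * sum((t1 * c0 - t0 * c1) * (log(t1 / t0) - log(c1 / c0)))
}

# Hypothetical helper: bootstrap-adjusted NIV for each column of bins_df
# (a data.frame of factors), with y and treat coded 0/1.
adjusted_niv <- function(bins_df, y, treat, B = 10) {
  n <- length(y)
  boot <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)   # step 1: one bootstrap sample
    sapply(bins_df, function(b) niv_one(b[idx], y[idx], treat[idx]))
  })
  # One row per variable, one column per bootstrap sample
  boot <- matrix(boot, nrow = ncol(bins_df),
                 dimnames = list(names(bins_df), NULL))
  niv     <- rowMeans(boot)                   # step 2: NIV_mean
  penalty <- apply(boot, 1, sd) / sqrt(B)     # NIV_sd / sqrt(B)
  cbind(niv = niv, penalty = penalty, adj_niv = niv - penalty)   # step 3
}

set.seed(42)
n     <- 2000
x1    <- rnorm(n)
x2    <- rnorm(n)
treat <- rbinom(n, 1, 0.5)
y     <- rbinom(n, 1, plogis(-0.5 + 1.5 * x1 * treat))   # only x1 drives uplift
q5    <- function(x) cut(x, quantile(x, seq(0, 1, 0.2)), include.lowest = TRUE)
res   <- adjusted_niv(data.frame(X1 = q5(x1), X2 = q5(x2)), y, treat, B = 10)
round(res, 4)   # X1 should rank well above the pure-noise X2
```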

Value

A list with two components:

niv_val

a matrix with the following columns: niv (the average net information value of each variable over all bootstrap samples), penalty (the penalty term calculated as described in Details), and adj_niv (the adjusted net information value, i.e. the difference between the previous two columns).

nwoe

a list of matrices, one for each variable, with one row per bin (or category). The columns are: the distribution of the responses (y=1) over the bins in the treated group (ct1.y1), the distribution of the non-responses (y=0) in the treated group (ct1.y0), the distribution of the responses (y=1) in the control group (ct0.y1), the distribution of the non-responses (y=0) in the control group (ct0.y0), the weight of evidence in the treated group (ct1.woe), the weight of evidence in the control group (ct0.woe), and the net weight of evidence (nwoe).
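The columns of each nwoe matrix can be cross-checked against the definitions in Details: within a group, the weight of evidence is the log ratio of the response and non-response distributions, and (with direction = 1) nwoe = ct1.woe - ct0.woe. The snippet below redoes this arithmetic for the first row of $X1 in the Example output, using the printed four-decimal values, so small rounding differences from the printed WOE values are expected.

```r
# First row ("[-2.96,-1.22]") of niv.1$nwoe$X1, as printed in the Example output
ct1.y1 <- 0.0778; ct1.y0 <- 0.1391   # treated: responders, non-responders
ct0.y1 <- 0.1265; ct0.y0 <- 0.0607   # control: responders, non-responders

ct1.woe <- log(ct1.y1 / ct1.y0)   # about -0.581 (printed: -0.5816)
ct0.woe <- log(ct0.y1 / ct0.y0)   # about  0.734 (printed:  0.7337)
nwoe    <- ct1.woe - ct0.woe      # about -1.315 (printed: -1.3152)

# Each distribution column sums to 1 over the bins, e.g. ct1.y1 for X1:
sum(c(0.0778, 0.0778, 0.0889, 0.1074, 0.0852,
      0.1111, 0.1037, 0.1259, 0.0889, 0.1333))   # 1
```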

Author(s)

Leo Guelman <leo.guelman@gmail.com>

References

Larsen, K. (2009). Net lift models. In: M2009 - 12th Annual SAS Data Mining Conference.

Examples

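library(uplift)

set.seed(12345)
# Simulate a data set with n = 1000 observations and p = 20 predictors
dd <- sim_pte(n = 1000, p = 20, rho = 0, sigma = sqrt(2), beta.den = 4)
# Recode the treatment indicator as 0/1
dd$treat <- ifelse(dd$treat == 1, 1, 0)

# Adjusted NIV for X1 to X6; plotit = TRUE (default) also draws a bar plot
niv.1 <- niv(y ~ X1 + X2 + X3 + X4 + X5 + X6 + trt(treat), data = dd)
niv.1$niv_val   # niv, penalty and adj_niv for each variable
niv.1$nwoe      # one net-weight-of-evidence matrix per variable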

Example output

      niv penalty adj_niv
X1 10.144  0.8253  9.3187
X2  9.593  0.7114  8.8816
X3  8.521  0.4916  8.0294
X4  7.469  0.5956  6.8734
X5  3.079  0.3371  2.7419
X6  3.108  0.6607  2.4473
$X1
                 ct1.y1 ct1.y0 ct0.y1 ct0.y0 ct1.woe ct0.woe    nwoe
[-2.96,-1.22]    0.0778 0.1391 0.1265 0.0607 -0.5816  0.7337 -1.3152
(-1.22,-0.776]   0.0778 0.0826 0.1462 0.0931 -0.0603  0.4514 -0.5117
(-0.776,-0.492]  0.0889 0.1174 0.1542 0.0405 -0.2781  1.3370 -1.6151
(-0.492,-0.217]  0.1074 0.0957 0.0949 0.1012  0.1159 -0.0648  0.1807
(-0.217,0.00272] 0.0852 0.1217 0.0672 0.1296 -0.3571 -0.6565  0.2995
(0.00272,0.223]  0.1111 0.1348 0.0751 0.0810 -0.1931 -0.0753 -0.1178
(0.223,0.5]      0.1037 0.0826 0.0949 0.1174  0.2274 -0.2132  0.4407
(0.5,0.783]      0.1259 0.0826 0.0870 0.1012  0.4216 -0.1518  0.5734
(0.783,1.21]     0.0889 0.0826 0.1028 0.1255  0.0733 -0.1999  0.2732
(1.21,2.96]      0.1333 0.0609 0.0514 0.1498  0.7841 -1.0700  1.8541

$X2
                ct1.y1 ct1.y0 ct0.y1 ct0.y0 ct1.woe ct0.woe    nwoe
[-3.33,-1.22]   0.1407 0.0609 0.0593 0.1336  0.8382 -0.8125  1.6506
(-1.22,-0.843]  0.1370 0.0826 0.0791 0.0972  0.5061 -0.2063  0.7125
(-0.843,-0.529] 0.0926 0.0913 0.0870 0.1296  0.0140 -0.3987  0.4127
(-0.529,-0.232] 0.1148 0.1043 0.1067 0.0729  0.0956  0.3815 -0.2859
(-0.232,-0.004] 0.1222 0.0783 0.1067 0.0891  0.4458  0.1808  0.2650
(-0.004,0.246]  0.1148 0.0870 0.0909 0.1053  0.2779 -0.1466  0.4245
(0.246,0.479]   0.0926 0.1130 0.0751 0.1215 -0.1996 -0.4808  0.2812
(0.479,0.79]    0.0556 0.1130 0.1225 0.1134 -0.7104  0.0778 -0.7882
(0.79,1.26]     0.0741 0.1174 0.1186 0.0931 -0.4604  0.2417 -0.7021
(1.26,3.6]      0.0556 0.1522 0.1542 0.0445 -1.0076  1.2417 -2.2493

$X3
                 ct1.y1 ct1.y0 ct0.y1 ct0.y0 ct1.woe ct0.woe    nwoe
[-3.4,-1.25]     0.0963 0.1174 0.1304 0.0567 -0.1981  0.8334 -1.0315
(-1.25,-0.841]   0.0852 0.1174 0.1304 0.0688 -0.3207  0.6393 -0.9600
(-0.841,-0.526]  0.1185 0.0783 0.1383 0.0607  0.4150  0.8233 -0.4083
(-0.526,-0.269]  0.0852 0.1261 0.1067 0.0850 -0.3921  0.2273 -0.6195
(-0.269,-0.0479] 0.0630 0.1304 0.1186 0.0931 -0.7283  0.2417 -0.9700
(-0.0479,0.2]    0.0926 0.1000 0.0988 0.1093 -0.0770 -0.1010  0.0240
(0.2,0.495]      0.1222 0.0826 0.0672 0.1255  0.3917 -0.6248  1.0165
(0.495,0.862]    0.1000 0.0913 0.0751 0.1336  0.0910 -0.5761  0.6670
(0.862,1.27]     0.1259 0.0783 0.0870 0.1053  0.4756 -0.1911  0.6667
(1.27,3.74]      0.1111 0.0783 0.0474 0.1619  0.3505 -1.2280  1.5785

$X4
                ct1.y1 ct1.y0 ct0.y1 ct0.y0 ct1.woe ct0.woe    nwoe
[-3.51,-1.32]   0.1037 0.1130 0.0751 0.1093 -0.0862 -0.3754  0.2892
(-1.32,-0.879]  0.1074 0.0870 0.0435 0.1619  0.2112 -1.3150  1.5262
(-0.879,-0.545] 0.0815 0.0870 0.0830 0.1498 -0.0650 -0.5904  0.5254
(-0.545,-0.26]  0.1259 0.0652 0.1146 0.0891  0.6580  0.2523  0.4057
(-0.26,-0.055]  0.1222 0.1130 0.0711 0.0931  0.0781 -0.2691  0.3472
(-0.055,0.237]  0.1111 0.1000 0.1107 0.0769  0.1054  0.3638 -0.2584
(0.237,0.553]   0.0889 0.1043 0.1265 0.0810 -0.1603  0.4460 -0.6063
(0.553,0.869]   0.1037 0.1217 0.0830 0.0931 -0.1603 -0.1150 -0.0454
(0.869,1.31]    0.0889 0.0957 0.1344 0.0810 -0.0733  0.5066 -0.5800
(1.31,3.16]     0.0667 0.1130 0.1581 0.0648 -0.5281  0.8923 -1.4204

$X5
                 ct1.y1 ct1.y0 ct0.y1 ct0.y0 ct1.woe ct0.woe    nwoe
[-3.13,-1.29]    0.1037 0.0826 0.1225 0.0891  0.2274  0.3189 -0.0915
(-1.29,-0.871]   0.0889 0.0739 0.1265 0.1093  0.1845  0.1459  0.0386
(-0.871,-0.534]  0.1185 0.1000 0.0830 0.0972  0.1699 -0.1575  0.3274
(-0.534,-0.273]  0.1259 0.0565 0.0830 0.1296  0.8011 -0.4452  1.2463
(-0.273,-0.0115] 0.1000 0.1087 0.1067 0.0850 -0.0834  0.2273 -0.3107
(-0.0115,0.178]  0.1111 0.1000 0.1146 0.0729  0.1054  0.4529 -0.3476
(0.178,0.459]    0.1037 0.0783 0.1028 0.1134  0.2815 -0.0981  0.3796
(0.459,0.838]    0.0852 0.1217 0.0909 0.1053 -0.3571 -0.1466 -0.2104
(0.838,1.27]     0.0963 0.1261 0.0909 0.0891 -0.2695  0.0205 -0.2900
(1.27,3.19]      0.0667 0.1522 0.0791 0.1093 -0.8253 -0.3241 -0.5012

$X6
                ct1.y1 ct1.y0 ct0.y1 ct0.y0 ct1.woe ct0.woe    nwoe
[-3.54,-1.27]   0.0963 0.1174 0.0672 0.1215 -0.1981 -0.5920  0.3939
(-1.27,-0.783]  0.0741 0.1304 0.0949 0.1053 -0.5658 -0.1040 -0.4618
(-0.783,-0.489] 0.0741 0.1217 0.1146 0.0931 -0.4968  0.2078 -0.7046
(-0.489,-0.258] 0.0852 0.1043 0.0830 0.1296 -0.2029 -0.4452  0.2423
(-0.258,0.0177] 0.1333 0.1130 0.0870 0.0648  0.1651  0.2945 -0.1294
(0.0177,0.252]  0.1222 0.1043 0.0988 0.0729  0.1581  0.3045 -0.1464
(0.252,0.515]   0.0926 0.0913 0.1146 0.1012  0.0140  0.1244 -0.1104
(0.515,0.852]   0.0778 0.0783 0.1304 0.1134 -0.0062  0.1403 -0.1465
(0.852,1.29]    0.1074 0.0783 0.0949 0.1174  0.3166 -0.2132  0.5298
(1.29,3.1]      0.1370 0.0609 0.1146 0.0810  0.8115  0.3476  0.4640

uplift documentation built on May 2, 2019, 9:32 a.m.