niv: Net Information Value.
In leoguelman/uplift2: Uplift Modeling

niv	R Documentation

Description

niv computes the net information value for each uplift predictor. It is a useful exploratory tool for a preliminary assessment of each variable's predictive power for uplift.

Usage

## S3 method for class 'formula'
niv(formula, data, subset, na.action = na.pass,
  nBins = 10, continuous = 4, B = 10, woeAdj = 0.5, parallel = TRUE,
  nCore = NULL, digitsB = NULL, classLevel = NULL, treatLevel = NULL)

## S3 method for class 'niv'
print(x, ...)

## S3 method for class 'niv'
summary(object, ...)

Arguments

`formula`	A model formula of the form y ~ x1 + ... + xn + trt(), where the left-hand side corresponds to the observed response, the right-hand side corresponds to the predictors, and 'trt' is the special expression that marks the treatment term. If the treatment term is not a factor, it is converted to one. `niv` only handles response variables of class factor.
`data`	A data frame in which to interpret the variables named in the formula.
`subset`	Expression indicating which subset of the rows of data should be included. All observations are included by default.
`na.action`	A missing-data filter function. Defaults to `na.pass`.
`nBins`	The number of bins created from numeric predictors. The bins are created based on sample quantiles, with a default value of 10 (deciles).
`continuous`	The threshold that determines when bins are created from numeric predictors. If a numeric predictor has n or fewer unique values (i.e., `continuous = n`), it is converted to a factor without binning. The default is `continuous = 4`.
`B`	The number of bootstrap samples. Defaults to 10.
`woeAdj`	The adjustment factor used to avoid an undefined WOE. The value should lie in the interval [0, 1]. Defaults to `woeAdj = 0.5`. See Details.
`parallel`	If `TRUE`, computations are performed in parallel, otherwise they are done sequentially.
`nCore`	The number of cores used. Defaults to the number of available cores minus 1.
`digitsB`	Number of digits used in formatting the breaks in numeric predictors.
`classLevel`	A character string for the class of interest. Defaults to the last level of the factor.
`treatLevel`	A character string for the treatment level of interest. Defaults to the last level of the treatment factor.
`x`	A `niv` object.
`object`	A `niv` object.
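
The binning rule described for `nBins` and `continuous` can be illustrated with a minimal base-R sketch. The helper `bin_predictor` is hypothetical and not part of the package; the internal implementation may differ (for example, in how ties in the quantile breaks or missing values are handled).

```r
# Hypothetical helper (not part of uplift2): mimics the documented binning rule.
bin_predictor <- function(x, nBins = 10, continuous = 4) {
  # Few unique values: treat the predictor as categorical, no binning.
  if (length(unique(x)) <= continuous) {
    return(factor(x))
  }
  # Otherwise cut at sample quantiles; unique() guards against tied breaks.
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = nBins + 1),
                            na.rm = TRUE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

set.seed(1)
x_num <- rnorm(500)                          # many unique values: binned into deciles
x_few <- sample(1:3, 500, replace = TRUE)    # three unique values: converted to a factor

nlevels(bin_predictor(x_num))
nlevels(bin_predictor(x_few))
```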

Details

Given a binary response variable y \in \{0, 1\}, the information value (Siddiqi, 2006) from a predictor x is given by

IV = \sum_{i=1}^{G} \left( P(x=i|y=1) - P(x=i|y=0) \right) \times WOE_i

where G is the number of groups created from a numeric predictor or the number of levels of a categorical predictor, and WOE_i = \ln \left( \frac{P(x=i|y=1)}{P(x=i|y=0)} \right).

To avoid an undefined WOE (which occurs when a group contains no events or no non-events), an adjustment factor A (the `woeAdj` argument) is used. Specifically, WOE_i = \ln \left( \frac{(N(x=i|y=1)+A)/N(y=1)}{(N(x=i|y=0)+A)/N(y=0)} \right), where N represents observation counts.
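
As an illustration of the adjusted WOE and the resulting IV, here is a minimal base-R sketch. The function `iv_sketch` is a hypothetical helper written for this example, not the package's implementation; it assumes a factor predictor and a 0/1 response, and applies the adjustment A only inside the WOE, as in the formula above.

```r
# Hypothetical helper (not part of uplift2): IV of a factor x for a 0/1 response y.
iv_sketch <- function(x, y, A = 0.5) {
  stopifnot(is.factor(x))
  n1 <- as.numeric(table(x[y == 1]))   # N(x = i | y = 1) for each level i
  n0 <- as.numeric(table(x[y == 0]))   # N(x = i | y = 0) for each level i
  p1 <- n1 / sum(n1)                   # P(x = i | y = 1)
  p0 <- n0 / sum(n0)                   # P(x = i | y = 0)
  # Adjusted WOE: A keeps the log finite when a level has a zero count.
  woe <- log(((n1 + A) / sum(n1)) / ((n0 + A) / sum(n0)))
  data.frame(level = levels(x), p1 = p1, p0 = p0, woe = woe,
             iv = (p1 - p0) * woe)
}

# Level "c" has no events, so the unadjusted WOE (A = 0) would be -Inf.
x <- factor(c("a", "a", "a", "b", "b", "b", "c", "c"))
y <- c(1, 1, 0, 1, 0, 0, 0, 0)

res <- iv_sketch(x, y)
res
sum(res$iv)   # IV of the predictor
```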

The net information value (NIV) proposed by Larsen (2009) is a natural extension of the IV for the case of uplift. It is computed as

NIV = \sum_{i=1}^{G} \left( P(x=i|y=1)^{T} \times P(x=i|y=0)^{C} - P(x=i|y=0)^{T} \times P(x=i|y=1)^{C} \right) \times NWOE_i

where NWOE_i = WOE_i^{T} - WOE_i^{C}, and T and C refer to treatment and control groups, respectively.
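
The NIV formula can be sketched in base R as follows. `niv_sketch` is a hypothetical helper, not the package's code: it assumes a factor predictor, a 0/1 response, and a 0/1 treatment indicator, and it applies the adjustment A only inside the WOE terms. The simulated data are purely illustrative.

```r
# Hypothetical helper (not part of uplift2): NIV of a factor x,
# with 0/1 response y and 0/1 treatment indicator trt.
niv_sketch <- function(x, y, trt, A = 0.5) {
  stopifnot(is.factor(x))
  counts <- function(keep) as.numeric(table(x[keep]))
  # First index: response (1 = event); second index: group (1 = treatment).
  n11 <- counts(y == 1 & trt == 1); n01 <- counts(y == 0 & trt == 1)
  n10 <- counts(y == 1 & trt == 0); n00 <- counts(y == 0 & trt == 0)
  woe <- function(ev, non) log(((ev + A) / sum(ev)) / ((non + A) / sum(non)))
  p <- function(n) n / sum(n)
  nwoe <- woe(n11, n01) - woe(n10, n00)          # NWOE_i = WOE_i^T - WOE_i^C
  sum((p(n11) * p(n00) - p(n01) * p(n10)) * nwoe)
}

set.seed(1)
n <- 2000
trt <- rbinom(n, 1, 0.5)
x_signal <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x_noise  <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
# The treatment raises the response rate only when x_signal == "a".
y <- rbinom(n, 1, 0.2 + 0.3 * (trt == 1 & x_signal == "a"))

niv_signal <- niv_sketch(x_signal, y, trt)
niv_noise  <- niv_sketch(x_noise, y, trt)
c(signal = niv_signal, noise = niv_noise)
```

In this simulation the predictor that interacts with the treatment should receive a clearly larger NIV than the unrelated one.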

The adjusted net information value (ANIV) is computed as follows:

1. Draw B bootstrap samples from the training data and compute the NIV for each variable in each sample.
2. Compute the mean of the NIV (NIV_{mean}) and the standard deviation of the NIV (NIV_{sd}) for each variable over the B replications.
3. The ANIV for a given variable is computed by subtracting a penalty term from the mean NIV. Specifically, ANIV = NIV_{mean} - \frac{NIV_{sd}}{\sqrt{B}}.
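
The bootstrap steps above can be sketched for a single predictor as follows. Both `niv_one` and `aniv_sketch` are hypothetical helpers, and the column names `x`, `y`, `trt` and the output names `niv`, `penalty`, `adj_niv` are assumptions made for this example; the package's own resampling scheme and output names may differ.

```r
# Hypothetical helper: NIV of factor column x in a data frame d
# that also holds a 0/1 response y and a 0/1 treatment indicator trt.
niv_one <- function(d, A = 0.5) {
  n <- table(d$x, factor(d$y, levels = 0:1), factor(d$trt, levels = 0:1))
  cell <- function(yv, tv) as.numeric(n[, yv, tv])   # counts per level of x
  n11 <- cell("1", "1"); n01 <- cell("0", "1")       # treatment group
  n10 <- cell("1", "0"); n00 <- cell("0", "0")       # control group
  woe <- function(ev, non) log(((ev + A) / sum(ev)) / ((non + A) / sum(non)))
  p <- function(v) v / sum(v)
  sum((p(n11) * p(n00) - p(n01) * p(n10)) * (woe(n11, n01) - woe(n10, n00)))
}

# Hypothetical helper: mean NIV over B bootstrap samples minus sd / sqrt(B).
aniv_sketch <- function(d, stat_fun, B = 10) {
  stats <- vapply(seq_len(B), function(b) {
    idx <- sample(nrow(d), replace = TRUE)           # resample rows with replacement
    stat_fun(d[idx, , drop = FALSE])
  }, numeric(1))
  penalty <- sd(stats) / sqrt(B)
  c(niv = mean(stats), penalty = penalty, adj_niv = mean(stats) - penalty)
}

set.seed(1)
n <- 1000
d <- data.frame(x = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
                trt = rbinom(n, 1, 0.5))
d$y <- rbinom(n, 1, 0.2 + 0.3 * (d$trt == 1 & d$x == "a"))

aniv <- aniv_sketch(d, niv_one, B = 10)
aniv
```

Because the penalty shrinks with \sqrt{B}, a larger B brings the ANIV closer to the mean NIV.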

Value

An object of class niv, which is a list with the following components (among others passed to the S3 methods):

nwoeData A list of data frames, one for each variable. The columns represent:
- y00 the number of non-event records (response != classLevel) in the control group (treatment != treatLevel).
- y10 the number of event records (response == classLevel) in the control group (treatment != treatLevel).
- y01 the number of non-event records in the treatment group (treatment == treatLevel).
- y11 the number of event records in the treatment group.
- py00 proportion of non-event records in the control group.
- py10 proportion of event records in the control group.
- py01 proportion of non-event records in the treatment group.
- py11 proportion of event records in the treatment group.
- woe0 the control group weight-of-evidence.
- woe1 the treatment group weight-of-evidence.
- nwoe the net weight-of-evidence.
- niv the net information value.
The values above are computed based on the entire data.
nivData A data frame with the following columns: niv (the average net information value for each variable over all bootstrap samples), the penalty term, and the adjusted net information value.

Author(s)

Leo Guelman leo.guelman@gmail.com

References

Larsen, K. (2009). Net lift models. In: M2009 - 12th Annual SAS Data Mining Conference.

Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Wiley, Hoboken, NJ.

Examples

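# Simulate uplift data (n = 1000, p = 20, binary response)
set.seed(1)
df <- sim_uplift(n = 1000, p = 20, response = "binary")
# Build the model formula with response "y" and treatment "T"
f <- create_uplift_formula(names(df)[-c(1:3)], "y", "T")
# Compute the NIV with 10 bootstrap samples, run sequentially
netInf <- niv(f, data = df, B = 10, parallel = FALSE)
# Inspect the first rows of the per-variable NIV summary
head(netInf$nivData)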

leoguelman/uplift2 documentation built on April 15, 2022, 4:34 a.m.