niv: Net Information Value.


niv    R Documentation

Net Information Value.

Description

niv computes the net information value for each uplift predictor. This can be a helpful exploratory tool for a preliminary assessment of the predictive power of each variable for uplift.

Usage

## S3 method for class 'formula'
niv(formula, data, subset, na.action = na.pass,
  nBins = 10, continuous = 4, B = 10, woeAdj = 0.5, parallel = TRUE,
  nCore = NULL, digitsB = NULL, classLevel = NULL, treatLevel = NULL)

## S3 method for class 'niv'
print(x, ...)

## S3 method for class 'niv'
summary(object, ...)

Arguments

formula

A model formula of the form y ~ x1 + ... + xn + trt(), where the left-hand side is the observed response, the right-hand side lists the predictors, and 'trt' is the special expression that marks the treatment term. If the treatment term is not a factor, it is converted to one. niv only handles response variables of class factor.

data

A data frame in which to interpret the variables named in the formula.

subset

Expression indicating which subset of the rows of data should be included. All observations are included by default.

na.action

A missing-data filter function. Defaults to na.pass.

nBins

The number of bins created from numeric predictors. The bins are created based on sample quantiles, with a default value of 10 (deciles).

continuous

Specifies the threshold that determines whether bins are created from a numeric predictor. If the predictor has n or fewer unique values (i.e., continuous = n), it is converted to a factor without binning. The default is continuous = 4.

B

The number of bootstrap samples.

woeAdj

The adjustment factor used to avoid an undefined WOE. The value should lie in the interval [0, 1]. By default woeAdj = 0.5. See Details.

parallel

If TRUE, computations are performed in parallel; otherwise they are done sequentially.

nCore

The number of cores used. Defaults to the number of available cores minus one.

digitsB

Number of digits used in formatting the breaks in numeric predictors.

classLevel

A character string for the class of interest. Defaults to the last level of the factor.

treatLevel

A character string for the treatment level of interest. Defaults to the last level of the treatment factor.

x

A niv object.

object

A niv object.
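
The binning rule (nBins, continuous) and the default levels (classLevel, treatLevel) described above can be illustrated in base R. This is a minimal sketch, not the package's internal code: bin_predictor is a hypothetical helper, and dropping tied quantile breaks with unique() is an assumption made for the illustration.

```r
# Hypothetical helper (not part of the package): bin a numeric predictor at
# sample quantiles, or convert it to a factor when it has few unique values.
bin_predictor <- function(x, nBins = 10, continuous = 4) {
  if (length(unique(x)) <= continuous) {
    return(factor(x))  # few distinct values: no binning
  }
  # Quantile-based break points; unique() drops tied quantiles (assumption)
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = nBins + 1),
                            na.rm = TRUE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

set.seed(1)
x_num <- rnorm(500)                        # continuous predictor
x_few <- sample(1:3, 500, replace = TRUE)  # only three unique values

binned_num <- bin_predictor(x_num)  # decile bins
binned_few <- bin_predictor(x_few)  # one level per unique value

# Default class and treatment levels: the last level of each factor
response <- factor(c("no", "yes", "no", "yes"))
treatment <- factor(c(0, 1, 1, 0))  # a non-factor treatment becomes a factor
default_class <- levels(response)[nlevels(response)]
default_treat <- levels(treatment)[nlevels(treatment)]
```

Because factor levels are sorted by default, the "last level" depends on how the factor was built; passing classLevel and treatLevel explicitly avoids relying on that ordering.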

Details

Given a binary response variable y \in \{0, 1\}, the information value (Siddiqi, 2006) from a predictor x is given by

IV = \sum_{i=1}^{G} \left( P(x=i|y=1) - P(x=i|y=0) \right) \times WOE_i

where G is the number of groups created from a numeric predictor or the number of levels of a categorical predictor, and WOE_i = \ln \left( \frac{P(x=i|y=1)}{P(x=i|y=0)} \right).

To avoid an undefined WOE, an adjustment factor A is used. Specifically, WOE_i = \ln \left( \frac{(N(x=i|y=1)+A)/N(y=1)}{(N(x=i|y=0)+A)/N(y=0)} \right), where N represents observation counts.
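
The effect of the adjustment factor can be seen in a small base R sketch. The helper info_value is hypothetical and only mirrors the two formulas above (unadjusted proportions in the difference term, adjusted counts in the WOE); it is not the package's implementation.

```r
# Hypothetical helper: information value of a categorical predictor x for a
# binary response y coded 0/1, with WOE adjustment factor A.
info_value <- function(x, y, A = 0.5) {
  x <- factor(x)
  n1 <- as.numeric(table(x[y == 1]))  # N(x = i | y = 1) for each group i
  n0 <- as.numeric(table(x[y == 0]))  # N(x = i | y = 0) for each group i
  N1 <- sum(y == 1)                   # N(y = 1)
  N0 <- sum(y == 0)                   # N(y = 0)
  woe <- log(((n1 + A) / N1) / ((n0 + A) / N0))
  sum((n1 / N1 - n0 / N0) * woe)
}

# Group "c" contains no events, so its unadjusted WOE is log(0)
x <- c("a", "a", "a", "b", "b", "c")
y <- c(1, 1, 0, 1, 0, 0)

iv_adj <- info_value(x, y, A = 0.5)  # finite
iv_raw <- info_value(x, y, A = 0)    # not finite: WOE of group "c" is -Inf
```

With A = 0 the empty cell makes the sum non-finite, which is the situation the adjustment is meant to prevent.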

The net information value (NIV) proposed by Larsen (2009) is a natural extension of the IV for the case of uplift. It is computed as

NIV = \sum_{i=1}^{G} \left( P(x=i|y=1)^{T} \times P(x=i|y=0)^{C} - P(x=i|y=0)^{T} \times P(x=i|y=1)^{C} \right) \times NWOE_i

where NWOE_i = WOE_i^{T} - WOE_i^{C}, and T and C refer to treatment and control groups, respectively.
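
As an illustration, the NIV formula can be written out in base R as follows. This is a sketch under stated assumptions: net_info_value is a hypothetical helper, the response and treatment are coded 0/1, each of the four response-by-treatment groups is non-empty, and any additional scaling the package may apply is not reproduced. The count names follow the y00, y10, y01, y11 convention (first digit response, second digit treatment).

```r
# Hypothetical helper: net information value of a categorical predictor x,
# for binary response y (1 = event) and binary treatment trt (1 = treated).
net_info_value <- function(x, y, trt, A = 0.5) {
  x <- factor(x)
  counts <- function(keep) as.numeric(table(x[keep]))
  y11 <- counts(y == 1 & trt == 1)  # events, treatment group
  y01 <- counts(y == 0 & trt == 1)  # non-events, treatment group
  y10 <- counts(y == 1 & trt == 0)  # events, control group
  y00 <- counts(y == 0 & trt == 0)  # non-events, control group

  # Adjusted weight-of-evidence within each arm
  woe1 <- log(((y11 + A) / sum(y11)) / ((y01 + A) / sum(y01)))  # treatment
  woe0 <- log(((y10 + A) / sum(y10)) / ((y00 + A) / sum(y00)))  # control
  nwoe <- woe1 - woe0

  # Group proportions P(x = i | y, arm)
  py11 <- y11 / sum(y11); py01 <- y01 / sum(y01)
  py10 <- y10 / sum(y10); py00 <- y00 / sum(y00)

  sum((py11 * py00 - py01 * py10) * nwoe)
}

# Simulated data in which treatment raises the event rate only for x == "a"
set.seed(42)
n <- 2000
x <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
trt <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, 0.3 + 0.2 * (x == "a") * trt)

niv_x <- net_info_value(x, y, trt)
```

When the treatment and control groups have identical joint distributions of x and y, NWOE_i is zero for every group and the NIV is zero.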

The adjusted net information value (ANIV) is computed as follows:

  1. Draw B bootstrap samples from the training data and compute the NIV for each variable in each sample.

  2. Compute the mean (NIV_{mean}) and the standard deviation (NIV_{sd}) of the NIV for each variable over the B replications.

  3. The ANIV for a given variable is computed by subtracting a penalty term from the mean NIV. Specifically, ANIV = NIV_{mean} - \frac{NIV_{sd}}{\sqrt{B}}.
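
The three steps above can be sketched for a single predictor in base R. Both niv_one and aniv_one are hypothetical helpers written for this illustration, the names of the returned elements are assumptions, and the sketch runs sequentially rather than in parallel.

```r
# Compact NIV for a categorical x, binary y (1 = event) and binary trt
# (1 = treated); an illustrative re-implementation, not the package's code.
niv_one <- function(x, y, trt, A = 0.5) {
  x <- factor(x)
  n <- function(keep) as.numeric(table(x[keep]))
  y11 <- n(y == 1 & trt == 1); y01 <- n(y == 0 & trt == 1)
  y10 <- n(y == 1 & trt == 0); y00 <- n(y == 0 & trt == 0)
  woe1 <- log(((y11 + A) / sum(y11)) / ((y01 + A) / sum(y01)))
  woe0 <- log(((y10 + A) / sum(y10)) / ((y00 + A) / sum(y00)))
  sum((y11 / sum(y11) * y00 / sum(y00) -
       y01 / sum(y01) * y10 / sum(y10)) * (woe1 - woe0))
}

# Adjusted NIV: mean bootstrap NIV minus sd / sqrt(B)
aniv_one <- function(x, y, trt, B = 10, A = 0.5) {
  n <- length(y)
  niv_b <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)  # step 1: bootstrap sample
    niv_one(x[idx], y[idx], trt[idx], A)
  })
  niv_mean <- mean(niv_b)                    # step 2: mean over replications
  penalty <- sd(niv_b) / sqrt(B)             # step 3: penalty term
  c(niv = niv_mean, penalty = penalty, adj_niv = niv_mean - penalty)
}

set.seed(42)
n <- 2000
x <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
trt <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, 0.3 + 0.2 * (x == "a") * trt)

res <- aniv_one(x, y, trt, B = 10)
```

Because the penalty is non-negative, the adjusted value never exceeds the mean NIV; predictors whose NIV varies widely across bootstrap samples are penalized more.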

Value

An object of class niv, which is a list with the following components (among others passed to the S3 methods):

  • nwoeData A list of data frames, one for each variable. The columns represent:

    • y00 the number of non-event records (response != classLevel) in the control group (treatment != treatLevel).

    • y10 the number of event records (response == classLevel) in the control group (treatment != treatLevel).

    • y01 the number of non-event records in the treatment group (treatment == treatLevel).

    • y11 the number of event records in the treatment group.

    • py00 proportion of non-event records in the control group.

    • py10 proportion of event records in the control group.

    • py01 proportion of non-event records in the treatment group.

    • py11 proportion of event records in the treatment group.

    • woe0 the control group weight-of-evidence.

    • woe1 the treatment group weight-of-evidence.

    • nwoe the net weight-of-evidence.

    • niv the net information value.

    The values above are computed from the entire data set.

  • nivData A data frame with the following columns: niv (the average net information value for each variable over all bootstrap samples), the penalty term, and the adjusted net information value.

Author(s)

Leo Guelman <leo.guelman@gmail.com>

References

Larsen, K. (2009). Net lift models. In: M2009 - 12th Annual SAS Data Mining Conference.

Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Wiley, Hoboken, NJ.

See Also

ggplot.niv.

Examples

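set.seed(1)
# Simulate an uplift data set with a binary response
df <- sim_uplift(n = 1000, p = 20, response = "binary")
# Build the formula: all columns except the first three as predictors,
# "y" as the response, and "T" as the treatment
f <- create_uplift_formula(names(df)[-c(1:3)], "y", "T")
# Compute the NIV with 10 bootstrap samples, run sequentially
netInf <- niv(f, data = df, B = 10, parallel = FALSE)
head(netInf$nivData)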

leoguelman/uplift2 documentation built on April 15, 2022, 4:34 a.m.