niv | R Documentation |
niv computes the net information value for each uplift predictor. This
can be a helpful exploratory tool for a preliminary assessment of the predictive
power of each variable for uplift.
## S3 method for class 'formula'
niv(formula, data, subset, na.action = na.pass, nBins = 10, continuous = 4, B = 10,
    woeAdj = 0.5, parallel = TRUE, nCore = NULL, digitsB = NULL, classLevel = NULL,
    treatLevel = NULL)

## S3 method for class 'niv'
print(x, ...)

## S3 method for class 'niv'
summary(object, ...)
formula |
A model formula of the form y ~ x1 + ... + xn + trt(), where
the left-hand side corresponds to the observed response, the right-hand side
corresponds to the predictors, and 'trt' is the special expression that marks
the treatment term. If the treatment term is not a factor, it is converted to one.
|
data |
A data frame in which to interpret the variables named in the formula. |
subset |
Expression indicating which subset of the rows of data should be included. All observations are included by default. |
na.action |
A missing-data filter function. Defaults to na.pass. |
nBins |
The number of bins created from numeric predictors. The bins are created based on sample quantiles, with a default value of 10 (deciles). |
continuous |
Specifies the threshold for when bins should be created from
numeric predictors. If a numeric predictor has n or fewer unique values (i.e.,
continuous = n), it is converted to a factor without binning. The default is 4. |
B |
The number of bootstraps. |
woeAdj |
The adjustment factor used to avoid an undefined WOE. The value
should lie in [0, 1]. Defaults to 0.5. |
parallel |
If TRUE (the default), computations are performed in parallel. |
nCore |
The number of cores used. Defaults to the number of available cores minus 1. |
digitsB |
Number of digits used in formatting the breaks in numeric predictors. |
classLevel |
A character string for the class of interest. Defaults to the last level of the factor. |
treatLevel |
A character string for the treatment level of interest. Defaults to the last level of the treatment factor. |
x |
A niv object. |
object |
A niv object. |
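As a rough illustration of the quantile binning described for nBins, the following base R sketch cuts a numeric predictor at its sample quantiles. The helper name bin_quantiles is hypothetical and not part of the package, and the internal binning used by niv may differ in details such as tie handling and break formatting.

```r
# Hypothetical helper (not part of the package): quantile-based binning.
bin_quantiles <- function(x, nBins = 10) {
  probs <- seq(0, 1, length.out = nBins + 1)
  # unique() guards against duplicated breaks when x has many tied values
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

# With nBins = 10 the bins correspond to deciles
bins <- bin_quantiles(1:100, nBins = 10)
table(bins)
```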
Given a binary response variable y \in \{0,1\}, the information value (Siddiqi, 2006) from a predictor x is given by
IV = \sum_{i=1}^{G} \left( P(x=i|y=1) - P(x=i|y=0) \right) \times WOE_i
where G is the number of groups created from a numeric predictor or levels from a categorical predictor, and WOE_i = \ln \left( \frac{P(x=i|y=1)}{P(x=i|y=0)} \right).
To avoid an undefined WOE, an adjustment factor A is used. Specifically, WOE_i = \ln \left( \frac{(N(x=i|y=1)+A)/N(y=1)}{(N(x=i|y=0)+A)/N(y=0)} \right), where N represents observation counts.
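The IV and adjusted WOE formulas above can be sketched in base R as follows. This is a minimal sketch, assuming a categorical (or already binned) predictor and a response coded 0/1. The function name iv_sketch is hypothetical and this is not the package's implementation.

```r
# Hypothetical helper (not part of the package): information value of one
# categorical predictor x for a 0/1 response y, with WOE adjustment factor A.
iv_sketch <- function(x, y, A = 0.5) {
  tab <- table(factor(x), factor(y, levels = c(0, 1)))
  n0 <- tab[, "0"]                    # N(x = i | y = 0)
  n1 <- tab[, "1"]                    # N(x = i | y = 1)
  p0 <- n0 / sum(n0)                  # P(x = i | y = 0)
  p1 <- n1 / sum(n1)                  # P(x = i | y = 1)
  # Adjusted WOE: A is added to the group counts only
  woe <- log(((n1 + A) / sum(n1)) / ((n0 + A) / sum(n0)))
  sum((p1 - p0) * woe)
}

x <- c("a", "a", "a", "b", "b", "b")
y <- c(1, 1, 0, 0, 0, 1)
iv_sketch(x, y)
```

A predictor whose distribution is the same for y = 1 and y = 0 yields an IV of zero under this sketch, since every difference P(x=i|y=1) - P(x=i|y=0) vanishes.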
The net information value (NIV) proposed by Larsen (2009) is a natural extension of the IV for the case of uplift. It is computed as
NIV = \sum_{i=1}^{G} \left( P(x=i|y=1)^{T} \times P(x=i|y=0)^{C} - P(x=i|y=0)^{T} \times P(x=i|y=1)^{C} \right) \times NWOE_i
where NWOE_i = WOE_i^{T} - WOE_i^{C}, and T and C refer to treatment and control groups, respectively.
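To make the NIV formula concrete, here is a minimal base R sketch that evaluates it term by term. It assumes a categorical predictor, a 0/1 response, and a 0/1 treatment indicator (1 = treatment, 0 = control), and it applies the same WOE adjustment factor A within each group. The name niv_sketch is hypothetical. The sketch follows the formula as written here and is not taken from the package source.

```r
# Hypothetical helper (not part of the package): net information value of one
# categorical predictor x, for 0/1 response y and 0/1 treatment indicator trt.
niv_sketch <- function(x, y, trt, A = 0.5) {
  x <- factor(x)
  # Per-level counts of non-events (n0) and events (n1) within a subgroup
  counts <- function(keep) {
    tab <- table(x[keep], factor(y[keep], levels = c(0, 1)))
    list(n0 = tab[, "0"], n1 = tab[, "1"])
  }
  prop <- function(n) n / sum(n)
  woe <- function(g) log(((g$n1 + A) / sum(g$n1)) / ((g$n0 + A) / sum(g$n0)))

  tr <- counts(trt == 1)              # treatment group (T)
  ct <- counts(trt == 0)              # control group (C)
  nwoe <- woe(tr) - woe(ct)           # NWOE_i = WOE_i^T - WOE_i^C
  sum((prop(tr$n1) * prop(ct$n0) - prop(tr$n0) * prop(ct$n1)) * nwoe)
}

x   <- rep(c("a", "a", "a", "b", "b", "b"), 2)
trt <- rep(c(1, 0), each = 6)
y   <- c(1, 1, 0, 0, 0, 1,            # treatment responses
         0, 0, 1, 1, 1, 0)            # control responses
niv_sketch(x, y, trt)
```

When the treatment and control groups have identical response patterns across the levels of x, both the probability term and NWOE_i are zero, so the sketch returns 0.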
The adjusted net information value (ANIV) is computed as follows:
1. Draw B bootstrap samples from the training data and compute the NIV for each variable in each sample.
2. Compute the mean of the NIV (NIV_{mean}) and sd of the NIV (NIV_{sd}) for each variable over the B replications.
3. The ANIV for a given variable is computed by subtracting a penalty term from the mean NIV. Specifically, ANIV = NIV_{mean} - \frac{NIV_{sd}}{\sqrt{B}}.
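The bootstrap procedure above can be sketched in base R as shown below. This is an illustrative sketch only. The helper names boot_stat and aniv_from_boot and the output names penalty and adj_niv are assumptions made for this example, and stat stands for any function that returns a single NIV value for one variable from a data frame.

```r
# Hypothetical helpers (not part of the package).

# Step 1: evaluate a scalar statistic on B bootstrap resamples of the rows.
boot_stat <- function(data, stat, B = 10) {
  vapply(seq_len(B), function(b) {
    idx <- sample(nrow(data), replace = TRUE)
    stat(data[idx, , drop = FALSE])
  }, numeric(1))
}

# Steps 2 and 3: mean, penalty term, and adjusted value from the B replicates.
aniv_from_boot <- function(niv_boot) {
  B <- length(niv_boot)
  niv_mean <- mean(niv_boot)
  penalty <- sd(niv_boot) / sqrt(B)   # NIV_sd / sqrt(B)
  c(niv = niv_mean, penalty = penalty, adj_niv = niv_mean - penalty)
}

# Toy usage with made-up replicate values, purely to show the arithmetic
res <- aniv_from_boot(c(1, 2, 3))
res
```

Because the penalty shrinks with the square root of B, a variable whose NIV is unstable across resamples is pulled down more than one with a similar mean but little variation.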
An object of class niv, which is a list with the following
components (among others passed to the S3 methods):
nwoeData
A list of data frames, one for each variable. The columns represent:
y00
the number of non-event records (response != classLevel) in the control group (treatment != treatLevel).
y10
the number of event records (response == classLevel) in the control group (treatment != treatLevel).
y01
the number of non-event records in the treatment group (treatment == treatLevel).
y11
the number of event records in the treatment group.
py00
proportion of non-event records in the control group.
py10
proportion of event records in the control group.
py01
proportion of non-event records in the treatment group.
py11
proportion of event records in the treatment group.
woe0
the control group weight-of-evidence.
woe1
the treatment group weight-of-evidence.
nwoe
the net weight-of-evidence.
niv
the net information value.
The values above are computed based on the entire data.
nivData
A data frame with the following columns: niv (the average net information
value for each variable over all bootstrap samples), the penalty term, and
the adjusted net information value.
Leo Guelman leo.guelman@gmail.com
Larsen, K. (2009). Net lift models. In: M2009 - 12th Annual SAS Data Mining Conference.
Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Wiley, Hoboken, NJ.
ggplot.niv.
set.seed(1)
df <- sim_uplift(n = 1000, p = 20, response = "binary")
f <- create_uplift_formula(names(df)[-c(1:3)], "y", "T")
netInf <- niv(f, data = df, B = 10, parallel = FALSE)
head(netInf$nivData)