Computes the weight of evidence for each level of a factor with respect to a binary dependent variable.
iv | A factor, the independent variable. Missing values, if present, are replaced using
dv | The dependent variable, which may have only two unique values. Missing values are not allowed.
maxOdds | When the odds are greater than
civ | If
... | Extra unused arguments.
This function computes the log odds (aka weight of evidence) for each level in a factor as follows:
woe = \log \frac{nPositive}{nNegative}
where nPositive is the number of "positive" values in the dependent variable, and nNegative is the number of "negative" values.
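The formula above can be checked by hand in base R. The following is a minimal sketch, not the package's implementation: the data and the variable names (`n_positive`, `n_negative`, `woe_by_level`) are hypothetical, and the counts are assumed to be taken within each level of the factor.

```r
# Hypothetical data: a two-level factor and a binary outcome coded 0/1
iv <- factor(c("a", "a", "a", "b", "b", "b"))
dv <- c(1, 1, 0, 0, 0, 1)

# Count "positive" (dv == 1) and "negative" (dv == 0) values within each level
n_positive <- tapply(dv == 1, iv, sum)
n_negative <- tapply(dv == 0, iv, sum)

# Odds and weight of evidence (log odds) per level
odds <- n_positive / n_negative
woe_by_level <- log(odds)

print(odds)
print(woe_by_level)
```

A level with no negative values would give infinite odds in this sketch, so real data usually needs some form of capping or smoothing.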
By default the second level of dv is used as the "positive" class during power calculations. This can be controlled by ordering the levels in a factor supplied as dv.
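As an illustration of how level ordering selects the "positive" class, the sketch below uses only base R's factor(). The outcome labels are hypothetical, and the sketch shows the level order only; it does not call Woe.

```r
# Hypothetical outcome labels
outcome <- c("yes", "yes", "no", "no", "no", "yes")

# By default factor() sorts levels alphabetically: "no", "yes"
dv_default <- factor(outcome)
levels(dv_default)[2]   # the second level is "yes"

# Supplying the levels explicitly makes "no" the second level instead
dv_flipped <- factor(outcome, levels = c("yes", "no"))
levels(dv_flipped)[2]   # the second level is now "no"
```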
Other metrics returned include the information value and the log density ratio.
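For orientation, the sketch below computes a log density ratio and an information value per level using their common textbook definitions. This is an assumption: the page does not give the formulas, so the package's exact calculation may differ. The data and variable names are hypothetical.

```r
# Hypothetical data: a two-level factor and a binary outcome coded 0/1
iv <- factor(c("a", "a", "a", "b", "b", "b"))
dv <- c(1, 1, 0, 0, 0, 1)

# Share of all positives and of all negatives that fall in each level
positive_share <- tapply(dv == 1, iv, sum) / sum(dv == 1)
negative_share <- tapply(dv == 0, iv, sum) / sum(dv == 0)

# Assumed definition: log density ratio = log(positive share / negative share)
log_density_ratio <- log(positive_share / negative_share)

# Assumed definition: information value = (share difference) * log density ratio
information_value <- (positive_share - negative_share) * log_density_ratio

print(log_density_ratio)
print(information_value)
print(sum(information_value))  # total information value across levels
```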
A list with the following elements:
woe.levels | A vector of WOE values corresponding to each level of the factor
woe | A vector of WOE values with the same length as
odds | A vector of odds values corresponding to each level of the factor
bin.count | A count of data points in each level of the factor
true.count | A count of "true" dependent variable values in each level of the factor
log.density.ratio | A vector of log density ratio values corresponding to each level of the factor
information.value | A vector of information values corresponding to each level of the factor
linearity | A measure of correlation between the log-odds of the dependent variable and the binned values of the continuous independent variable
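One way to picture the linearity element is as a correlation between a per-bin summary of the continuous variable and the per-bin log odds. The sketch below assumes a Pearson correlation against the bin means. That is an assumption made for illustration, not the package's documented method, and all data and names are hypothetical.

```r
# Hypothetical continuous variable and binary outcome
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
y <- c(0, 0, 1, 0, 1, 1, 1, 1, 0)

# Bin the continuous variable into three intervals: (0,3], (3,6], (6,9]
bins <- cut(x, breaks = c(0, 3, 6, 9))

# Log odds of a positive outcome within each bin
log_odds <- log(tapply(y == 1, bins, sum) / tapply(y == 0, bins, sum))

# Mean of the continuous variable within each bin
bin_mean <- tapply(x, bins, mean)

# Assumed linearity measure: correlation between bin means and log odds
linearity <- cor(as.numeric(bin_mean), as.numeric(log_odds))
print(linearity)
```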
Justin Hemann <support@causata.com>
library(stringr)
library(Causata)  # assumed to be the package that provides Woe, BinaryCut and df.causata
# create a factor with three levels
# - odds of 1 for a: 2:1 = 2.0
# - odds of 1 for b: 1:2 = 0.5
# - odds of 1 for NA: 1:1 = 1.0
f1 <- factor(c(str_split("a a a b b b", " ")[[1]], NA,NA))
dv1 <- c( 1,1,0,0,0,1, 1, 0 )
fw1 <- Woe(f1,dv1)
fw1$odds
# discretize a continuous variable into a factor with 10 levels and compute WOE
data(df.causata)
dv <- df.causata$has.responded.mobile.logoff_next.hour_466
f2 <- BinaryCut(df.causata$online.average.authentications.per.month_all.past_406, dv)
fw2 <- Woe(f2, dv, civ=df.causata$online.average.authentications.per.month_all.past_406)
fw2$odds
fw2$linearity
[1] 2.0 0.5 1.0
[1] 0.03961689 0.07553551 0.06581934 0.04958184 0.05841924 0.04950177 0.04063701
[8] 0.02508361
[1] -0.7827277