Computes the weight of evidence (WOE) for each level of a factor with respect to a binary dependent variable.


| Argument | Description |
|---|---|
| `iv` | A factor, the independent variable. Missing values, if present, are replaced using |
| `dv` | The dependent variable, which must have only two unique values. Missing values are not allowed. |
| `maxOdds` | When the odds are greater than |
| `civ` | If |
| `...` | Extra unused arguments. |

This function computes the log odds (also known as the weight of evidence) for each level of the factor as follows:

$$\mathit{woe} = \log \frac{\mathit{nPositive}}{\mathit{nNegative}}$$

where `nPositive` is the number of "positive" values of the dependent variable within the level, and `nNegative` is the number of "negative" values within the level.
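A minimal base-R sketch of the per-level calculation, using the same toy data as the example further down. `WoeSketch` is a hypothetical helper, not part of Causata. It assumes that missing values in the factor form their own level (this matches the odds the example reports for `NA`) and it applies no `maxOdds` capping, so a level with no negatives would yield `Inf`.

```r
# Hypothetical helper, not the Causata implementation: per-level log odds.
WoeSketch <- function(iv, dv) {
  # Assumption: NA is kept as its own level, as the Woe() example suggests.
  iv <- addNA(iv, ifany = TRUE)
  # The second level of dv is the "positive" class, as documented for Woe().
  dv <- factor(dv)
  positive <- levels(dv)[2]
  nPositive <- tapply(dv == positive, iv, sum)
  nNegative <- tapply(dv != positive, iv, sum)
  odds <- nPositive / nNegative  # no maxOdds capping in this sketch
  list(odds = odds, woe.levels = log(odds))
}

f1  <- factor(c("a", "a", "a", "b", "b", "b", NA, NA))
dv1 <- c(1, 1, 0, 0, 0, 1, 1, 0)
res <- WoeSketch(f1, dv1)
res$odds        # a: 2/1, b: 1/2, NA: 1/1
res$woe.levels  # log of the odds above
```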

By default, the second level of `dv` is used as the "positive" class in these calculations. To choose a different positive class, supply `dv` as a factor with its levels ordered accordingly.
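A short base-R illustration of the level-ordering rule, assuming `Woe()` takes the positive class from the second level exactly as stated above. The `responded` vector is hypothetical.

```r
responded <- c("yes", "no", "no", "yes", "no")  # hypothetical outcome values

# factor() sorts the levels, so "yes" is second and would be the positive class
dvDefault <- factor(responded)
levels(dvDefault)[2]

# Listing the levels explicitly makes "no" second, i.e. the positive class
dvFlipped <- factor(responded, levels = c("yes", "no"))
levels(dvFlipped)[2]

# A numeric 0/1 outcome gets levels "0" and "1", so 1 is positive by default
levels(factor(c(0, 1, 1, 0)))[2]
```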

Other metrics returned include the information value and the log density ratio.
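These two metrics are not defined here, so the sketch below uses the standard credit-scoring definitions as an assumption about what `Woe()` reports; Causata's internals may differ. For each level, the log density ratio compares the share of all positives that fall in the level with the share of all negatives, and the information value weights that ratio by the difference between the two shares. The toy data are hypothetical.

```r
# Assumed standard definitions, not taken from the Causata source.
iv <- factor(c("a", "a", "a", "b", "b", "b", "b"))
dv <- c(1, 1, 0, 0, 0, 0, 1)

nPositive <- tapply(dv == 1, iv, sum)
nNegative <- tapply(dv == 0, iv, sum)

# Share of all positives / all negatives that fall in each level
pctPositive <- nPositive / sum(nPositive)
pctNegative <- nNegative / sum(nNegative)

logDensityRatio  <- log(pctPositive / pctNegative)
informationValue <- (pctPositive - pctNegative) * logDensityRatio

# The log density ratio equals the level's log odds minus the overall log odds
woe <- log(nPositive / nNegative)
all.equal(logDensityRatio, woe - log(sum(dv == 1) / sum(dv == 0)))

sum(informationValue)  # total information value across levels
```

Because each term pairs a share difference with the log of the same shares' ratio, every level's information value is non-negative under these definitions.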

A list with the following elements:

| Element | Description |
|---|---|
| `woe.levels` | A vector of WOE values corresponding to each level of the factor |
| `woe` | A vector of WOE values with the same length as |
| `odds` | A vector of odds values corresponding to each level of the factor |
| `bin.count` | A count of data points in each level of the factor |
| `true.count` | A count of "true" dependent variable values in each level of the factor |
| `log.density.ratio` | A vector of log density ratio values corresponding to each level of the factor |
| `information.value` | A vector of information values corresponding to each level of the factor |
| `linearity` | A measure of correlation between the log-odds of the dependent variable and the binned values of the continuous independent variable |
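A hypothetical base-R sketch of how the count and WOE elements relate to one another. The variable names are illustrative rather than Causata's, and the per-observation `woe` vector here is an assumption: one value per element of `iv`, found by looking up each observation's level.

```r
iv <- factor(c("a", "b", "a", "b", "b", "a"))  # hypothetical data
dv <- c(1, 0, 1, 1, 0, 0)

binCount  <- table(iv)                  # data points per level
trueCount <- tapply(dv == 1, iv, sum)   # "true" dv values per level
woeLevels <- log(trueCount / (binCount - trueCount))  # per-level log odds

# Assumed per-observation expansion: index woeLevels by each row's level code
woe <- woeLevels[as.integer(iv)]
```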

Justin Hemann <support@causata.com>

```r
library(Causata)
library(stringr)
# create a factor with levels a and b, plus two missing values
# - odds of 1 for a:  2:1 = 2.0
# - odds of 1 for b:  1:2 = 0.5
# - odds of 1 for NA: 1:1 = 1.0
f1 <- factor(c(str_split("a a a b b b", " ")[[1]], NA, NA))
dv1 <- c(1, 1, 0, 0, 0, 1, 1, 0)
fw1 <- Woe(f1, dv1)
fw1$odds
# discretize a continuous variable into a factor with 10 levels and compute WOE
data(df.causata)
dv <- df.causata$has.responded.mobile.logoff_next.hour_466
f2 <- BinaryCut(df.causata$online.average.authentications.per.month_all.past_406, dv)
fw2 <- Woe(f2, dv, civ = df.causata$online.average.authentications.per.month_all.past_406)
fw2$odds
fw2$linearity
```
