# Woe: Weight of evidence for each level of a factor. In Causata: Analysis utilities for binary classification and Causata users.

## Description

Computes the weight of evidence for each level of a factor and a dependent variable.

## Usage

    ## S3 method for class 'factor'
    Woe(iv, dv, maxOdds=10000, civ=NULL, ...)

## Arguments

| Argument | Description |
|---|---|
| iv | A factor, the independent variable. Missing values, if present, are replaced using CleanNaFromFactor. |
| dv | The dependent variable, which may have only two unique values. Missing values are not allowed. |
| maxOdds | When the odds are greater than maxOdds or less than 1/maxOdds, the odds are replaced with the threshold value. |
| civ | If iv is a discretized version of a continuous variable, then the original continuous variable can be provided in this argument so that linearity can be calculated. See the Value section below for more information. |
| ... | Extra unused arguments. |
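The thresholding described for maxOdds can be sketched in base R. This is only an illustration of the clipping rule as stated above; `clip_odds` is a hypothetical helper, not a function in Causata, and the counts are made up for the example.

```r
# Hypothetical helper (not part of Causata): clamp odds to [1/maxOdds, maxOdds].
clip_odds <- function(odds, maxOdds = 10000) {
  pmin(pmax(odds, 1 / maxOdds), maxOdds)
}

# Made-up counts for three levels; the first has no negatives, the second no positives.
n_positive <- c(5, 0, 3)
n_negative <- c(0, 4, 6)

odds <- n_positive / n_negative   # Inf, 0, 0.5 before clipping
clipped <- clip_odds(odds)        # extreme values replaced by the thresholds
log(clipped)                      # finite log odds for every level
```

Clipping before taking the logarithm keeps the log odds finite for levels in which one of the two classes never occurs.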

## Details

This function computes the log odds (aka weight of evidence) for each level in a factor as follows:

$$\mathrm{woe} = \log \frac{\mathrm{nPositive}}{\mathrm{nNegative}}$$

where nPositive is the number of "positive" values of the dependent variable within the level, and nNegative is the number of "negative" values within the level.
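The formula can be checked by hand with base R. The following is a minimal sketch that counts positives and negatives per level with `table`; it does not call Woe and ignores the maxOdds thresholding and missing-value handling, and the data are made up for illustration.

```r
# Toy data: level "a" has 2 positives and 1 negative, level "b" has 1 and 2.
iv <- factor(c("a", "a", "a", "b", "b", "b"))
dv <- c(1, 1, 0, 0, 0, 1)

# Cross-tabulate: rows are levels of iv, columns are the values of dv ("0", "1").
counts <- table(iv, dv)
n_negative <- counts[, "0"]
n_positive <- counts[, "1"]

# Log odds (weight of evidence) per level, as in the formula above.
woe_by_level <- log(n_positive / n_negative)
woe_by_level
```

Here the odds are 2.0 for level a and 0.5 for level b, so the two log odds have equal magnitude and opposite sign.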

By default the second level of dv is used as the "positive" class during power calculations. This can be controlled by ordering the levels in a factor supplied as dv.
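As a sketch of how level ordering determines the second level, the base R snippet below builds the same dependent variable with two different level orders. It only shows standard `factor` behavior; the data are invented, and how Woe treats a non-factor dv is not covered here.

```r
dv_raw <- c("no", "yes", "yes", "no")

# Default: levels are sorted, so "yes" is the second level.
dv_default <- factor(dv_raw)
positive_default <- levels(dv_default)[2]

# Explicit ordering: "no" becomes the second level instead.
dv_flipped <- factor(dv_raw, levels = c("yes", "no"))
positive_flipped <- levels(dv_flipped)[2]

positive_default
positive_flipped
```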

Other metrics returned include the information value and the log density ratio.

## Value

A list with the following elements:

| Element | Description |
|---|---|
| woe.levels | A vector of WOE values corresponding to each level of the factor iv. The values are ordered to match the input factor iv. |
| woe | A vector of WOE values with the same length as iv. Essentially each factor value is replaced with the associated log odds. |
| odds | A vector of odds values corresponding to each level of the factor iv. The values are ordered to match the input factor iv. |
| bin.count | A count of data points in each level of the factor iv. |
| true.count | A count of "true" dependent variable values in each level of the factor iv. The number of "false" values is bin.count - true.count. |
| log.density.ratio | A vector of log density ratio values corresponding to each level of the factor iv. The values are ordered to match the input factor iv. |
| information.value | A vector of information values corresponding to each level of the factor iv. The values are ordered to match the input factor iv. |
| linearity | A measure of correlation between the log-odds of the dependent variable and the binned values of the continuous independent variable civ. This is calculated if the civ argument was provided, otherwise it's NA. |
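This page does not give formulas for the log density ratio or the information value. The sketch below uses the conventional textbook definitions, which is an assumption: the values Woe returns may differ, for example because of the maxOdds thresholding. The counts are made up, and the variable names are illustrative rather than taken from Causata.

```r
# Made-up per-level counts, in the spirit of bin.count and true.count.
bin_count   <- c(a = 3, b = 3)
true_count  <- c(a = 2, b = 1)
false_count <- bin_count - true_count

# Share of all "true" and all "false" values that fall into each level.
p_true  <- true_count / sum(true_count)
p_false <- false_count / sum(false_count)

# Conventional definitions (assumed, not confirmed by this page):
log_density_ratio <- log(p_true / p_false)
information_value <- (p_true - p_false) * log_density_ratio

log_density_ratio
information_value
```

Under these definitions each level's information value is non-negative, since the difference in shares and the log ratio always have the same sign.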

## Author(s)

Justin Hemann <support@causata.com>

## See Also

CleanNaFromFactor.

## Examples

    library(Causata)
    library(stringr)

    # create a factor with three levels
    # - odds of 1 for a:  2:1 = 2.0
    # - odds of 1 for b:  1:2 = 0.5
    # - odds of 1 for NA: 1:1 = 1.0
    f1 <- factor(c(str_split("a a a b b b", " ")[[1]], NA, NA))
    dv1 <- c(1, 1, 0, 0, 0, 1, 1, 0)
    fw1 <- Woe(f1, dv1)
    fw1$odds

    # discretize a continuous variable into a factor with 10 levels and compute WOE
    data(df.causata)
    dv <- df.causata$has.responded.mobile.logoff_next.hour_466
    f2 <- BinaryCut(df.causata$online.average.authentications.per.month_all.past_406, dv)
    fw2 <- Woe(f2, dv, civ=df.causata$online.average.authentications.per.month_all.past_406)
    fw2$odds
    fw2$linearity