Woe: Weight of evidence for each level of a factor.
In Causata: Analysis utilities for binary classification and Causata users.

Description

Computes the weight of evidence (WOE) for each level of a factor with respect to a binary dependent variable.

Usage

## S3 method for class 'factor'
Woe(iv, dv, maxOdds=10000, civ=NULL, ...)

Arguments

`iv`	A factor, the independent variable. Missing values, if present, are replaced using `CleanNaFromFactor`.
`dv`	The dependent variable, which must have only two unique values. Missing values are not allowed.
`maxOdds`	Odds greater than `maxOdds` are replaced with `maxOdds`, and odds less than `1/maxOdds` are replaced with `1/maxOdds`. This keeps the log odds finite for levels that contain only positive or only negative values.
`civ`	If `iv` is a discretized version of a continuous variable, the original continuous variable can be supplied in this argument so that linearity can be calculated. See the Value section below for more information.
`...`	Extra unused arguments.
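A minimal base-R sketch of the `maxOdds` rule, assuming the cap is applied to the per-level odds before the logarithm is taken. `cap_odds` is a hypothetical helper written for illustration; it is not part of Causata.

```r
# Hypothetical helper (not a Causata function): clamp odds into [1/maxOdds, maxOdds]
cap_odds <- function(odds, maxOdds = 10000) {
  # pmax raises odds below 1/maxOdds; pmin lowers odds above maxOdds
  pmin(pmax(odds, 1 / maxOdds), maxOdds)
}

# Per-level odds: no negatives (3/0 = Inf), no positives (0/5 = 0), and a normal level
odds <- c(3 / 0, 0 / 5, 2 / 1)
capped <- cap_odds(odds)   # 10000, 1e-04, 2
log(capped)                # every log odds value is now finite
```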

Details

This function computes the log odds (also known as the weight of evidence) for each level of a factor as follows:

woe = \log \frac{nPositive}{nNegative}

where nPositive and nNegative are the numbers of "positive" and "negative" values of the dependent variable among the observations in that level.
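The formula can be reproduced for the toy data used in the Examples section with a few lines of base R. This is a sketch, not Causata's implementation: it uses `addNA()` to keep missing values as their own level (a stand-in for `CleanNaFromFactor`, whose level naming may differ), assumes `dv` is coded 0/1 with 1 as the positive class, and uses R's natural logarithm.

```r
# Toy data from the Examples section
iv <- factor(c("a", "a", "a", "b", "b", "b", NA, NA))
dv <- c(1, 1, 0, 0, 0, 1, 1, 0)

# Keep NA as an explicit third level (stand-in for CleanNaFromFactor)
iv <- addNA(iv)

bin.count  <- tapply(dv, iv, length)           # observations per level
true.count <- tapply(dv, iv, sum)              # nPositive per level (dv == 1)
odds <- true.count / (bin.count - true.count)  # nPositive / nNegative
woe.levels <- log(odds)
```

The resulting odds, 2.0, 0.5 and 1.0 for levels a, b and NA, match the `fw1$odds` values shown in the example output.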

By default, the second level of `dv` is treated as the "positive" class in these calculations. To make a particular value the positive class, supply `dv` as a factor whose second level is that value.
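The level-order convention can be illustrated with base R alone. `woe_by_level` is a hypothetical helper that treats the second level of a factor `dv` as positive, mirroring the default described above; it is not Causata code. Reversing the levels of `dv` swaps positives and negatives, so every WOE value changes sign.

```r
iv <- factor(c("a", "a", "a", "b", "b", "b"))
dv <- factor(c("yes", "yes", "no", "no", "no", "yes"))  # levels: "no", "yes"

# Hypothetical helper: WOE per level of iv, second level of dv is "positive"
woe_by_level <- function(iv, dv) {
  positive <- levels(dv)[2]
  n.pos <- tapply(dv == positive, iv, sum)
  n.neg <- tapply(dv != positive, iv, sum)
  log(n.pos / n.neg)
}

w.yes <- woe_by_level(iv, dv)                                  # "yes" is positive
w.no  <- woe_by_level(iv, factor(dv, levels = c("yes", "no"))) # "no" is positive
```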

The returned list also includes the information value and the log density ratio for each level.
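For reference, one common way to define these two metrics (the form widely used in credit scoring) is sketched below. Whether Causata uses exactly these formulas is an assumption, and the per-level counts are hypothetical. With p and q the shares of all positives and of all negatives that fall in each level, the log density ratio is log(p/q) and the information value is (p - q) * log(p/q).

```r
# Hypothetical per-level counts for a factor with three levels
n.pos <- c(low = 5,  mid = 10, high = 25)
n.neg <- c(low = 95, mid = 90, high = 75)

p <- n.pos / sum(n.pos)  # share of all positives in each level
q <- n.neg / sum(n.neg)  # share of all negatives in each level

log.density.ratio <- log(p / q)
information.value <- (p - q) * log.density.ratio  # never negative
woe <- log(n.pos / n.neg)

# Under these definitions the log density ratio is the WOE shifted by the overall log odds
all.equal(log.density.ratio, woe - log(sum(n.pos) / sum(n.neg)))  # TRUE
```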

Value

A list with the following elements:

`woe.levels`	A vector of WOE values, one for each level of the factor `iv`, ordered to match the levels of `iv`.
`woe`	A vector of WOE values with the same length as `iv`: each element of `iv` is replaced with the WOE (log odds) of its level.
`odds`	A vector of odds values, one for each level of the factor `iv`, ordered to match the levels of `iv`.
`bin.count`	The number of data points in each level of the factor `iv`.
`true.count`	The number of "true" (positive) dependent variable values in each level of the factor `iv`. The number of "false" values is `bin.count - true.count`.
`log.density.ratio`	A vector of log density ratio values, one for each level of the factor `iv`, ordered to match the levels of `iv`.
`information.value`	A vector of information values, one for each level of the factor `iv`, ordered to match the levels of `iv`.
`linearity`	A measure of correlation between the log odds of the dependent variable and the binned values of the continuous independent variable `civ`. It is calculated only if the `civ` argument was provided; otherwise it is NA.
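The relationship between `woe` and `woe.levels` amounts to indexing the per-level values by each observation's level code. A base-R sketch with hypothetical WOE values (not output from `Woe`):

```r
iv <- factor(c("b", "a", "b", "c"))            # levels: "a", "b", "c"
woe.levels <- c(a = 0.7, b = -0.7, c = 0)      # one value per level, in level order

# One value per observation: look up each element's level code
woe <- unname(woe.levels[as.integer(iv)])      # -0.7, 0.7, -0.7, 0
```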

Author(s)

Justin Hemann <support@causata.com>

See Also

CleanNaFromFactor.

Examples

library(Causata)
library(stringr)

# create a factor with three levels: a, b, and NA
# - odds of dv1 == 1 for a:  2:1 = 2.0
# - odds of dv1 == 1 for b:  1:2 = 0.5
# - odds of dv1 == 1 for NA: 1:1 = 1.0
f1  <- factor(c(str_split("a a a b b b", " ")[[1]], NA,NA))
dv1 <- c(                  1,1,0,0,0,1,              1, 0 )
fw1 <- Woe(f1,dv1)
fw1$odds

# discretize a continuous variable into a factor with BinaryCut and compute WOE
data(df.causata)
dv <- df.causata$has.responded.mobile.logoff_next.hour_466
f2 <- BinaryCut(df.causata$online.average.authentications.per.month_all.past_406, dv)
fw2 <- Woe(f2, dv, civ=df.causata$online.average.authentications.per.month_all.past_406)
fw2$odds
fw2$linearity

Example output

[1] 2.0 0.5 1.0
[1] 0.03961689 0.07553551 0.06581934 0.04958184 0.05841924 0.04950177 0.04063701
[8] 0.02508361
[1] -0.7827277

Causata documentation built on May 2, 2019, 3:26 a.m.
