A numeric independent variable is discretized and returned as a factor. A binary dependent variable is used to select the bins using a simple, fast algorithm based on quantiles.
iv |
A numeric independent variable that will be cut into bins. Missing values will be ignored during
binning and replaced using |
dv |
The dependent variable must be an array of values with the same length as |
nbins |
The number of bins to break |
minBin |
Each bin will have at least |
woeDelta |
If the absolute value of the Weight Of Evidence for adjacent
bins falls below this threshold, then the bins are merged.
See |
bins |
If TRUE the breaks are returned, along with the factor, in a list. |
debug |
If TRUE debug information will be printed to the screen. |
This function is similar to cut, but it uses a dependent variable to inform the binning. The algorithm is designed to be fast and simple; it is a slightly modified version of an equal frequency approach (quantiles).
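For reference, the plain equal-frequency approach that this function modifies can be sketched in base R as follows. The data and variable names are made up for illustration, and this is not the package's own code.

```r
# Equal-frequency (quantile) binning using base R only.
set.seed(1)
x <- rexp(1000)      # hypothetical skewed numeric variable
nbins <- 5

# nbins bins need nbins + 1 boundaries
breaks <- quantile(x, probs = seq(0, 1, length.out = nbins + 1), names = FALSE)

# include.lowest = TRUE keeps the minimum value inside the first bin
f <- cut(x, breaks = breaks, include.lowest = TRUE)
table(f)             # the bins hold roughly equal numbers of observations
```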
The algorithm works as follows:
The independent variable is filtered to include only non-missing values, and values from the smaller class of the dependent variable.
The filtered independent variable is used to compute nbins quantiles.
For the special case where there are fewer unique values than bins, the unique values are used as the quantiles.
The first and last quantiles are adjusted, if necessary, to include all independent variable values regardless of their dependent variable class.
The independent variable is cut into bins using the quantiles as boundaries.
Each class of the dependent variable is counted in each bin.
If the count is below minBin for either class then the bin is merged with the smallest adjacent bin.
This merge process continues until all bins have a sufficient count of dependent variable values, or until there are 2 bins left.
The Weight of Evidence (WOE) is calculated for each bin. If the difference in the WOE for adjacent bins falls below a threshold defined in terms of woeDelta then the bins are merged.
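The first steps above, together with the WOE calculation, can be sketched in base R as shown here. This is a rough illustration and not the package's implementation: sketch_breaks and sketch_woe are hypothetical helpers, the choice of nbins + 1 boundaries and the sign convention of the WOE (log of the class share ratio) are assumptions, and the minBin and woeDelta merging steps are omitted.

```r
# Hypothetical helper: choose breaks from the smaller class of dv.
sketch_breaks <- function(iv, dv, nbins = 10) {
  ok <- !is.na(iv)
  # keep non-missing values that belong to the smaller class of dv
  counts <- table(dv[ok])
  minority <- names(counts)[which.min(counts)]
  iv_small <- iv[ok & as.character(dv) == minority]
  # quantiles of the filtered values, or the unique values if there are too few
  uniq <- sort(unique(iv_small))
  if (length(uniq) < nbins) {
    breaks <- uniq
  } else {
    probs <- seq(0, 1, length.out = nbins + 1)  # assumption: nbins + 1 boundaries
    breaks <- unique(quantile(iv_small, probs = probs, names = FALSE))
  }
  # stretch the end points so every non-missing value is covered
  breaks[1] <- min(iv, na.rm = TRUE)
  breaks[length(breaks)] <- max(iv, na.rm = TRUE)
  unique(breaks)
}

# Hypothetical helper: WOE per bin as the log ratio of class shares.
# The sign convention is an assumption and may differ from the package's.
sketch_woe <- function(f, dv) {
  tab <- table(f, dv)                   # rows are bins, columns are classes
  share <- prop.table(tab, margin = 2)  # share of each class falling in each bin
  log(share[, 2] / share[, 1])
}

# Simulated data with one missing value and an unbalanced binary outcome
set.seed(42)
iv <- c(rnorm(500, mean = 0), NA, rnorm(499, mean = 1))
dv <- rbinom(1000, 1, plogis(-2 + ifelse(is.na(iv), 0, iv)))

breaks <- sketch_breaks(iv, dv, nbins = 5)
f <- cut(iv, breaks = breaks, include.lowest = TRUE)
table(f, dv)                            # class counts per bin
woe <- sketch_woe(f, dv)
```

Because the breaks come from the smaller class, each bin holds a similar number of minority-class observations before any merging takes place.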
If bins is FALSE then a factor with up to nbins levels is returned, where the level names are as found from cut. Missing values in the independent variable are returned as missing values in the output, and are not counted as a bin.
If bins is TRUE then a list is returned with two elements:
fiv
A factor representation of the independent variable, as described above.
breaks
A vector of breaks or cutpoints used to discretize the independent variable.
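The shape of the returned list, and the way missing values pass through, can be mimicked with base R. The values and cutpoints below are invented for illustration, and the list is built by hand rather than returned by the function.

```r
iv <- c(3, 8, NA, 15, 21, 4)
breaks <- c(3, 10, 21)                    # hypothetical cutpoints
fiv <- cut(iv, breaks = breaks, include.lowest = TRUE)

# same element names as described above
out <- list(fiv = fiv, breaks = breaks)

levels(out$fiv)   # level names as produced by cut
is.na(out$fiv)    # the missing input stays missing and adds no level
```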
Justin Hemann <support@causata.com>
library(Causata)
data(df.causata)
dv <- df.causata$has.responded.mobile.logoff_next.hour_466
iv <- df.causata$online.number.of.page.views_last.30.days_3
f <- BinaryCut(iv,dv)
# compute the weight of evidence for each bin
woe <- Woe(f, dv)
# adjust plot margins to increase space for bin labels
par(oma=c(1,8,1,1))
# plot the bins against the weight of evidence
barplot(woe$woe.levels, names.arg=levels(f), horiz=TRUE, las=1,
main="Weight of Evidence for clicking a banner for a mobile app.",
sub="WOE vs. Page View Count, Last 30 Days" )