Statistics for y are plotted with respect to each level or bin of x. Plotted statistics can be proportions, log-odds, or weight-of-evidence values. Bins can be created using raw factor levels, quantile breakpoints, uniform breakpoints, or recursive partitioning. Additional arguments may be passed to rpart.control() to fine-tune recursive partitioning.
Plots showing ymetric for each bin produced by xsplit, as well as the total volume in each bin, are printed to the current graphics device. In addition, two measures of the overall strength of the predictive relationship (Information Value and ChiSq) are calculated and returned.
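As a rough illustration of the two summary measures, the sketch below computes weight of evidence, information value, and a chi-squared statistic for a toy, pre-binned predictor in base R. The formulas follow one common convention (WoE as the log ratio of each bin's share of events to its share of non-events); the exact definitions and sign convention used inside plotYXbin are not stated on this page, so treat this as an assumption. The data and the variable names are made up for illustration.

```r
# Toy data (hypothetical): binary response and a predictor already split into 3 bins
y   <- c(1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1)
bin <- factor(c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"))

# Count events (y == 1) and non-events (y == 0) in each bin
tab       <- table(bin, y)
events    <- tab[, "1"]
nonevents <- tab[, "0"]

# Share of all events and of all non-events that falls in each bin
pct_event    <- events / sum(events)
pct_nonevent <- nonevents / sum(nonevents)

# Weight of evidence per bin and information value (assumed convention)
woe <- log(pct_event / pct_nonevent)
iv  <- sum((pct_event - pct_nonevent) * woe)

# Pearson chi-squared statistic for independence of bin and y
# (warnings about small expected counts are suppressed for this toy table)
chi2 <- unname(suppressWarnings(chisq.test(tab)$statistic))

print(round(woe, 3))
print(c(iv = iv, chi2 = chi2))
```

Under this convention, bins with a higher-than-average event rate get a positive WoE, and the information value is never negative because each term multiplies two quantities with the same sign.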
y - (numeric) binary response vector
x - (numeric or factor) predictor vector
ymetric - (character) statistic to calculate for y
xsplit - (character) method used to bin x
nbins - (numeric) number of bins to create from x
nabin - (logical) whether to include an additional bin for missing values of x
yticks - (numeric) number of tick marks to display on the y-axis of plots
... - (args) additional arguments to pass to rpart.control()
If xsplit='rpart', bins are created by recursive partitioning for both numeric and factor variables, and the nbins argument is ignored. Pass additional control parameters (e.g. cp, minbucket) in the function call to control partitioning behavior. If the rpart control settings produce zero bins or more than 20 bins, the function throws an error. If x is a factor variable, the x-axis labels on the returned plots correspond to the index positions of the levels of x in each bin, not to the factor labels themselves. It is generally not a good idea to use recursive partitioning with more than 50 factor levels. If x is a numeric variable, the x-axis labels are the range cutpoints of each bin created by recursive partitioning.
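To make the recursive-partitioning option more concrete, here is a minimal sketch of how a single numeric predictor could be binned with rpart, using the leaf node of each observation as its bin. This is not the function's actual implementation, which is not shown on this page; the simulated data, the use of the default regression ("anova") tree on a 0/1 response, and the variable names are all assumptions made for illustration.

```r
library(rpart)

# Simulated data (hypothetical): event rate steps up with x
set.seed(42)
x <- runif(2000)
y <- rbinom(2000, 1, ifelse(x < 0.3, 0.1, ifelse(x < 0.7, 0.4, 0.8)))

# Fit a tree on the single predictor; cp and minbucket go through rpart.control()
fit <- rpart(y ~ x, control = rpart.control(cp = 1e-3, minbucket = 100))

# fit$where holds the leaf node of each observation; treat each leaf as a bin
bins   <- factor(fit$where)
n_bins <- nlevels(bins)

# Mirror the documented guard: fail if the tree yields no bins or too many
if (n_bins == 0 || n_bins > 20) {
  stop("rpart settings produced ", n_bins, " bins; adjust cp or minbucket")
}

# Range of x covered by each bin, ordered by lower cutpoint
ranges <- t(sapply(split(x, bins), range))
ranges <- ranges[order(ranges[, 1]), , drop = FALSE]
colnames(ranges) <- c("lower", "upper")
print(round(ranges, 3))
```

A larger cp or a larger minbucket yields fewer, coarser bins, which is the lever to reach for when the settings produce too many bins.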
If xsplit is 'uniform' or 'quantile' and x is a factor variable, its levels are used directly as bins and the nbins argument is ignored. If x is a numeric variable, bins are calculated by dividing the range of x into buckets of either equal width (uniform) or equal count (quantile). If quantile breakpoints are not unique, adjacent identical bins are combined.
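The difference between the two breakpoint schemes, and the merging of duplicate quantile breakpoints, can be sketched in base R with cut() and quantile(). This is only an illustration under assumed details (interval closure, quantile type); the function's internal binning code is not shown on this page, and the data below are made up.

```r
# Hypothetical predictor with many ties at zero, so some quantiles coincide
x     <- c(rep(0, 100), seq(0.01, 2, length.out = 200))
nbins <- 5

# Uniform: split the range of x into nbins intervals of equal width
u_breaks <- seq(min(x), max(x), length.out = nbins + 1)
u_bins   <- cut(x, breaks = u_breaks, include.lowest = TRUE)

# Quantile: breakpoints at equally spaced probabilities;
# unique() collapses repeated breakpoints, which merges the identical bins
q_probs  <- seq(0, 1, length.out = nbins + 1)
q_breaks <- unique(quantile(x, probs = q_probs, names = FALSE))
q_bins   <- cut(x, breaks = q_breaks, include.lowest = TRUE)

print(table(u_bins))  # equal-width bins, unequal counts
print(table(q_bins))  # fewer than nbins bins because of the ties at zero
```

Here the 0% and 20% quantiles are both zero, so the requested five quantile bins collapse to four.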
If any bin has zero volume (no observations) or zero variance (every y value in the bin is the same), log-odds and weight-of-evidence values cannot be calculated for it. Such bins are excluded from the displayed plots and from the calculation of information value for the variable. This problem can typically be solved by using quantile binning, reducing the number of bins, or both.
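A short base R sketch shows why such bins have to be dropped: the log-odds are infinite when a bin's event rate is exactly 0 or 1, and undefined when the bin is empty. The data and names are hypothetical, and the filtering step is an assumption about how exclusion might be done, not the function's own code.

```r
# Toy data (hypothetical): "low" has only 0s, "high" only 1s, "empty" has no rows
y   <- c(0, 0, 0, 1, 0, 1, 1, 1, 1)
bin <- factor(c("low", "low", "low", "mid", "mid", "mid", "high", "high", "high"),
              levels = c("low", "mid", "high", "empty"))

# Event rate per bin; empty bins come back as NA
p <- tapply(y, bin, mean)

# Log-odds: -Inf when p is 0, Inf when p is 1, NA for the empty bin
log_odds <- log(p / (1 - p))

# Keep only bins with a finite value
usable <- is.finite(log_odds)
print(log_odds)
print(names(log_odds)[usable])
```

Only the "mid" bin survives here, since it is the only one that contains both outcomes.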
A list containing the following elements:
iv - Information Value
chi2 - ChiSq Statistic
yPlot - ggplot object of ymetric vs. bins
vPlot - ggplot object of bin sizes or volume
data(diamonds, package = 'ggplot2')
# Binary response: is the price above the mean price?
y <- as.numeric(diamonds$price > mean(diamonds$price))
x1 <- diamonds$carat    # numeric predictor
x2 <- diamonds$clarity  # factor predictor
x3 <- diamonds$y        # numeric predictor
res <- plotYXbin(y, x1)
res <- plotYXbin(y, x1, nbins = 8, nabin = FALSE)
res <- plotYXbin(y, x2, ymetric = 'woe')
res <- plotYXbin(y, x3, ymetric = 'proportion', xsplit = 'rpart', cp = 1e-4, minbucket = 100)