An independent variable is evaluated as a predictor for a binary dependent variable. The independent variable may be numeric, a factor, or a data frame containing numeric and factor columns.
## S3 method for class 'factor'
BinaryPredictor(iv, dv, min.power=0.01, min.robustness=0.5,
max.missing=0.99, max.levels=20, civ=NULL, copy.data=FALSE, name=NULL, ...)
## S3 method for class 'numeric'
BinaryPredictor(iv, dv, min.power=0.01, min.robustness=0.5,
max.missing=0.99, copy.data=FALSE, name=NULL, ...)
## S3 method for class 'data.frame'
BinaryPredictor(iv, dv, min.power=0.01, min.robustness=0.5,
max.missing=0.99, verbose=FALSE, copy.data=FALSE, ...)
## Default S3 method:
BinaryPredictor(iv, dv, ...)
## S3 method for class 'BinaryPredictor'
plot(x, y=NULL, type="bin", plot.missing=TRUE, ...)
## S3 method for class 'BinaryPredictorList'
print(x, file=NULL, silent=FALSE, ...)
iv |
The independent variable(s). May be a factor, numeric, or a data frame. |
dv |
The dependent variable, which may have only two unique values.
The length / number of rows in |
min.power |
The minimum predictive power from |
min.robustness |
The minimum robustness from |
max.missing |
The maximum allowable fraction of missing values for a variable to be kept. |
max.levels |
For factors, this controls the merging of small bins using |
civ |
When a continuous variable is discretized, the original continuous data can
be provided in |
copy.data |
Reserved for future use, indicates if the data should be copied. |
name |
The variable name. If NULL it will be extracted from the deparsed input |
... |
For the |
verbose |
If true then calculation information is printed. |
x |
Output from one of the |
y |
Unused argument for the generic |
plot.missing |
When plotting numeric variables a |
type |
Reserved for future use, indicates the type of plot to be generated. The only valid value now is 'bin'. |
file |
If a filename is provided then summary information will be written to a text file. |
silent |
If set to |
The BinaryPredictor family of functions is used to evaluate predictors of a binary outcome.
Checks are executed for the variable class (only numeric, integer, and factor are allowed),
missing values, predictive power, and robustness.
If any check fails, the "keep" flag is set to FALSE; otherwise it is TRUE.
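The check sequence described above can be illustrated with a minimal base R sketch. It is not the package's implementation: check_predictor is a hypothetical helper, it covers only the class and missing-value checks, and it assumes a variable fails when its missing fraction is strictly greater than max.missing. The predictive power and robustness checks are omitted.

```r
# Hypothetical helper, NOT part of the package: mimics only the class and
# missing-value checks. Assumes a strict ">" comparison against max.missing.
check_predictor <- function(iv, max.missing = 0.99) {
  frac.missing <- mean(is.na(iv))   # fraction of NA values
  keep <- TRUE
  reason <- character(0)
  # is.numeric() is TRUE for both numeric and integer vectors
  if (!(is.numeric(iv) || is.factor(iv))) {
    keep <- FALSE
    reason <- c(reason, "unsupported class")
  }
  if (frac.missing > max.missing) {
    keep <- FALSE
    reason <- c(reason, "too many missing values")
  }
  list(keep = keep, reason = reason,
       missing = frac.missing, class = class(iv)[1])
}

ok  <- check_predictor(c(1.5, NA, 3, 4))   # numeric, 25% missing
bad <- check_predictor(c("a", "b", NA))    # character vectors are rejected
```

Collecting every failure reason, rather than stopping at the first one, makes it easier to see why a variable was dropped.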
The plot function generates a summary plot of the predictor. Predictive power and robustness are
printed in the plot title, along with the smallest and largest bin sizes used during discretization.
For numeric variables a count of missing values is also printed.
The print function writes a table of variable summary information to the screen or to a file.
If iv is a vector then an object of class BinaryPredictor is returned with the following items:
name |
The variable name. |
keep |
A boolean indicating if the variable meets the criteria for missing values, predictive power, etc. |
reason |
If |
missing |
The fraction of values that are missing / NA. |
class |
The variable class. |
predictivePower |
Results from |
woe |
Results from |
If iv is a data frame then a list of BinaryPredictor objects is returned with class BinaryPredictorList.
The print.BinaryPredictorList function returns a data frame with columns for the values in
the BinaryPredictor output. The values include the variable name, predictive power, robustness, etc.
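As a rough illustration of how a list of per-variable results can be flattened into a summary data frame, the base R sketch below uses made-up records. The field names mirror the items listed above (name, keep, missing, class), but the values are invented and the code is not the package's print method.

```r
# Illustrative records only: the values are made up, and these plain lists
# stand in for BinaryPredictor objects.
results <- list(
  list(name = "carat", keep = TRUE,  missing = 0.00, class = "numeric"),
  list(name = "notes", keep = FALSE, missing = 1.00, class = "character")
)

# Build one data frame row per variable, then stack the rows.
summary.df <- do.call(rbind, lapply(results, function(r) {
  data.frame(name = r$name, keep = r$keep,
             missing = r$missing, class = r$class,
             stringsAsFactors = FALSE)
}))

# Names of the variables that passed every check
kept <- summary.df$name[summary.df$keep]
```

With the summary in a data frame, ordinary subsetting and sorting can be used to shortlist predictors.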
Justin Hemann <support@causata.com>
PredictivePowerCv, BinaryCut, MergeLevels, Woe,
ShortenStrings.
library(ggplot2)
data(diamonds)
# set a dependent variable that is TRUE when the price is above $5000
dv <- diamonds$price > 5000
# convert ordered to factor
diamonds$cut <- as.factor(as.character(diamonds$cut))
diamonds$color <- as.factor(as.character(diamonds$color))
diamonds$clarity <- as.factor(as.character(diamonds$clarity))
# evaluate diamond cut and carats, and generate a plot for each
bp.cut <- BinaryPredictor(diamonds$cut, dv)
plot(bp.cut)
bp.carat <- BinaryPredictor(diamonds$carat, dv)
plot(bp.carat)
# Evaluate all predictors, print summary to screen
# note that price does not have 100% predictive
# power since the discretization boundary is not $5000.
# Using a sample of 10k records and 3 folds of cross validation
# for greater speed.
set.seed(98765)
idx <- sample.int(nrow(diamonds), 10000)
bpList <- BinaryPredictor(diamonds[idx, ], dv[idx], folds=3)
df.summary <- print(bpList)