This function computes predictive power for a single independent variable and a binary dependent variable.
## S3 method for class 'factor'
PredictivePower(iv, dv, warn.levels=30, cv=NULL, debug=FALSE, ...)
## S3 method for class 'numeric'
PredictivePower(iv, dv, warn.levels=30, cv=NULL, debug=FALSE, ...)
PredictivePowerCv(iv, dv, warn.levels=30, debug=FALSE, folds=10, ...)
iv |
The independent variable. |
dv |
The dependent variable, which may have only two unique values. |
warn.levels |
If the number of levels in |
debug |
If set to |
cv |
If |
... |
Additional arguments are passed to |
folds |
This argument is used to specify the folds used for cross validation.
If a number between 2 and 10 is provided then data will be assigned to the selected number of folds at
random. If a vector of values is provided then it will be used as an index to assign data to folds.
The number of unique values must be between 2 and 10, and the vector length must match |
Predictive power is defined as the area under the gains chart for the provided independent variable divided by the area under the gains chart for a perfect predictor. A random predictor would have a predictive power value of 0, and a perfect predictor would have a value of 1.
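The definition above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `gains_power_sketch` is hypothetical, and it assumes that "area" means the area between the gains curve and the diagonal of a random predictor, which is what makes a random predictor score 0.

```r
# Hypothetical helper, not part of the package: power from a discretized gains chart.
gains_power_sketch <- function(iv, dv) {
  keep <- !is.na(iv)                         # rows with a missing iv are ignored
  iv <- droplevels(as.factor(iv[keep]))
  dv <- as.factor(dv[keep])
  stopifnot(nlevels(dv) == 2)
  pos <- dv == levels(dv)[2]                 # second level is the "positive" class
  n.bin   <- as.vector(tapply(pos, iv, length))  # observations per bin
  pos.bin <- as.vector(tapply(pos, iv, sum))     # positives per bin
  ord <- order(pos.bin / n.bin, decreasing = TRUE)  # best bins first
  x <- c(0, cumsum(n.bin[ord]) / sum(n.bin))        # cumulative share of rows
  y <- c(0, cumsum(pos.bin[ord]) / sum(pos.bin))    # cumulative share of positives
  area <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)  # trapezoid rule
  p <- mean(pos)                             # overall positive rate
  perfect.area <- 1 - p / 2                  # gains curve of a perfect predictor
  # Assumption: both areas are measured above the random-predictor diagonal (0.5).
  (area - 0.5) / (perfect.area - 0.5)
}

gains_power_sketch(factor(c("a", "a", "a", "b", "b", "b")), c(1, 1, 0, 0, 0, 1))
```

Under these assumptions the sketch agrees with the values quoted in the examples on this page: 1/3 for the six-row case, 1 for a perfect predictor, and 0 for a predictor whose bins all have the same positive rate.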
The power calculation is derived from a discretized gains chart. As such it only works with categorical variables.
Numeric variables are discretized before power is computed.
The PredictivePower.numeric function discretizes continuous data using the BinaryCut function.
Note that the predictive power will depend, in part, on the discretization method.
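To illustrate what discretizing a numeric variable looks like, the following sketch uses equal-frequency binning with base R's `cut` and `quantile`. This is a stand-in chosen for illustration only; it is not BinaryCut, and the binning method BinaryCut actually uses is not described here.

```r
set.seed(1)
x <- rnorm(1000)

# Assumption: ten equal-frequency bins, purely for illustration.
breaks <- quantile(x, probs = seq(0, 1, length.out = 11))
x.binned <- cut(x, breaks = breaks, include.lowest = TRUE)

# The result is a factor, so the gains chart can be built over its levels.
table(x.binned)
```

A different choice of breaks produces a different factor, and therefore a different gains chart, which is why the power value depends on the discretization.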
By default the second level of dv is used as the "positive" class during power calculations. This can be controlled by ordering the levels in a factor supplied as dv.
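As a small base R illustration of choosing the positive class, the level order of a factor can be set explicitly. The labels "churn" and "stay" are made up for this example.

```r
dv.raw <- c("churn", "stay", "stay", "churn", "stay")

# factor() sorts levels alphabetically by default: "churn" first, "stay" second,
# so "stay" would be treated as the positive class.
dv.default <- factor(dv.raw)
levels(dv.default)

# Listing "churn" second makes it the positive class instead.
dv.churn <- factor(dv.raw, levels = c("stay", "churn"))
levels(dv.churn)
```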
Missing values in iv are allowed in PredictivePower.factor; they are ignored during the calculations, as are the corresponding dependent variable values. Missing values can be included in the power calculations if they are mapped to a non-missing level in the factor. See CleanNaFromFactor.
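The mapping of missing values to their own level can be sketched in base R as follows. This is an illustrative stand-in rather than a call to CleanNaFromFactor, and the label "MISSING" is an arbitrary choice.

```r
iv <- factor(c("a", "a", "b", NA, "b", NA))

# Assumption: "MISSING" is an arbitrary label chosen for illustration.
iv.clean <- factor(ifelse(is.na(iv), "MISSING", as.character(iv)))

# The former NA rows now form a bin of their own.
table(iv.clean)
```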
Missing values are not allowed in dv.
Cross validation is executed using the PredictivePowerCv function as a wrapper for the PredictivePower functions. When constructing the gains chart the bins are ordered by the odds for a "positive" within each bin. During cross validation the ordering is derived from one set of data, and the area under the curve is calculated with the other set.
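The cross validation idea described above can be sketched in base R. This is a rough illustration under several assumptions, not the package's code: the helper `cv_power_sketch` is hypothetical, dv is assumed to be coded 0/1, the bin ordering is assumed to come from the training folds with the area computed on the held-out fold, and the area is measured above the random-predictor diagonal.

```r
# Hypothetical helper: order bins on training rows, score the gains chart on held-out rows.
cv_power_sketch <- function(iv, dv, fold.index) {
  pos <- dv == 1                                   # assumes dv is coded 0/1
  sapply(sort(unique(fold.index)), function(k) {
    train <- fold.index != k
    test  <- !train
    # Bin ordering from the training rows: highest positive rate first.
    rate <- tapply(pos[train], iv[train], mean)
    ord  <- names(rate)[order(rate, decreasing = TRUE)]
    # Counts per bin on the held-out rows, arranged in the training order.
    n.bin   <- as.vector(table(factor(iv[test], levels = ord)))
    pos.bin <- as.vector(table(factor(iv[test][pos[test]], levels = ord)))
    x <- c(0, cumsum(n.bin) / sum(n.bin))
    y <- c(0, cumsum(pos.bin) / sum(pos.bin))
    area <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
    p <- mean(pos[test])
    (area - 0.5) / ((1 - p / 2) - 0.5)
  })
}

set.seed(42)
iv <- factor(sample(letters[1:4], 2000, replace = TRUE))
dv <- sample(c(0, 1), 2000, replace = TRUE)
# A fold index vector with five unique values, one entry per row of iv.
fold.index <- sample(rep(1:5, length.out = length(iv)))
cv_power_sketch(iv, dv, fold.index)
```

Because the ordering is fixed before the held-out data is seen, a variable with no real signal can score slightly below zero on some folds, which is one reason the spread across folds is informative.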
The PredictivePower functions return a numeric value representing the predictive power, between 0 and 1.
PredictivePowerCv returns a list as follows:
predictive.power |
An array of predictive power values, one for each fold of cross validation. |
mean |
The mean predictive power value. |
sd |
The standard deviation of predictive power values. |
robustness |
A measure of stability defined as |
Justin Hemann <support@causata.com>
Inspired by Miller, H. (2009) Predicting customer behaviour: The University of Melbourne's KDD Cup report.
library(stringr)
# Power is 1/3 where levels differ by 1/3, missing values in iv are ignored.
PredictivePower(factor(c(str_split("a a a b b b", " ")[[1]], NA,NA)),
c( 1,1,0,0,0,1, 1, 1 ) )
# Power is 1.0 for perfect predictor
PredictivePower(factor(c(str_split("a a a a a b b b b b", " "))[[1]]),
factor(c(str_split("1 1 1 1 1 0 0 0 0 0", " "))[[1]]) )
# Power is 0 for random predictor
PredictivePower(factor(c(str_split("a a a a b b b b", " "))[[1]]),
factor(c(str_split("1 1 0 0 1 1 0 0", " "))[[1]]) )
# compute power for random data, power and robustness should be low
set.seed(1234)
fl <- as.factor(sample(letters, size=1e5, replace=TRUE))
dv <- sample(c(0,1), size=1e5, replace=TRUE)
PredictivePowerCv(fl,dv)
# compute power for numeric data, passing the nbins argument through to BinaryCut
ivn <- rnorm(1e5)
dvn <- rep(0, 1e5)
dvn[(ivn + rnorm(1e5, sd=0.5))>0] <- 1
PredictivePower(ivn,dvn, nbins=10)