This function computes predictive power for a single independent variable and a binary dependent variable.

```
## S3 method for class 'factor'
PredictivePower(iv, dv, warn.levels=30, cv=NULL, debug=FALSE, ...)
## S3 method for class 'numeric'
PredictivePower(iv, dv, warn.levels=30, cv=NULL, debug=FALSE, ...)
PredictivePowerCv(iv, dv, warn.levels=30, debug=FALSE, folds=10, ...)
```

| Argument | Description |
|---|---|
| `iv` | The independent variable. |
| `dv` | The dependent variable, which may have only two unique values. |
| `warn.levels` | If the number of levels in |
| `debug` | If set to |
| `cv` | If |
| `...` | Additional arguments are passed to |
| `folds` | This argument is used to specify the folds used for cross validation. If a number between 2 and 10 is provided then data will be assigned to the selected number of folds at random. If a vector of values is provided then it will be used as an index to assign data to folds. The number of unique values must be between 2 and 10, and the vector length must match |
Predictive power is defined as the area under the gains chart for the provided independent variable divided by the area under the gains chart for a perfect predictor. A random predictor would have a predictive power value of 0, and a perfect predictor would have a value of 1.
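
The definition above can be sketched in a few lines of base R. This is a minimal illustration, not the package's source: the helper name `gains_power_sketch` is hypothetical, and it assumes that both areas are measured above the diagonal of a random predictor (which is what makes a random predictor score 0) and that the curve is integrated with the trapezoid rule.

```r
# Hypothetical sketch of the ratio described above; not the package code.
gains_power_sketch <- function(iv, dv) {
  dv <- factor(dv)
  stopifnot(nlevels(dv) == 2)
  keep <- !is.na(iv)                      # rows with a missing iv are dropped
  iv <- factor(iv[keep])
  pos <- dv[keep] == levels(dv)[2]        # second level of dv is "positive"

  n_bin <- tapply(pos, iv, length)        # rows per level
  n_pos <- tapply(pos, iv, sum)           # positives per level
  ord <- order(n_pos / n_bin, decreasing = TRUE)  # best bins first

  # Gains chart: cumulative share of rows (x) vs. cumulative share of positives (y)
  x <- c(0, cumsum(n_bin[ord]) / sum(n_bin))
  y <- c(0, cumsum(n_pos[ord]) / sum(n_pos))
  area <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)  # trapezoid rule

  # A perfect predictor reaches y = 1 at x = p, then stays flat
  p <- mean(pos)
  perfect_area <- 1 - p / 2

  # Assumption: areas are taken above the random-predictor diagonal (area 0.5)
  (area - 0.5) / (perfect_area - 0.5)
}

iv <- factor(c("a", "a", "a", "b", "b", "b", NA, NA))
dv <- c(1, 1, 0, 0, 0, 1, 1, 1)
gains_power_sketch(iv, dv)
```

Under these assumptions the sketch gives 1/3 for a two-level factor whose positive rates are 2/3 and 1/3, 1 when the levels separate the classes completely, and 0 when both levels have the same positive rate.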

The power calculation is derived from a discretized gains chart, so it only works with categorical variables.
Numeric variables are discretized before power is computed: the `PredictivePower.numeric` function discretizes continuous data using the `BinaryCut` function.
Note that the predictive power will depend, in part, on the discretization method.
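
To see why the discretization method matters, it helps to look at what a discretizer produces. The sketch below uses equal-frequency (quantile) binning in base R as a stand-in. It is an assumption for illustration only: the helper `quantile_bins_sketch` is hypothetical, and this page does not describe how `BinaryCut` actually chooses its bins.

```r
# Hypothetical stand-in for a discretization step: equal-frequency bins.
# This is NOT the BinaryCut algorithm, whose details are not given here.
quantile_bins_sketch <- function(x, nbins = 10) {
  probs <- seq(0, 1, length.out = nbins + 1)
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE))  # drop tied breaks
  cut(x, breaks = breaks, include.lowest = TRUE)              # returns a factor
}

x <- 1:100
table(quantile_bins_sketch(x, nbins = 10))  # ten bins of ten values each
table(quantile_bins_sketch(x, nbins = 4))   # four bins of 25 values each
```

The same numeric vector yields different factors for different bin counts, so any power computed on the binned variable can change with the binning choice.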

By default the second level of `dv` is used as the "positive" class during power calculations. This can
be controlled by ordering the levels in a factor supplied as `dv`.
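
As a small base R illustration of controlling the level order (the variable names and the "churn"/"stay" labels are made up for this example):

```r
outcome <- c("churn", "stay", "stay", "churn")

# Default: levels are sorted alphabetically, so "stay" is the second level
dv_default <- factor(outcome)
levels(dv_default)[2]

# Explicit ordering: put the class of interest second to make it "positive"
dv_churn <- factor(outcome, levels = c("stay", "churn"))
levels(dv_churn)[2]
```

Passing `dv_churn` rather than `dv_default` would make "churn" the class whose rate orders the bins.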

Missing values in `iv` are allowed in `PredictivePower.factor`;
they are ignored during the calculations, as are the corresponding
dependent variable values. The missing values can be used in the power calculations if they
are mapped to a non-missing level in the factor; see `CleanNaFromFactor`.
Missing values are not allowed in `dv`.
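
Mapping missing values to an ordinary level can also be sketched with base R alone. The snippet below is an assumption-laden stand-in, not the behavior of `CleanNaFromFactor` (which is not described on this page); the level name "missing" is arbitrary.

```r
iv <- factor(c("a", "b", NA, "a", NA))

# Base R stand-in (not CleanNaFromFactor): make NA an explicit level,
# then give that level an ordinary, non-missing name.
iv_clean <- addNA(iv)
levels(iv_clean)[is.na(levels(iv_clean))] <- "missing"

levels(iv_clean)   # "a" "b" "missing"
anyNA(iv_clean)    # FALSE: every row now belongs to a real level
```

With the missing rows in their own level, they form a bin of their own in the gains chart instead of being dropped.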

Cross validation is executed using the `PredictivePowerCv` function as a wrapper for the
`PredictivePower` functions. When constructing the gains chart, the bins are ordered by the odds of
a "positive" within each bin. During cross validation the ordering is derived from one set of data, and
the area under the curve is calculated with the other set.
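
The split between "ordering" data and "area" data can be sketched as follows. This is a hedged illustration rather than the package implementation: the helper names are hypothetical, the ordering is assumed to come from the rows outside the current fold with the area computed on the held-out fold, and the power is assumed to be the area above the random-predictor diagonal divided by the same quantity for a perfect predictor.

```r
# Hypothetical helper: power on held-out rows, with bins ordered by the
# positive rate seen in the training rows. Not the package code.
holdout_power_sketch <- function(iv_train, pos_train, iv_test, pos_test) {
  rate <- tapply(pos_train, iv_train, mean)      # per-bin rate from training rows
  ord <- order(rate, decreasing = TRUE)          # bins unseen in training sort last
  n_bin <- as.vector(table(iv_test))[ord]        # held-out bin sizes
  n_pos <- as.vector(tapply(pos_test, iv_test, sum))[ord]
  n_pos[is.na(n_pos)] <- 0                       # bins empty in the held-out fold
  x <- c(0, cumsum(n_bin) / sum(n_bin))
  y <- c(0, cumsum(n_pos) / sum(n_pos))
  area <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
  p <- mean(pos_test)
  (area - 0.5) / ((1 - p / 2) - 0.5)             # assumed normalization
}

# Hypothetical wrapper: random fold assignment, one power value per fold.
cv_power_sketch <- function(iv, dv, folds = 5) {
  iv <- factor(iv)
  dv <- factor(dv)
  pos <- dv == levels(dv)[2]
  fold_id <- sample(rep_len(seq_len(folds), length(iv)))
  power <- sapply(seq_len(folds), function(k) {
    test <- fold_id == k
    holdout_power_sketch(iv[!test], pos[!test], iv[test], pos[test])
  })
  list(predictive.power = power, mean = mean(power), sd = sd(power))
}

set.seed(1)
iv <- rep(c("a", "b"), each = 50)
dv <- rep(c(1, 0), each = 50)
res <- cv_power_sketch(iv, dv)
res$predictive.power
```

In this toy case the levels separate the classes completely, so the ordering learned on the training rows carries over to each held-out fold. With a noisy or high-cardinality variable the ordering transfers less well, and the per-fold values spread out.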

The `PredictivePower` functions return a numeric value representing the predictive power, between 0 and 1.

`PredictivePowerCv` returns a list as follows:

| Component | Description |
|---|---|
| `predictive.power` | An array of predictive power values, one for each fold of cross validation. |
| `mean` | The mean predictive power value. |
| `sd` | The standard deviation of predictive power values. |
| `robustness` | A measure of stability defined as |

Justin Hemann <support@causata.com>

Inspired by Miller, H. (2009) *Predicting customer behaviour: The University of Melbourne's KDD Cup report*.

See also `CleanNaFromFactor` and `BinaryCut`.

```
library(stringr)
# Power is 1/3 where levels differ by 1/3, missing values in iv are ignored.
PredictivePower(factor(c(str_split("a a a b b b", " ")[[1]], NA,NA)),
                c( 1,1,0,0,0,1, 1, 1 ) )
# Power is 1.0 for perfect predictor
PredictivePower(factor(c(str_split("a a a a a b b b b b", " "))[[1]]),
                factor(c(str_split("1 1 1 1 1 0 0 0 0 0", " "))[[1]]) )
# Power is 0 for random predictor
PredictivePower(factor(c(str_split("a a a a b b b b", " "))[[1]]),
                factor(c(str_split("1 1 0 0 1 1 0 0", " "))[[1]]) )
# compute power for random data, power and robustness should be low
set.seed(1234)
fl <- as.factor(sample(letters, size=1e5, replace=TRUE))
dv <- sample(c(0,1), size=1e5, replace=TRUE)
PredictivePowerCv(fl,dv)
# compute power for numeric data, send nbins arguments to BinaryCut
ivn <- rnorm(1e5)
dvn <- rep(0, 1e5)
dvn[(ivn + rnorm(1e5, sd=0.5))>0] <- 1
PredictivePower(ivn,dvn, nbins=10)
```
