PredictivePower: Predictive power for a single variable.
In Causata: Analysis utilities for binary classification and Causata users.


This function computes predictive power for a single independent variable and a binary dependent variable.

## S3 method for class 'factor'
PredictivePower(iv, dv, warn.levels=30, cv=NULL, debug=FALSE, ...)

## S3 method for class 'numeric'
PredictivePower(iv, dv, warn.levels=30, cv=NULL, debug=FALSE, ...)

PredictivePowerCv(iv, dv, warn.levels=30, debug=FALSE, folds=10, ...)

`iv`	The independent variable.
`dv`	The dependent variable, which may have only two unique values.
`warn.levels`	If the number of levels in `iv` exceeds this value then a warning will be issued.
`debug`	If set to `TRUE` then debugging information is printed to the screen.
`cv`	If `NULL` then all data are used to compute the predictive power. If an index of boolean values is provided then they are used to separate the data into two parts for cross validation. See the Details below for more information.
`...`	Additional arguments are passed to `BinaryCut`.
`folds`	This argument specifies the folds used for cross validation. If a number between 2 and 10 is provided then the data will be assigned at random to that number of folds. If a vector of values is provided then it will be used as an index to assign data to folds. The number of unique values must be between 2 and 10, and the vector length must match `iv`.
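As an illustration of the vector form of `folds`, the following sketch builds a fold index in base R. The data and the names `iv_demo`, `dv_demo` and `fold_index` are hypothetical; the final call is left as a comment because it needs the Causata package.

```r
# Hypothetical data; only base R is used to build the fold index.
set.seed(42)
n <- 100
iv_demo <- factor(sample(c("a", "b", "c"), size = n, replace = TRUE))
dv_demo <- sample(c(0, 1), size = n, replace = TRUE)

# Assign each row to one of 5 folds, equal in size and shuffled.
fold_index <- sample(rep(1:5, length.out = n))
table(fold_index)

# With the Causata package loaded, the index would be passed as:
# PredictivePowerCv(iv_demo, dv_demo, folds = fold_index)
```

The index has one entry per element of `iv_demo` and five unique values, which satisfies the constraints stated for `folds`.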

Predictive power is defined as the area under the gains chart for the provided independent variable divided by the area under the gains chart for a perfect predictor. A random predictor would have a predictive power value of 0, and a perfect predictor would have a value of 1.
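The definition can be sketched in base R as follows. This is not the package's code: `gains_power_sketch` is a hypothetical helper, and it assumes that both areas are measured relative to the diagonal of the gains chart (which is what makes a random predictor score 0) and that the chart plots the cumulative fraction of positives against the cumulative fraction of observations. Under those assumptions it reproduces the values 1/3, 1 and 0 reported for the first three examples on this page.

```r
# Hypothetical helper, not part of Causata.
gains_power_sketch <- function(iv, dv) {
  dv <- as.factor(dv)
  keep <- !is.na(iv)                       # ignore missing iv values
  iv <- droplevels(as.factor(iv[keep]))
  dv <- dv[keep]
  pos <- dv == levels(dv)[2]               # second level is the positive class
  n_bin   <- as.numeric(table(iv))         # observations per level
  pos_bin <- as.numeric(table(iv[pos]))    # positives per level
  ord <- order(pos_bin / n_bin, decreasing = TRUE)
  x <- c(0, cumsum(n_bin[ord]) / sum(n_bin))
  y <- c(0, cumsum(pos_bin[ord]) / sum(pos_bin))
  area <- sum(diff(x) * (y[-1] + y[-length(y)]) / 2)  # trapezoid rule
  p <- sum(pos_bin) / sum(n_bin)           # overall positive rate
  # Area above the diagonal, relative to that of a perfect predictor.
  (area - 0.5) / ((1 - p) / 2)
}

iv1 <- factor(c("a", "a", "a", "b", "b", "b", NA, NA))
dv1 <- c(1, 1, 0, 0, 0, 1, 1, 1)
gains_power_sketch(iv1, dv1)
```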

The power calculation is derived from a discretized gains chart. As such it only works with categorical variables. Numeric variables are discretized before power is computed. The PredictivePower.numeric function discretizes continuous data using the BinaryCut function. Note that the predictive power will depend, in part, on the discretization method.
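To illustrate what discretizing a numeric variable means here, the sketch below uses equal-frequency bins built with base R's `quantile` and `cut`. This is a stand-in chosen for the example; the binning rule that `BinaryCut` actually applies is not described on this page and may differ. The names `x` and `x_binned` are hypothetical.

```r
set.seed(1)
x <- rnorm(1000)

# Ten equal-frequency bins; unique() guards against repeated break points.
breaks <- unique(quantile(x, probs = seq(0, 1, length.out = 11)))
x_binned <- cut(x, breaks = breaks, include.lowest = TRUE)

# The result is a factor, which is the form the power calculation needs.
table(x_binned)
```

Changing the number of bins or the rule for placing the breaks changes the factor, which is why the resulting power depends in part on the discretization method.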

By default the second level of dv is used as the "positive" class during power calculations. This can be controlled by ordering the levels in a factor supplied as dv.
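For example, the level order of a factor can be set explicitly in base R. The labels below are hypothetical.

```r
# Hypothetical outcome labels.
outcome <- c("yes", "no", "no", "yes", "no")

# factor() sorts levels alphabetically by default: "no", "yes".
dv_yes_positive <- factor(outcome)
levels(dv_yes_positive)[2]   # "yes"

# Reorder the levels so that "no" becomes the second level.
dv_no_positive <- factor(outcome, levels = c("yes", "no"))
levels(dv_no_positive)[2]    # "no"
```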

Missing values in iv are allowed in PredictivePower.factor – they are ignored during the calculations, as are the corresponding dependent variable values. The missing values can be used in the power calculations if the missing values are mapped to a non-missing level in the factor. See CleanNaFromFactor. Missing values are not allowed in dv.
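Mapping missing values to a level of their own can be sketched in base R as shown below. This is only an illustration of the idea; it does not call `CleanNaFromFactor`, and the level name "missing" is an arbitrary choice for the example.

```r
f <- factor(c("a", "b", NA, "a", NA))

# Add a new level, then assign it to the missing entries.
levels(f) <- c(levels(f), "missing")
f[is.na(f)] <- "missing"

table(f)
```

After this step the factor contains no `NA` values, so every observation contributes to the calculation.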

Cross validation is executed using the PredictivePowerCv function as a wrapper for the PredictivePower functions. When constructing the gains chart the bins are ordered by the odds of the "positive" class within each bin. During cross validation the ordering is derived from one set of data, and the area under the curve is calculated with the other set.
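The split between ordering and scoring can be sketched in base R as follows. `cv_power_sketch` is a hypothetical helper, not the package's implementation; it assumes a logical `train` vector that marks the rows used to order the bins, and it scores the remaining rows with the same diagonal-relative area ratio assumed for the definition of predictive power.

```r
# Hypothetical helper, not part of Causata.
cv_power_sketch <- function(iv, dv, train) {
  iv  <- as.factor(iv)
  dv  <- as.factor(dv)
  pos <- dv == levels(dv)[2]
  # Order bins by the positive rate seen in the training rows only.
  rate_train <- as.numeric(table(iv[train & pos])) / as.numeric(table(iv[train]))
  ord <- order(rate_train, decreasing = TRUE)
  # Build the gains chart on the held-out rows using that ordering.
  n_test   <- as.numeric(table(iv[!train]))[ord]
  pos_test <- as.numeric(table(iv[!train & pos]))[ord]
  x <- c(0, cumsum(n_test) / sum(n_test))
  y <- c(0, cumsum(pos_test) / sum(pos_test))
  area <- sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
  p <- sum(pos_test) / sum(n_test)
  (area - 0.5) / ((1 - p) / 2)
}

iv_cv <- factor(rep(c("a", "b", "a", "b"), each = 3))
train <- rep(c(TRUE, FALSE), each = 6)
dv_agree   <- c(1, 1, 0, 0, 0, 1,  1, 1, 1, 0, 0, 0)  # held-out rows agree with training
dv_reverse <- c(1, 1, 0, 0, 0, 1,  0, 0, 0, 1, 1, 1)  # held-out rows reverse it
cv_power_sketch(iv_cv, dv_agree, train)
cv_power_sketch(iv_cv, dv_reverse, train)
```

When the held-out rows reverse the training ordering, this sketch returns a negative value, because nothing in it truncates the result to the documented 0 to 1 range.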

The PredictivePower functions return a numeric value representing the predictive power, between 0 and 1.

PredictivePowerCv returns a list as follows:

`predictive.power`	An array of predictive power values, one for each fold of cross validation.
`mean`	The mean predictive power value.
`sd`	The standard deviation of predictive power values.
`robustness`	A measure of stability defined as `1-sd/mean`. Values will be between zero (unstable) and 1 (stable).
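As a worked check, the summary values can be recomputed in base R from the ten fold values printed in the example output on this page. The truncation at zero with `max` is an assumption inferred from that output, where the standard deviation exceeds the mean and the reported robustness is 0; the variable names are hypothetical.

```r
# Fold values copied from the example output on this page.
fold_power <- c(0.010157211, 0, 0, 0.008199210, 0, 0,
                0, 0.008762443, 0.016981207, 0.019199225)

power_mean <- mean(fold_power)
power_sd   <- sd(fold_power)   # sample standard deviation

# Assumption: negative values of 1 - sd/mean are truncated to zero.
robustness <- max(0, 1 - power_sd / power_mean)

c(mean = power_mean, sd = power_sd, robustness = robustness)
```

Here `1 - sd/mean` is about -0.18 before truncation, which is consistent with the reported robustness of 0.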

Justin Hemann <support@causata.com>

Inspired by Miller, H. (2009) Predicting customer behaviour: The University of Melbourne's KDD Cup report.

CleanNaFromFactor, BinaryCut.

library(stringr)

# Power is 1/3 when the positive rates of the two levels differ by 1/3;
# missing values in iv are ignored.
PredictivePower(factor(c(str_split("a a a b b b", " ")[[1]], NA, NA)),
                c(1, 1, 0, 0, 0, 1, 1, 1))

# Power is 1.0 for a perfect predictor
PredictivePower(factor(str_split("a a a a a b b b b b", " ")[[1]]),
                factor(str_split("1 1 1 1 1 0 0 0 0 0", " ")[[1]]))

# Power is 0 for a random predictor
PredictivePower(factor(str_split("a a a a b b b b", " ")[[1]]),
                factor(str_split("1 1 0 0 1 1 0 0", " ")[[1]]))

# Compute power for random data; power and robustness should be low
set.seed(1234)
fl <- as.factor(sample(letters, size=1e5, replace=TRUE))
dv <- sample(c(0,1), size=1e5, replace=TRUE)
PredictivePowerCv(fl,dv)

# Compute power for numeric data; the nbins argument is passed on to BinaryCut
ivn <- rnorm(1e5)
dvn <- rep(0, 1e5)
dvn[(ivn + rnorm(1e5, sd=0.5))>0] <- 1
PredictivePower(ivn,dvn, nbins=10)

[1] 0.3333333
[1] 1
[1] 0
$predictive.power
 [1] 0.010157211 0.000000000 0.000000000 0.008199210 0.000000000 0.000000000
 [7] 0.000000000 0.008762443 0.016981207 0.019199225

$mean
[1] 0.006329929

$sd
[1] 0.007479364

$robustness
[1] 0

[1] 0.8087834

Causata documentation built on May 2, 2019, 3:26 a.m.
