pps: Compute Predictive Power Score
In choonghyunryu/dlookr: Tools for Data Diagnosis, Exploration, Transformation

pps	R Documentation

Compute Predictive Power Score

Description

pps() computes the Predictive Power Score (PPS) for exploratory data analysis.

Usage

pps(.data, ...)

## S3 method for class 'data.frame'
pps(.data, ..., cv_folds = 5, do_parallel = FALSE, n_cores = -1)

## S3 method for class 'target_df'
pps(.data, ..., cv_folds = 5, do_parallel = FALSE, n_cores = -1)

Arguments

`.data`	a target_df or data.frame.
`...`	one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, pps() will automatically start with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.
`cv_folds`	integer. number of cross-validation folds.
`do_parallel`	logical. whether to perform score calls in parallel.
`n_cores`	integer. number of cores to use. The default, -1, uses the maximum number of cores minus 1.
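
The arguments above can be combined in a single call. The following is a minimal sketch rather than an authoritative recipe: it uses only the arguments listed in Usage, the particular values (3 folds, 10 folds, 2 cores) are arbitrary choices for illustration, and the calls run only when both 'dlookr' and 'ppsr' are installed.

```r
# Sketch only: the pps() calls run when both packages are available.
has_pkgs <- requireNamespace("dlookr", quietly = TRUE) &&
  requireNamespace("ppsr", quietly = TRUE)

if (has_pkgs) {
  library(dlookr)

  # Fewer cross-validation folds than the default of 5
  pps_fast <- pps(iris, cv_folds = 3)
  print(pps_fast)

  # Select two predictors for a categorical target and
  # perform the score calls in parallel on two cores
  categ <- target_by(iris, Species)
  pps_par <- pps(categ, Petal.Length, Petal.Width,
                 cv_folds = 10, do_parallel = TRUE, n_cores = 2)
  print(pps_par)
}
```

Parallel execution is likely to pay off only when many variable pairs are scored; for a small data set such as iris the sequential default is usually sufficient.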

Details

The PPS is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two variables. The score ranges from 0 (no predictive power) to 1 (perfect predictive power).
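
To make the asymmetry concrete, here is a small sketch in R. It is not the implementation used by pps(): `toy_score()` is a hypothetical helper that fits a decision tree with the 'rpart' package (assumed to be installed, as it is in standard R distributions), compares its mean absolute error with that of a naive median prediction, and skips cross-validation entirely. It is meant only to show why a score of this kind can differ by direction.

```r
# Illustrative sketch, NOT the dlookr/ppsr implementation.
# Assumes the 'rpart' package is available.
library(rpart)

# Hypothetical helper: 1 - MAE(model) / MAE(naive median), floored at 0.
# Scores are computed in-sample; pps() uses cross-validation instead.
toy_score <- function(data, predictor, target) {
  fit <- rpart(reformulate(predictor, response = target), data = data)
  mae_model <- mean(abs(data[[target]] - predict(fit, data)))
  mae_naive <- mean(abs(data[[target]] - median(data[[target]])))
  max(0, 1 - mae_model / mae_naive)
}

# y is a deterministic, non-linear function of x,
# but x cannot be recovered from y (the sign is lost).
x <- (-100:100) / 50
toy <- data.frame(x = x, y = x^2)

score_xy <- toy_score(toy, "x", "y")  # x used to predict y
score_yx <- toy_score(toy, "y", "x")  # y used to predict x

# The linear correlation is symmetric and close to zero here
linear_cor <- cor(toy$x, toy$y)

c(x_to_y = score_xy, y_to_x = score_yx, pearson = linear_cor)
```

In this toy data the score for x predicting y is clearly higher than for y predicting x, while the Pearson correlation is the same in both directions.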

Value

An object of class pps. The attributes of the pps class are as follows.

type : type of pps
target : name of target variable
predictor : name of predictor

Information of Predictive Power Score

The information of PPS is as follows.

x : the name of the predictor variable
y : the name of the target variable
result_type : text showing how to interpret the resulting score
pps : the predictive power score
metric : the evaluation metric used to compute the PPS
baseline_score : the score of a naive model on the evaluation metric
model_score : the score of the predictive model on the evaluation metric
cv_folds : how many cross-validation folds were used
seed : the seed that was set
algorithm : text showing what algorithm was used
model_type : text showing whether classification or regression was used

References

RIP correlation. Introducing the Predictive Power Score - by Florian Wetschoreck
- https://towardsdatascience.com/rip-correlation-introducing-the-predictive-power-score-3d90808b9598

Examples

library(dplyr)

# If you want to use this feature, you need to install the 'ppsr' package.
if (!requireNamespace("ppsr", quietly = TRUE)) {
  cat("If you want to use this feature, you need to install the 'ppsr' package.\n")
}

# pps type is generic =======================================
pps_generic <- pps(iris)
pps_generic

# pps type is target_by =====================================
##-----------------------------------------------------------
# If the target variable is a categorical variable
categ <- target_by(iris, Species)

# compute all variables
pps_cat <- pps(categ)
pps_cat

# compute Petal.Length and Petal.Width variable
pps_cat <- pps(categ, Petal.Length, Petal.Width)
pps_cat

# Using dplyr
pps_cat <- iris %>% 
  target_by(Species) %>% 
  pps()

pps_cat

##-----------------------------------------------------------
# If the target variable is a numerical variable
num <- target_by(iris, Petal.Length)

pps_num <- pps(num)
pps_num

choonghyunryu/dlookr documentation built on June 11, 2024, 9:12 a.m.