pps: Compute Predictive Power Score

View source: R/pps.R

pps    R Documentation

Compute Predictive Power Score

Description

pps() computes the Predictive Power Score (PPS) for exploratory data analysis.

Usage

pps(.data, ...)

## S3 method for class 'data.frame'
pps(.data, ..., cv_folds = 5, do_parallel = FALSE, n_cores = -1)

## S3 method for class 'target_df'
pps(.data, ..., cv_folds = 5, do_parallel = FALSE, n_cores = -1)

Arguments

.data

a target_df or data.frame.

...

one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. If the first expression is negative, pps() will automatically start with all variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing.

cv_folds

integer. number of cross-validation folds.

do_parallel

logical. whether to perform score calls in parallel.

n_cores

integer. number of cores to use. The default, -1, uses the maximum number of cores minus 1.

Details

The PPS is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two variables. The score ranges from 0 (no predictive power) to 1 (perfect predictive power).
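The asymmetry can be illustrated with a toy score written in base R. This is only a sketch under stated assumptions: toy_score() is a hypothetical helper, not part of dlookr, and it replaces the cross-validated predictive model with an in-sample lookup of group medians, scored by mean absolute error against a naive median baseline.

```r
# Toy stand-in for a PPS-style score (hypothetical helper, not dlookr's code).
# "model"    = median of the target within each distinct predictor value
# "baseline" = overall median of the target
# Both are evaluated in-sample with the mean absolute error (MAE).
toy_score <- function(x, y) {
  mae <- function(pred) mean(abs(y - pred))
  baseline <- mae(median(y))
  model <- mae(ave(y, x, FUN = median))
  # A model that is no better than the baseline has no predictive power
  if (baseline == 0 || model >= baseline) return(0)
  1 - model / baseline
}

x <- -5:5
y <- x^2

toy_score(x, y)  # 1: x determines y exactly
toy_score(y, x)  # 0: y leaves the sign of x unknown
```

A symmetric measure such as the Pearson correlation is 0 for this pair in both directions, whereas a directional score separates the two cases.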

Value

An object of class pps. The attributes of the pps class are as follows.

  • type : type of pps

  • target : name of target variable

  • predictor : name of predictor

Information of Predictive Power Score

The information of PPS is as follows.

  • x : the name of the predictor variable

  • y : the name of the target variable

  • result_type : text showing how to interpret the resulting score

  • pps : the predictive power score

  • metric : the evaluation metric used to compute the PPS

  • baseline_score : the score of a naive model on the evaluation metric

  • model_score : the score of the predictive model on the evaluation metric

  • cv_folds : how many cross-validation folds were used

  • seed : the seed that was set

  • algorithm : text showing what algorithm was used

  • model_type : text showing whether classification or regression was used
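The fields baseline_score, model_score, and pps are related through a normalization step. The sketch below follows the convention described in the referenced article, which is an assumption here: this page does not state the exact formula used internally. Both helper functions are hypothetical and are not exported by dlookr.

```r
# Hypothetical helpers; not part of dlookr.

# Error metrics (e.g. MAE): lower is better, 0 is a perfect model.
normalize_error <- function(baseline_score, model_score) {
  if (model_score >= baseline_score) return(0)
  1 - model_score / baseline_score
}

# Bounded scores (e.g. weighted F1): higher is better, 1 is a perfect model.
normalize_bounded <- function(baseline_score, model_score) {
  if (model_score <= baseline_score) return(0)
  (model_score - baseline_score) / (1 - baseline_score)
}

normalize_error(baseline_score = 2.0, model_score = 0.5)     # 0.75
normalize_bounded(baseline_score = 0.4, model_score = 0.85)  # 0.75
```

In both cases a model that does no better than the naive baseline maps to 0 and a perfect model maps to 1, which matches the range given in Details.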

References

  • RIP correlation. Introducing the Predictive Power Score - by Florian Wetschoreck

    • https://towardsdatascience.com/rip-correlation-introducing-the-predictive-power-score-3d90808b9598

See Also

print.relate, plot.relate.

Examples


library(dlookr)
library(dplyr)

# pps type is generic =======================================
pps_generic <- pps(iris)
pps_generic

# summary pps class 
mat <- summary(pps_generic)
mat

# visualize pps class 
plot(pps_generic)


# pps type is target_by =====================================
##-----------------------------------------------------------
# If the target variable is a categorical variable
categ <- target_by(iris, Species)

# compute all variables
pps_cat <- pps(categ)
pps_cat

# compute Petal.Length and Petal.Width variable
pps_cat <- pps(categ, Petal.Length, Petal.Width)
pps_cat

# Using dplyr
pps_cat <- iris %>% 
  target_by(Species) %>% 
  pps()

pps_cat

# Using parallel process
# pps_cat <- iris %>% 
#   target_by(Species) %>% 
#   pps(do_parallel = TRUE)
# 
# pps_cat

# summary pps class 
tab <- summary(pps_cat)
tab

# visualize pps class
plot(pps_cat)

##-----------------------------------------------------------
# If the target variable is a numerical variable
num <- target_by(iris, Petal.Length)

pps_num <- pps(num)
pps_num

# summary pps class 
tab <- summary(pps_num)
tab

# plot pps class
plot(pps_num)



dlookr documentation built on July 9, 2023, 6:31 p.m.