library("riskr")
library("printr") # remove this for vignette
library("ggplot2")
library("ggthemes")
options(digits = 3, knitr.table.format = "markdown")
knitr::opts_chunk$set(collapse = TRUE, warning = FALSE,
                      fig.path = "vignettes/figures/",
                      fig.width = 6, fig.height = 6,
                      fig.align = "center", dpi = 72)

theme_set(theme_fivethirtyeight(base_size = 11) +
            theme(rect = element_rect(fill = "white"),
                  axis.title = element_text(colour = "grey30"),
                  axis.title.y = element_text(angle = 90),
                  strip.background = element_rect(fill = "#434348"),
                  strip.text = element_text(color = "#F0F0F0"),
                  plot.title = element_text(face = "plain", size = rel(1.2)),
                  # panel.margin.* is deprecated in ggplot2; panel.spacing.* replaces it
                  panel.spacing.x = grid::unit(1, "cm"),
                  panel.spacing.y = grid::unit(1, "cm")))
update_geom_defaults("line", list(colour = "#434348", size = 1.05))
update_geom_defaults("point", list(colour = "#434348", size = 3))
update_geom_defaults("bar", list(fill = "#7cb5ec"))
update_geom_defaults("text", list(size = 4, colour = "gray30"))

Introduction

The riskr package facilitates credit scoring tasks, such as measuring score/model performance, and simplifies the scoring modelling process.

There are functions to:

  1. Measure model performance in a simple way, via wrappers/shortcuts around ROCR functions.
  2. Visualize relationships between variables.
  3. Compute the usual credit scoring values: PSI, WOE, IV, KS, AUCROC, among others.
  4. Make the modelling and validation process easier.
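
As a rough illustration of two of the values named in item 3, the sketch below computes WOE and IV for a categorical variable in base R. It does not use riskr functions; the toy data, the variable names and the sign convention (log of the share of 1s over the share of 0s) are assumptions made for this example, and other texts use the inverse ratio.

```r
# Toy data, invented for illustration only (not a riskr data set)
x <- c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c")
y <- c(  1,   0,   0,   1,   1,   0,   0,   1,   1,   0)

counts <- table(x, y)                          # rows: categories, columns: "0" and "1"
dist1  <- counts[, "1"] / sum(counts[, "1"])   # share of all 1s falling in each category
dist0  <- counts[, "0"] / sum(counts[, "0"])   # share of all 0s falling in each category

# Assumed convention: WOE = log(share of 1s / share of 0s)
woe <- log(dist1 / dist0)

# IV sums the WOE weighted by the difference between the two shares
iv <- sum((dist1 - dist0) * woe)

woe
iv
```

A category with no 0s or no 1s makes the ratio zero or infinite, so real data usually needs binning or smoothing before this calculation.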

Assumptions

riskr assumes the target variable is binary with numeric values: 0 and 1. Usually 1 means the characteristic of interest. For example, 0 can denote a default operation and 1 a non-default one.
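
A quick way to check this assumption before calling any function is sketched below. The helper `is_binary_target` is hypothetical (it is not part of riskr), and the recoding at the end picks "bad" as the class coded 1 purely for illustration.

```r
# Hypothetical helper, not part of riskr: TRUE only for a numeric 0/1 vector
is_binary_target <- function(target) {
  is.numeric(target) && !anyNA(target) && all(target %in% c(0, 1))
}

is_binary_target(c(0, 1, 1, 0))     # numeric 0/1 values
is_binary_target(c("good", "bad"))  # character labels fail the check
is_binary_target(c(0, 1, 2))        # more than two classes fails the check

# Recoding a labelled target into numeric 0/1 form;
# which label becomes 1 is a modelling choice (here: "bad")
status   <- c("bad", "good", "good", "bad")
target01 <- as.integer(status == "bad")
is_binary_target(target01)
```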

Installation

You can install the latest development version from GitHub with:

source("https://install-github.me/jbkunst/riskr")

# or

devtools::install_github("jbkunst/riskr")

Functions

Performance Indicators & Plots

Usually we have a data frame with a target variable and a score (or probability) like this:

library("riskr")

data("predictions")

head(predictions)

score <- predictions$score

target <- predictions$target

The main performance indicators are KS and AUCROC; perf computes them in one call:

perf(target, score)

There is also a function for each individual indicator:

aucroc(target, score)
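
To make the two indicators concrete, here is a minimal base R sketch of what KS and AUCROC measure. It is not how riskr computes them (the package wraps ROCR); the toy vectors `y` and `s` are invented, and 1 is assumed to be the class that higher scores point to.

```r
# Toy data, invented for illustration; y is the 0/1 target, s the score
y <- c(0, 0, 1, 0, 1, 1, 0, 1)
s <- c(0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9)

s1 <- s[y == 1]   # scores of the 1s
s0 <- s[y == 0]   # scores of the 0s

# KS: largest gap between the empirical score distributions of the two classes
grid <- sort(unique(s))
ks   <- max(abs(ecdf(s1)(grid) - ecdf(s0)(grid)))

# AUCROC: share of (1, 0) pairs where the 1 has the higher score; ties count half
auc <- mean(outer(s1, s0, ">") + 0.5 * outer(s1, s0, "=="))

ks
auc
```

The pairwise AUC needs memory proportional to the number of 1s times the number of 0s, so it suits small examples rather than full portfolios.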

There are also functions to plot score/model performance (based on the ggplot2 package).

gg_perf(target, score)

And:

gg_roc(target, score)

gg_gain(target, score)

gg_lift(target, score)

Tables (Uni/Bivariate) & Plots

data("credit")

ft(credit$marital_status)

bt(credit$marital_status, credit$bad)

credit$age_bin <- bin_sup(credit$age, credit$bad, min.p = 0.20)$variable_new

bt(credit$age_bin, credit$bad)
gg_ba(credit$age_bin, credit$bad)

A minified version of gg_ba:

gg_ba2(credit$age_bin, credit$bad) + ggtitle("Age")

Odds Tables

Odds tables are another way to show how a score/model performs.

score <- round(predictions$score * 1000)

odds_table(target, score, nclass = 5) # default is (nclass =) 10 groups of equal size
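
The idea behind such a table can be sketched in base R: split the score into groups of equal size and summarise the target in each group. This is only an illustration under assumptions; the columns, the toy data and the direction of the odds (1s per 0) are chosen for this example and need not match the output of odds_table.

```r
# Toy data, invented for illustration; y is the 0/1 target, s an integer score
y <- c(0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1)
s <- c(110, 150, 230, 260, 340, 380, 450, 520, 610, 700, 820, 930)

n_groups <- 3
breaks   <- quantile(s, probs = seq(0, 1, length.out = n_groups + 1))
grp      <- cut(s, breaks = breaks, include.lowest = TRUE, dig.lab = 4)

tab <- data.frame(
  class  = levels(grp),
  count  = as.vector(table(grp)),          # observations per score group
  target = as.vector(tapply(y, grp, sum))  # number of 1s per score group
)
tab$target_rate <- tab$target / tab$count
tab$odds        <- tab$target / (tab$count - tab$target)  # assumed direction: 1s per 0

tab
```

A score that ranks well shows odds that rise (or fall) steadily from the lowest group to the highest.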

Ranking Predictive Variables

ranks <- pred_ranking(credit, "bad")
head(ranks)

Confusion Matrix

The conf_matrix function returns a list with the following elements:

target_pred <- ifelse(score < 500, 0, 1)

cm <- conf_matrix(target_pred, target)
cm$confusion.matrix
cm$indicators
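
For reference, the usual indicators derived from a confusion matrix can be reproduced in base R as sketched below. The toy vectors are invented and the set of indicators is an assumption for illustration; it may differ from what cm$indicators reports.

```r
# Toy data, invented for illustration; 1 is treated as the positive class
y      <- c(0, 0, 1, 0, 1, 1, 0, 1)   # observed target
y_pred <- c(0, 0, 0, 1, 1, 1, 1, 1)   # predicted target

tab_cm <- table(predicted = y_pred, actual = y)

tp <- tab_cm["1", "1"]   # predicted 1, actually 1
tn <- tab_cm["0", "0"]   # predicted 0, actually 0
fp <- tab_cm["1", "0"]   # predicted 1, actually 0
fn <- tab_cm["0", "1"]   # predicted 0, actually 1

accuracy    <- (tp + tn) / sum(tab_cm)
sensitivity <- tp / (tp + fn)   # true positive rate
specificity <- tn / (tn + fp)   # true negative rate
precision   <- tp / (tp + fp)

tab_cm
c(accuracy = accuracy, sensitivity = sensitivity,
  specificity = specificity, precision = precision)
```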

Related work

  1. woe package by tomasgreif
  2. smbinning package by Herman Jopia (GitHub repository)
  3. Guide to Credit Scoring in R
  4. Gains package
  5. plotROC package by Michael Sachs
  6. InformationValue by selva86

Session Info


print(sessionInfo())


jbkunst/riskr documentation built on May 18, 2019, 7 p.m.