knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
library(binnr)
data(titanic)
levels(titanic$Embarked)[levels(titanic$Embarked) == ""] <- "C"

What is binnr?

binnr is an R package that helps modelers build scorecards. Scorecard models need to be more than predictive: regulatory oversight often requires that they be transparent as well. Transparency is difficult to achieve with more predictive, non-linear methods such as neural networks, random forests, or gradient-boosted decision trees. Direct modeler intervention is often required to ensure that the values of independent variables are treated properly.

binnr attempts to solve these problems by providing interactive variable manipulation facilities that give the modeler total control over how variables are treated in scorecard models.

Installing binnr

The easiest way to install binnr is by using the devtools package:

if (!require(devtools)) install.packages("devtools")
devtools::install_github(repo="Zelazny7/binnr")

Overview of binnr

binnr comprises three main modeling steps: Bin, Fit, and Adjust. Each step is outlined in the following sections.

Bin

The Bin phase prepares a data set of explanatory features for modeling by discretizing the variables and wrapping them in an object that provides data manipulation capabilities. How a variable is treated by the binning step depends on its type. Numeric features such as age or income are discretized using a performance metric. For example, if the performance variable is binary, the numeric independent variables are discretized using information value.
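To make the information value criterion concrete, here is a minimal base-R sketch that scores two candidate cut points on a toy numeric variable. The helper `information_value` and the data are hypothetical and are not part of binnr. The sketch uses the common textbook definition of information value, with weight of evidence taken as the log ratio of the share of 1s to the share of 0s in each bin. How binnr searches for cut points internally is not shown here.

```r
# Hypothetical helper (not part of binnr): information value of a binned
# variable against a binary outcome y coded 0/1.
information_value <- function(bins, y) {
  tab <- table(bins, y)                      # rows = bins, columns = "0" / "1"
  pct_1 <- tab[, "1"] / sum(tab[, "1"])      # share of all 1s falling in each bin
  pct_0 <- tab[, "0"] / sum(tab[, "0"])      # share of all 0s falling in each bin
  woe <- log(pct_1 / pct_0)                  # weight of evidence per bin
  sum((pct_1 - pct_0) * woe)                 # information value
}

# Toy data (made up): the outcome rate rises with age
age <- c(22, 25, 31, 38, 41, 47, 52, 58, 63, 70)
y   <- c(0,  0,  0,  1,  0,  1,  0,  1,  1,  1)

# Score two candidate cut points; the higher value separates the classes better
iv_40 <- information_value(cut(age, c(-Inf, 40, Inf)), y)
iv_45 <- information_value(cut(age, c(-Inf, 45, Inf)), y)
round(c(cut_at_40 = iv_40, cut_at_45 = iv_45), 4)
```

On this toy data the cut at 45 scores higher than the cut at 40, so a search driven by information value would prefer it. Note that a bin containing only one class makes the weight of evidence infinite, which is why the toy data keeps both classes in every bin.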

If the variable is a categorical field (a factor in R), no discretization can be performed; however, the feature is still wrapped in a bin object and can be interacted with as such.

mod <- bin(data=titanic[,-1], y=titanic$Survived)

Calling the bin function returns a Scorecard object. The Scorecard object contains methods for manipulating and fitting the discretized variables, or "bins" in binnr parlance. It maintains a list of bins and keeps track of the operations requested by the modeler. It also contains a list of fitted scorecard models that can be adjusted or reviewed. These capabilities are expanded upon in later sections.

mod <- bin(data=titanic[,-1], y=titanic$Survived)
mod

Bin variables can be accessed directly, although, as we'll see later, this is not recommended. For illustration purposes, the following output shows the kind of information a bin object typically stores:

mod$variables$Pclass

Additionally, bin objects can be plotted by calling their plot method.

mod$variables$Pclass$plot()

Fit

The second step of scorecard development using binnr is fitting a model. binnr uses the glmnet package under the hood to fit a regularized regression model. The discretized variables have their observed performance values substituted for their actual input values to create a completely continuous dataset.

For example, if an age variable is discretized using binary performance, the resulting substitution uses the observed weight of evidence of each bin rather than the age itself.

Performance substitution puts all predictors on the same scale and forces a linear relationship between each predictor and the response variable.
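The substitution described above can be sketched in base R. This is an illustration under stated assumptions, not binnr's implementation: the data are made up, weight of evidence is taken as the log ratio of the share of 1s to the share of 0s in each bin, and a plain `glm` logistic regression stands in for the regularized glmnet fit so that the example needs no extra packages.

```r
# Toy data (made up): one numeric predictor and a binary outcome
age <- c(22, 25, 31, 38, 41, 47, 52, 58, 63, 70)
y   <- c(0,  0,  0,  1,  0,  1,  0,  1,  1,  1)

# Discretize, then compute the weight of evidence of each bin
age_bin <- cut(age, c(-Inf, 45, Inf))
tab <- table(age_bin, y)
woe <- log((tab[, "1"] / sum(tab[, "1"])) / (tab[, "0"] / sum(tab[, "0"])))

# Substitute each observation's bin WoE for its raw age
age_woe <- unname(woe[as.character(age_bin)])

# Unregularized logistic regression, standing in for the glmnet fit
fit <- glm(y ~ age_woe, family = binomial)
round(coef(fit), 4)
```

With a single WoE-coded predictor and no regularization, the fitted slope is 1 and the intercept equals the overall log-odds of the outcome, because the log-odds within each bin is the overall log-odds plus that bin's weight of evidence. This is the sense in which the substitution makes the relationship linear on the logit scale.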

mod$fit("model 1", "initial model with all variables")
mod

Once a model is fit, it can be used to generate predictions. The predict method returns a score appropriate for the type of model that was fit to the response variable. For binary performance, a binomial model is fit, so the score is a logit.

pred <- mod$predict()
head(pred)
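Because the score for a binary response is a logit, it can be converted to a probability with the inverse logit. The short sketch below uses hypothetical score values in place of the output of `mod$predict()`; `plogis` is the base-R logistic distribution function.

```r
# Hypothetical logit scores, standing in for the output of mod$predict()
logit <- c(-2, 0, 1.5)

# Inverse logit: p = 1 / (1 + exp(-logit))
prob <- plogis(logit)
round(prob, 3)
```

A logit of 0 corresponds to a probability of 0.5; negative logits map below 0.5 and positive logits above it.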

Adjust

The most important functionality provided by binnr is the manipulation of bin objects.

Bins can be collapsed:

mod$variables$Pclass$collapse(c(1,3))
mod$variables$Pclass$show()

# Alternate syntax
# mod$variables$Pclass - c(1,3)
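Conceptually, collapsing bins of a categorical variable amounts to merging factor levels so that they share one performance value. The base-R sketch below illustrates that idea on a made-up factor; it assumes that collapsing means merging exactly the named levels, and it is not how binnr implements `collapse`.

```r
# Toy passenger-class factor (values are made up)
pclass <- factor(c(1, 1, 2, 3, 3, 3, 2, 1))

# Merge levels "1" and "3" into a single level; "2" is left alone.
# Assigning the same name to two levels makes R combine them.
collapsed <- pclass
levels(collapsed)[levels(collapsed) %in% c("1", "3")] <- "1,3"

levels(collapsed)
table(collapsed)
```

After the merge, observations from the two original levels are counted together, so any per-bin statistic such as weight of evidence would be computed on the combined group.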

Bins can be expanded:

mod$variables$Pclass$expand(1)
mod$variables$Pclass$show()

# Alternate syntax
# mod$variables$Pclass + 1


Zelazny7/rubbish documentation built on May 10, 2019, 1:56 a.m.