knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" ) require(binnr) data(titanic) levels(titanic$Embarked)[levels(titanic$Embarked) == ""] <- "C"
binnr
is an R package that helps modelers build scorecards. Scorecard models
need to be more than predictive. Regulatory oversight often necessitates they be
transparent as well. Model transparency is difficult to enforce in more
predictive, non-linear methods such as neural networks, random forest, or
gradient boosted decision trees. Often direct modeler intervention is required
to ensure proper treatment is given for values of independent variables.
binnr
attempts to solve these problems by providing interactive variable
manipulation facilities giving the modeler total control over how variables
are treated in scorecard models.
The easiest way to install binnr
is by using the devtools
package:
if (!require(devtools)) install.packages("devtools") devtools::install_github(repo="Zelazny7/binnr")
binnr
is comprised of three main modeling steps: Bin, Fit, & Adjust. Each of
these steps are outlined in the following sections.
The Bin phase prepares a data set of explanatory features for modeling by discretizing variables and wrapping in an object that provides data manipulation capabilities. How variables are treated by the binning step depends on their type. Numeric features such as age or income are discretized using a performance metric. For example, if the performance variable is binary, the numeric independent variables are discretized using information value.
If the variable is a categorical field (factors in R) no discretization can be performed; however, the features is still wrapped by a bin object and can be interacted with as such.
mod <- bin(data=titanic[,-1], y=titanic$Survived)
Calling the bin
function returns a Scorecard object. The Scorecard object
contains methods for manipulating and fitting the discretized variables, or
"bins" in binnr parlance. The Scorecard object maintains a list of bins as and
keeps track of the operations requested by the modelers. It also contains a list
of fitted scorecard models that can be adjusted or reviewed. These capabilities
will be expanded upon in later sections.
mod <- bin(data=titanic[,-1], y=titanic$Survived)
mod
Bin variables can be accessed directly. Though as we'll see later, this is not recommended. But for illustration purposes, the following output is the kind of information a bin object typically stores:
mod$variables$Pclass
Additionally, bin objects can be plotted by calling their plot
method.
mod$variables$Pclass$plot()
The second step of scorecard developement using binnr
is fitting a model.
binnr
uses the glmnet
package under the hood to fit a regularized
regression model. The discretized variables have their observed performance
values substituted for their actual input values to create a completely
continuous dataset.
For example, if an age variable is discretized using binary performance the resulting substitution uses the observed weight-of-evidence rather than the age itself.
Performance substitution is used to put all predictors on the same scale and force a linear relationship between each predictor and the response variables.
mod$fit("model 1", "initial model with all variables")
mod
Once a model is fit, it can be predicted as well. This returns a score value appropriate for the type of model that was fit to the response variable. In the case of binary performance, a binomial response was selected resulting in a logit.
pred <- mod$predict()
pred <- mod$predict()
head(pred)
The most important functionality provided by binnr
is the manipulation of
bin objects.
Bins can be collapsed:
mod$variables$Pclass$collapse(c(1,3)) mod$variables$Pclass$show() # Alternate syntax # mod$variables$Pclass - c(1,3)
Bins can be expanded:
mod$variables$Pclass$expand(1) mod$variables$Pclass$show() # Alternate syntax # mod$variables$Pclass + 1
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.