# Developing a Credit Scorecard with the `scorecard` Package

```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Traditional Credit Scoring Using Logistic Regression

```r
library(scorecard)
```

### Data Preparation

Let's use the `germancredit` dataset for this demonstration.

```r
data("germancredit")
str(germancredit)
```

The `var_filter` function drops variables that fail any of three thresholds: a missing rate above 95%, an information value (IV) below 0.02, or an identical value rate above 95%. These values are the defaults.

```r
dt_f <- var_filter(germancredit, y = "creditability")
```
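To make two of these filter criteria concrete, here is a minimal base R sketch that computes a missing rate and an identical value rate per column. The toy data frame and its column names are made up for illustration, and counting `NA` as a value when measuring the identical value rate is an assumption, not necessarily what `var_filter` does internally.

```r
# Toy data frame; column names and values are illustrative only
toy <- data.frame(
  mostly_missing  = c(NA, NA, NA, NA, 1),
  nearly_constant = c("a", "a", "a", "a", "a"),
  informative     = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Share of missing values in each column
missing_rate <- sapply(toy, function(x) mean(is.na(x)))

# Share of rows taken by the single most frequent value in each column
# (assumption: NA is counted as a value here)
identical_rate <- sapply(toy, function(x) {
  max(table(x, useNA = "ifany")) / length(x)
})

print(round(missing_rate, 2))
print(round(identical_rate, 2))
```

A column would be a candidate for removal when either rate exceeds the chosen threshold.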

### Split Data into Train / Test Sets

As with most traditional modeling approaches, a subset of the observations should be held out of model training and reserved as a test set. The `split_df` function performs this sampling and returns the train and test datasets.

```r
dt_list <- split_df(dt_f, y = "creditability", ratio = c(0.6, 0.4), seed = 30)
label_list <- lapply(dt_list, function(x) x$creditability)
```
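The idea behind a seeded 60/40 split can be sketched in base R. This is a toy illustration on made-up labels; sampling within each class (a stratified split) is an assumption about the general technique and is not claimed to match the internals of `split_df`.

```r
set.seed(30)

# Toy labels with a 70/30 good/bad mix (made-up data)
y <- rep(c("good", "bad"), times = c(700, 300))

# Row indices grouped by class
idx_by_class <- split(seq_along(y), y)

# Draw 60% of each class for training so both sets keep the class mix
train_idx <- unlist(
  lapply(idx_by_class, function(i) sample(i, size = round(0.6 * length(i)))),
  use.names = FALSE
)
test_idx <- setdiff(seq_along(y), train_idx)

print(prop.table(table(y[train_idx])))
print(prop.table(table(y[test_idx])))
```

Fixing the seed makes the split reproducible, which matters when comparing models trained at different times.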

### Weight-of-Evidence (WoE) Binning

Weight-of-Evidence binning groups the values of continuous and categorical independent variables into bins that separate the two classes of the dependent variable as clearly as possible. The `woebin` function applies this technique to all independent variables in a single call.

```r
bins <- woebin(dt_f, y = "creditability")
# woebin_plot(bins)
```
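For intuition, the WoE of a bin and the IV of a variable can be computed by hand from per-bin counts. The sketch below uses made-up counts and the bad-over-good sign convention; sign conventions differ between references, so treat that choice as an assumption rather than a statement about `woebin`.

```r
# Toy per-bin counts for one variable (numbers are made up)
bins_toy <- data.frame(
  bin  = c("low", "mid", "high"),
  good = c(100, 200, 100),
  bad  = c(60, 30, 10)
)

# Distribution of goods and bads across the bins
dist_good <- bins_toy$good / sum(bins_toy$good)
dist_bad  <- bins_toy$bad / sum(bins_toy$bad)

# WoE per bin: log ratio of the two distributions
# (assumed convention: bad share over good share)
bins_toy$woe <- log(dist_bad / dist_good)

# Each bin's IV contribution; the variable's IV is their sum
bins_toy$iv_part <- (dist_bad - dist_good) * bins_toy$woe
total_iv <- sum(bins_toy$iv_part)

print(bins_toy)
print(total_iv)
```

Every IV contribution is non-negative because the difference and the log ratio always share a sign, so a larger total indicates stronger separation.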

The user can also adjust bin breaks interactively by using the `woebin_adj` function.

```r
# breaks_adj <- woebin_adj(dt_f, y = "creditability", bins = bins)
```

The user can also set the bin breaks manually through the `breaks_list` argument of the `woebin` function. Note the use of `%,%` as a separator to merge two categories of a categorical independent variable into a single bin.

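```r
breaks_adj <- list(
  age.in.years = c(26, 35, 40),
  other.debtors.or.guarantors = c("none", "co-applicant%,%guarantor")
)
# Re-run the binning with the manual breaks to obtain the adjusted bins
bins_adj <- woebin(dt_f, y = "creditability", breaks_list = breaks_adj)
```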

Once the WoE bins are established for all desired independent variables, apply the binning logic to the training and test datasets with `woebin_ply`.

```r
dt_woe_list <- lapply(dt_list, function(x) woebin_ply(x, bins_adj))
```
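Conceptually, applying the bins replaces each raw value with the WoE of the bin it falls into. The base R sketch below shows that lookup for a single numeric variable. The WoE values are made up, and treating the intervals as left-closed is an assumption for this illustration.

```r
# Raw values and the manual break points used for age above
age    <- c(22, 30, 37, 55)
breaks <- c(-Inf, 26, 35, 40, Inf)

# Hypothetical WoE value for each of the four resulting bins
woe_by_bin <- c(0.5, 0.1, -0.2, -0.4)

# Find each value's bin, using left-closed intervals such as [26, 35)
bin_index <- cut(age, breaks = breaks, right = FALSE, labels = FALSE)

# Replace the raw value with the WoE of its bin
age_woe <- woe_by_bin[bin_index]

print(data.frame(age, bin_index, age_woe))
```

After this transformation every predictor is on the same log-odds-like scale, which is what the logistic regression in the next step is fit on.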

### Logistic Regression Example

A logistic regression fitted on the WoE-transformed training data provides the coefficients from which the scorecard is built.

```r
m1 <- glm(creditability ~ ., family = binomial(), data = dt_woe_list$train)

# vif(m1, merge_coef = TRUE) # summary(m1)

# Select a formula-based model by AIC (or by LASSO for large datasets)
m_step <- step(m1, direction = "both", trace = FALSE)
m2 <- eval(m_step$call)

# vif(m2, merge_coef = TRUE) # summary(m2)
```

If bad cases were oversampled, so that the bad rate in the modeling sample differs from the bad rate in the population, the following code chunk can be uncommented and run to reweight the observations.

```r
# Read documentation on handling oversampling (support.sas.com/kb/22/601.html)

# library(data.table)

# p1 <- 0.03 # bad probability in population
# r1 <- 0.3 # bad probability in sample dataset

# dt_woe <- copy(dt_woe_list$train)[, weight := ifelse(creditability == 1, p1/r1, (1-p1)/(1-r1) )][]

# fmla <- as.formula(paste("creditability ~", paste(names(coef(m2))[-1], collapse = "+")))
# m3 <- glm(fmla, family = binomial(), data = dt_woe, weights = weight)
```
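The arithmetic behind these weights can be checked in a few lines of base R. The sketch reuses the `p1` and `r1` values from the commented chunk; the sample counts are made up so that the sample bad rate equals `r1`.

```r
p1 <- 0.03  # bad probability in the population
r1 <- 0.3   # bad probability in the sample

# Weight per observation, by class
w_bad  <- p1 / r1
w_good <- (1 - p1) / (1 - r1)

# Hypothetical sample of 1000 rows with a 30% bad rate
n_bad  <- 300
n_good <- 700

# The weighted bad rate should match the population bad rate
weighted_bad_rate <- (n_bad * w_bad) / (n_bad * w_bad + n_good * w_good)

print(c(w_bad = w_bad, w_good = w_good))
print(weighted_bad_rate)
```

Bad cases are down-weighted and good cases up-weighted so that the weighted sample has the same class mix as the population.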

### Evaluating Model Performance Using KS & ROC

The `perf_eva` function provides model accuracy statistics (such as mse, rmse, logloss, r2, ks, auc, gini) and plots (such as ks, lift, gain, roc, lz, pr, f1, density).

```r
# First, get probabilistic predictions
pred_list <- lapply(dt_woe_list, function(x) predict(m2, x, type = 'response'))
# Then evaluate model accuracy
perf <- perf_eva(pred = pred_list, label = label_list)
```
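To show what the KS and AUC statistics measure, the following base R sketch computes both by hand on a tiny set of made-up labels and predictions with no tied scores. It illustrates the standard definitions and is not a reproduction of how `perf_eva` computes them.

```r
# Toy labels (1 = bad) and predicted bad probabilities, made up for illustration
label <- c(1, 1, 1, 0, 1, 0, 0, 0)
pred  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)

# KS: largest gap between the cumulative bad and good distributions
# when observations are sorted from highest to lowest prediction
ord      <- order(pred, decreasing = TRUE)
cum_bad  <- cumsum(label[ord] == 1) / sum(label == 1)
cum_good <- cumsum(label[ord] == 0) / sum(label == 0)
ks <- max(abs(cum_bad - cum_good))

# AUC via the rank-sum identity: share of (bad, good) pairs
# in which the bad case receives the higher prediction
r   <- rank(pred)
n1  <- sum(label == 1)
n0  <- sum(label == 0)
auc <- (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Gini is a rescaling of AUC
gini <- 2 * auc - 1

print(c(ks = ks, auc = auc, gini = gini))
```

In this toy data only one of the sixteen bad/good pairs is ranked the wrong way round, which is why the AUC is close to 1.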

### Create Scorecard

Once the model has been selected, the scorecard can be created with the `scorecard` function. By default, the target points are 600, the target odds are 1/19, and the points to double the odds are 50 (arguments `points0`, `odds0`, and `pdo`). See `?scorecard` for more information on the function and its arguments.
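The meaning of these three defaults can be sketched with the usual scorecard scaling formulas. The helper `score_from_odds` is hypothetical, and the sketch assumes that the odds are bad-to-good odds and that a higher score means lower risk; check `?scorecard` before relying on either assumption.

```r
points0 <- 600    # target points
odds0   <- 1 / 19 # target odds (assumed to be bad:good)
pdo     <- 50     # points to double the odds

# Scaling constants implied by the three settings
b <- pdo / log(2)
a <- points0 + b * log(odds0)

# Hypothetical helper: total score for a given bad:good odds
score_from_odds <- function(odds) a - b * log(odds)

print(score_from_odds(1 / 19))  # the target odds map to the target points
print(score_from_odds(1 / 38))  # halving the bad odds adds pdo points
```

Under these assumptions, each bin's points in the final card come from splitting this total across the model's intercept and the coefficient-times-WoE terms.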

The scorecard can then be applied to the original data using the `scorecard_ply` function. Lastly, a chart encompassing Population Stability Index (PSI) statistics can be rendered via the `perf_psi` function.
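As a reminder of what the PSI measures, the base R sketch below computes it from the share of observations per score band in two samples. The band shares are made up, and the formula is the standard definition rather than a reproduction of `perf_psi`.

```r
# Share of observations per score band (made-up numbers, each sums to 1)
expected <- c(0.10, 0.20, 0.40, 0.20, 0.10)  # e.g. training sample
actual   <- c(0.12, 0.18, 0.38, 0.22, 0.10)  # e.g. test sample

# PSI: sum over bands of (actual - expected) * log(actual / expected)
psi <- sum((actual - expected) * log(actual / expected))

print(psi)
```

A value near zero means the score distribution is almost identical in the two samples. Larger values indicate a shift that may warrant a closer look.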

```r
# Build the card