knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The main functionality of this package is to facillitate the steps required to create a predictive plot when working with logistic shape-constrained additive models. Seeing the plot of a spline is vital to understanding which shape constraint to use to better capture the data.

We will walk through an example to demonstrate the use of this package.

Installation

To install the package, copy and paste the following code into your R console:

library(devtools)
devtools::install_github("christithomp/scamplot", build = TRUE, build_opts = c("--no-resave-data", "--no-manual"))

Usage

To use the package, first you must load it into the environment:

library(scamplot)

Data description and preprocessing

We want to see which variables have an impact on the relative risk of being denied for a mortgage. We will consider the risk of being denied a mortgage as a repsonse with several predictors.

library(HRW)
data(BostonMortgages)
names(BostonMortgages)

The predictors we are interested in for our analysis are dir (ratio of the debt payments to the total income), lvr (ratio of the loan size to the assessed value of the property), pbcr (a factor of applicant having public bad credit with levels yes and no), self (a factor of applicant being self-employes with levels yes and no), black (a factor of applicant being black iwth levels yes and no), ccs (credit score ranging from 1 to 6, where a low value is indicative of low credit risk), and Single (a factor of applicant being single with levels yes and no).

We focus on families with a debt to income ratio less than 1.5 and make ccs a factor variable.

BostonMortgages = BostonMortgages[BostonMortgages$dir < 1.5,]
BostonMortgages$ccs = as.factor(BostonMortgages$ccs)

Initial model fit of data

We now fit all the variables as linear terms to set up a baseline model.

library(gam)
fitInit = gam(deny ~ black  + dir + lvr + pbcr + self 
              + single + ccs, family = binomial, data = BostonMortgages)

Next, we decide which variables to keep in the model and which to fit as splines.

stepFit = step.Gam(fitInit,scope =
                      list("black" = ~1 + black,
                           "pbcr" = ~1 + pbcr,
                           "self"  = ~1 + self,
                           "single" = ~1 + single,
                           "ccs" = ~1 + ccs,
                           "dir"  =  ~1 +  dir + s(dir,5),
                           "lvr"  = ~1 + lvr + s(lvr,5)),
                           family = binomial, data = BostonMortgages)
print(names(stepFit$"model")[-1])
detach("package:gam")

Now that we have our model arguments, we fit our final model.

library(mgcv)
fit = mgcv::gam(deny ~ black + s(dir) + s(lvr) + pbcr + self + single + ccs,
                      family = binomial, data = BostonMortgages)

Adding shape constraints to model

Now that we have our model, we want to plot it to see what shape constraints (if any) we could add to better describe our data. We need to visualize the smooth terms of our data before we can add any constraints to them For this section, we can use the function make_scamplot in the R package scamplot.

First, we create vectors for the arguments that scamplot desires. We need the name of our response variable, the names of our linear terms, and the names of our smooth terms and their shapes as separate input into the function.

y = 'deny'
linear_terms = c("ccs", "black", "pbcr", "self", "single")
smooth_terms =  c("dir", "lvr")
shape_type = c("cr", "cr")

Now we can call the function to see the spline fits

make_scamplot(BostonMortgages, y, smooth_terms, linear_terms, shape_type, type = "link")
make_scamplot(BostonMortgages, y, smooth_terms, linear_terms, shape_type, type = "response")

From these plots, it is clear that the ratio of the loan size to the assessed value of property could benefit from a monotonic increasing function in both the logit and probability model. Let's fit the model to see how this shape constraint compares.

First, define the new shape type as a monotonic increasing function

shape_type = c("cr", "mpi")

Next, call the make_scamplot() function to visualize the shape constraint.

make_scamplot(BostonMortgages, y, smooth_terms, linear_terms, shape_type, type = "link")
make_scamplot(BostonMortgages, y, smooth_terms, linear_terms, shape_type, type = "response")

With the new shape constraint, you can see that the logit plot becomes smoother and the probability plot has a higher peak on the right. This makes for a simpler interpretation since there aren't as many irreguarities in the smooth fitted line.

The probability fit for the ratio of debt payments to the total income looks like a convex monotonically increasing function. We can change the shape term to account for this shape constraint and plot the new fit.

shape_type = c("micx", "mpi")
make_scamplot(BostonMortgages, y, smooth_terms, linear_terms, shape_type, type = "response")

Adding in the shape constraint to the probability plot has made the line smoother, which may be more beneficial for interpretation purposes as well.



christithomp/scamplot documentation built on Dec. 10, 2019, 11:33 a.m.