migrate: Create Credit State Migration (Transition) Matrices

Using {migrate}

This package is intended to serve as a set of tools to help convert credit risk data at two timepoints into traditional state transition matrices. At a higher level, {migrate} is intended to help an analyst understand how risk moved in their credit portfolio over a time interval.

Background

One of the more difficult aspects of building a state migration matrix in R (or Python, for that matter) is that the output doesn't fit the structure of a traditional data frame. Instead, the output needs to be a matrix, a data structure that R supports natively. In the past, it was difficult to turn a matrix into something visually presentable. More recently, however, packages such as kableExtra and gt make it possible to present polished output that extends the structure of a data frame. Pairing the matrix-style output of {migrate}'s functions with a formatting package such as these should help analysts streamline the presentation of their credit portfolio's state migration matrices to an audience.
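To make the data frame versus matrix distinction concrete, here is a minimal base R sketch. It does not use {migrate}; the `transitions` data frame and its values are made up for illustration. It reshapes a long-format summary (one row per start/end pair) into a matrix with start states as rows and end states as columns.

```r
# Toy long-format summary: one row per (start, end) state pair.
# All names and values here are hypothetical.
transitions <- data.frame(
  state_start = c("A", "A", "B", "B"),
  state_end   = c("A", "B", "A", "B"),
  count       = c(8, 2, 1, 9)
)

# xtabs() sums `count` within each start/end combination and returns
# a two-way table with start states as rows and end states as columns.
transition_table <- xtabs(count ~ state_start + state_end, data = transitions)

# Drop the table classes and the stored call to get a plain numeric matrix.
transition_matrix <- unclass(transition_table)
attr(transition_matrix, "call") <- NULL

transition_matrix
```

The long format is easier to filter and join, while the matrix form is the one audiences expect to read.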

Getting Started

If you haven't done so already, install {migrate} by following the instructions in the README.

Then load the package with library():

library(migrate)

The package has a built-in mock dataset, which can be loaded into the environment like so:

data("mock_credit")

head(mock_credit[order(mock_credit$customer_id), ])   # sort by 'customer_id'

Note that the mock_credit dataset has exactly two (2) unique values in its date column. This matters because migrate() throws an error if the column passed to its time argument has more than two (2) unique values.

unique(mock_credit$date)
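If your own data might contain extra dates, a quick check before migrating can save a failed call. The sketch below is plain base R; `has_two_timepoints()` is a hypothetical helper written for this example, not a function exported by {migrate}, and the dates are made up.

```r
# Hypothetical helper (not part of {migrate}): TRUE only when the
# vector holds exactly two distinct, non-missing values.
has_two_timepoints <- function(x) {
  length(unique(x[!is.na(x)])) == 2L
}

# Made-up date vectors: the first has two distinct dates, the second three.
dates_ok  <- as.Date(c("2020-06-30", "2020-09-30", "2020-06-30"))
dates_bad <- as.Date(c("2020-06-30", "2020-09-30", "2020-12-31"))

has_two_timepoints(dates_ok)
has_two_timepoints(dates_bad)
```

When the check fails, one option is to filter the data down to the two dates of interest before passing it on.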

To summarize the migration within the data, use the migrate() function:

migrated_df <- migrate(
  data = mock_credit,
  id = customer_id,
  time = date,
  state = risk_rating
)
head(migrated_df)

To create the state transition matrix, use the build_matrix() function:

build_matrix(migrated_df)
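For intuition about what these two steps compute, the following base R sketch pairs each ID's rating at the two dates and then cross-tabulates the pairs. It is an illustration of the idea only, not the package's implementation, and the `ratings` data is made up.

```r
# Toy data: three customers, each rated at two dates (values made up).
ratings <- data.frame(
  customer_id = c(1, 2, 3, 1, 2, 3),
  date        = as.Date(rep(c("2020-06-30", "2020-09-30"), each = 3)),
  risk_rating = c("A", "A", "B", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Split the rows into the earlier and the later timepoint.
timepoints <- sort(unique(ratings$date))
start <- ratings[ratings$date == timepoints[1], c("customer_id", "risk_rating")]
end   <- ratings[ratings$date == timepoints[2], c("customer_id", "risk_rating")]

# Inner join on the ID: one row per customer holding both ratings.
paired <- merge(start, end, by = "customer_id", suffixes = c("_start", "_end"))

# Cross-tabulate start state (rows) against end state (columns).
counts <- table(paired$risk_rating_start, paired$risk_rating_end)

# Row-wise proportions: the share of each start state that ended in each state.
props <- prop.table(counts, margin = 1)

counts
props
```

Each row of `props` sums to 1, which is the usual reading of a transition matrix: where did the customers who started in a given state end up?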

Or, to do it all in one shot, use the native pipe operator (|>):

mock_credit |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    metric = principal_balance,
    percent = FALSE,
    verbose = FALSE
  ) |>
  build_matrix(
    state_start = risk_rating_start,
    state_end = risk_rating_end,
    metric = principal_balance
  )

Handle IDs with observations at a single timepoint

The following code creates a data frame of 500 customers with the following characteristics:

470 customers have a value at both timepoints
20 customers have a value only at the first timepoint
10 customers have a value only at the second timepoint

mock_credit_with_missing <- mock_credit |>
  # Remove the value at the first timepoint for 10 customers
  dplyr::slice(-(1:10)) |>
  # Remove the value at the last timepoint for 20 customers
  dplyr::slice(-((dplyr::n() - 19):dplyr::n()))

Check that the new data frame has information about 500 customers:

# Number of unique customer_id values in mock_credit_with_missing
dplyr::n_distinct(mock_credit_with_missing$customer_id)
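To see which IDs are affected in data of your own, you can count observations per ID. The base R sketch below uses made-up data and assumes at most one row per ID per date; under that assumption, an ID with a single row exists at only one timepoint.

```r
# Toy data (made up): customer 3 is missing the first date,
# customer 4 is missing the second.
ratings <- data.frame(
  customer_id = c(1, 2, 4, 1, 2, 3),
  date        = as.Date(c(rep("2020-06-30", 3), rep("2020-09-30", 3))),
  risk_rating = c("A", "B", "B", "A", "A", "C"),
  stringsAsFactors = FALSE
)

# Count the rows per ID; an ID seen only once has no start/end pair.
obs_per_id <- table(ratings$customer_id)
single_timepoint_ids <- names(obs_per_id)[obs_per_id == 1]

length(obs_per_id)     # number of distinct IDs
single_timepoint_ids   # IDs observed at a single timepoint
```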

By default, migrate() drops observations that belong to IDs found at a single timepoint, and it reports this behavior with a warning:

migrated_data_without_fill_state <- mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    percent = FALSE,
    verbose = FALSE
  )

Notice that only 470 customers have been migrated:

migrated_data_without_fill_state |>
  dplyr::pull(count) |>
  sum()

You can use migrate()'s fill_state argument to ensure that no information is lost during the migration process. When a filler state value (e.g., a character string such as "No Rating" or "NR") is assigned to fill_state, IDs with a single timepoint are not removed but rather migrated from or to this filler state.

When verbose = TRUE, a message provides additional information about the IDs with missing timepoints:

migrated_data_with_fill_state <- mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    fill_state = "No Rating",
    percent = FALSE,
    verbose = TRUE
  )

Check that 500 customers were migrated:

migrated_data_with_fill_state |>
  dplyr::pull(count) |>
  sum()
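Conceptually, a filler state behaves like a full outer join followed by replacing the missing side with a placeholder. The base R sketch below illustrates that idea on made-up data; it is an assumption about the general technique, not a description of how migrate() is implemented internally.

```r
# Toy start and end snapshots (made up): customer 4 has no end rating,
# customer 3 has no start rating.
start <- data.frame(customer_id = c(1, 2, 4), risk_rating = c("A", "B", "B"),
                    stringsAsFactors = FALSE)
end   <- data.frame(customer_id = c(1, 2, 3), risk_rating = c("A", "A", "C"),
                    stringsAsFactors = FALSE)

# all = TRUE keeps IDs found in only one of the two tables (full outer join).
paired <- merge(start, end, by = "customer_id", all = TRUE,
                suffixes = c("_start", "_end"))

# Replace the missing side with the filler state instead of dropping the row.
filler <- "No Rating"
paired$risk_rating_start[is.na(paired$risk_rating_start)] <- filler
paired$risk_rating_end[is.na(paired$risk_rating_end)] <- filler

counts <- table(paired$risk_rating_start, paired$risk_rating_end)

counts
sum(counts)   # every ID is counted once, including the unpaired ones
```

With an inner join (the default `all = FALSE`), customers 3 and 4 would be dropped and the total would fall to 2.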

So far we have been using count as the metric, which makes it easy to determine the number of customers that migrated in each scenario. The following code provides an example migration that uses principal_balance as the metric:

mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    metric = principal_balance,
    fill_state = "No Rating",
    percent = FALSE,
    verbose = FALSE
  ) |>
  build_matrix(
    state_start = risk_rating_start,
    state_end = risk_rating_end,
    metric = principal_balance
  )
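As a rough illustration of a metric-weighted matrix, the base R sketch below sums a balance within each start/end pair instead of counting rows. The data is made up, and the sketch simply assumes one balance value per customer; which timepoint's balance the package attributes to each migration is not covered here, so treat that choice as an assumption to verify against the package documentation.

```r
# Toy paired data (made up): one row per customer with the start state,
# the end state, and a single balance value.
paired <- data.frame(
  risk_rating_start = c("A", "A", "B", "B"),
  risk_rating_end   = c("A", "B", "B", "B"),
  principal_balance = c(100, 250, 400, 50),
  stringsAsFactors = FALSE
)

# xtabs() with a left-hand side sums that column within each cell,
# so each cell holds the total balance that moved along that path.
balance_matrix <- xtabs(
  principal_balance ~ risk_rating_start + risk_rating_end,
  data = paired
)

balance_matrix
```

Comparing this with a count-based matrix shows whether a migration path is driven by many small balances or a few large ones.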