rf.interaction.transformer: Extract interactions from random forest

View source: R/27_RF_INTERACTION_TRANSFORMER.R

rf.interaction.transformer    R Documentation

Extract interactions from random forest

Description

rf.interaction.transformer extracts interactions from a random forest. It implements a customized random forest algorithm in which each decision tree is grown under several conditions: a minimum percentage of observations and of defaults in each node, a maximum tree depth, and a monotonicity condition at each splitting node. The Gini index is used as the metric for node splitting.
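The Gini-based splitting mentioned above can be illustrated with a minimal sketch. The helpers gini.impurity and gini.split are hypothetical names introduced for this illustration only; they are not part of PDtoolkit, and the package's internal computation may differ in its details.

```r
# Hypothetical helpers for illustration; not part of PDtoolkit.

# Gini impurity of a binary (0/1) target vector: 2 * p * (1 - p),
# where p is the share of ones (defaults) in the node.
gini.impurity <- function(y) {
  if (length(y) == 0) return(0)
  p <- mean(y)
  2 * p * (1 - p)
}

# Weighted Gini impurity of the two child nodes produced by a split.
# `left` is a logical vector marking observations sent to the left node.
gini.split <- function(y, left) {
  n <- length(y)
  (sum(left) / n) * gini.impurity(y[left]) +
    (sum(!left) / n) * gini.impurity(y[!left])
}

y <- c(0, 0, 0, 1, 1, 1, 0, 1)   # toy default indicator
x <- c(1, 2, 3, 7, 8, 9, 4, 6)   # toy numeric risk factor

gini.impurity(y)       # impurity before the split: 0.5
gini.split(y, x <= 4)  # impurity after splitting at x <= 4: 0
```

A lower weighted impurity after the split indicates a better candidate; in this toy data the split at x <= 4 separates the two classes completely.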

Usage

rf.interaction.transformer(
  db,
  rf,
  target,
  num.rf = NA,
  num.tree,
  min.pct.obs,
  min.avg.rate,
  max.depth,
  monotonicity,
  create.interaction.rf,
  seed = 991
)

Arguments

db

Data frame of risk factors and target variable supplied for interaction extraction.

rf

Character vector of risk factor names on which the decision trees are run.

target

Name of target variable (default indicator 0/1) within db argument.

num.rf

Number of risk factors randomly selected for each decision tree. If the default value (NA) is supplied, the number of risk factors is calculated as sqrt(number of all supplied risk factors).

num.tree

Number of decision trees used for random forest.

min.pct.obs

Minimum percentage of observations in each leaf.

min.avg.rate

Minimum percentage of defaults in each leaf.

max.depth

Maximum number of splits.

monotonicity

Logical indicator. If TRUE, the observed trend between the risk factor and the target is preserved at each splitting node.

create.interaction.rf

Logical indicator. If TRUE, the second element of the output is a data frame with the interaction modalities.

seed

Random seed to ensure result reproducibility.
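To make the roles of min.pct.obs and min.avg.rate more concrete, the sketch below checks whether a candidate split leaves both child nodes with enough observations and a high enough default rate. The helper check.split is a hypothetical name, not a PDtoolkit function, and measuring the observation share against the full sample size is an assumption of this sketch rather than a documented detail of the package.

```r
# Hypothetical helper for illustration; not part of PDtoolkit.
# Returns TRUE only if both child nodes of a split satisfy:
#   - share of observations (relative to n.total, an assumption) >= min.pct.obs
#   - average default rate >= min.avg.rate
check.split <- function(y, left, n.total, min.pct.obs, min.avg.rate) {
  nodes <- list(y[left], y[!left])
  all(vapply(nodes, function(node) {
    length(node) > 0 &&
      length(node) / n.total >= min.pct.obs &&
      mean(node) >= min.avg.rate
  }, logical(1)))
}

y <- c(0, 0, 0, 1, 1, 1, 0, 1, 0, 0)   # toy default indicator
x <- c(1, 2, 3, 7, 8, 9, 4, 6, 5, 10)  # toy numeric risk factor

# Left node has no defaults, so the minimum default rate is violated: FALSE
check.split(y, x <= 5, length(y), min.pct.obs = 0.05, min.avg.rate = 0.01)

# Both nodes are large enough and contain defaults: TRUE
check.split(y, x <= 6, length(y), min.pct.obs = 0.05, min.avg.rate = 0.01)
```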

Value

The command rf.interaction.transformer returns a list of two data frames. The first data frame provides a summary of the trees. The second data frame contains the new risk factor extracted from the random forest.

Examples

#load the package and the example data set
suppressMessages(library(PDtoolkit))
data(loans)
#modify risk factors in order to show how the function works with missing values
loans$"Account Balance"[1:10] <- NA
loans$"Duration of Credit (month)"[c(13, 15)] <- NA
#run the random forest on all risk factors except the target
rf.it <- rf.interaction.transformer(db = loans,
                                    rf = names(loans)[!names(loans) %in% "Creditability"],
                                    target = "Creditability",
                                    num.rf = NA,
                                    num.tree = 3,
                                    min.pct.obs = 0.05,
                                    min.avg.rate = 0.01,
                                    max.depth = 2,
                                    monotonicity = TRUE,
                                    create.interaction.rf = TRUE,
                                    seed = 579)
#inspect the output elements
names(rf.it)
rf.it[["tree.info"]]
tail(rf.it[["interaction"]])
table(rf.it[["interaction"]][, 1], useNA = "always")

PDtoolkit documentation built on Sept. 20, 2023, 9:06 a.m.