getDecisionsMetrics: Measure the error, prediction and importance of decisions

View source: R/getDecisionsMetric.R

getDecisionsMetrics    R Documentation

Measure the error, prediction and importance of decisions

Description

This function measures the prediction and error on the response variable of each decision, computed on the decision's support in the data passed. Importances are calculated by default, but this can be switched off.
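
For example, a minimal call (a sketch; de is a decision ensemble with a "condition" column and data_ctg the discretized data, both obtained as in the Examples below):

de_met <- getDecisionsMetrics(de, data = data_ctg, target = iris$Species, classPos = "setosa")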

Usage

getDecisionsMetrics(
  ruleExec,
  data,
  target,
  classPos = NULL,
  importances = TRUE,
  in_parallel = FALSE,
  n_cores = detectCores() - 1,
  cluster = NULL
)

Arguments

ruleExec

a vector with name "condition" or a data.frame with a column "condition".

data

data from which to get the decision support.

target

response variable.

classPos

for classification tasks, the positive class to be predicted by decisions.

importances

if FALSE, the importances are not calculated (importances = TRUE by default).

in_parallel

if TRUE, the function is run in parallel.

n_cores

if in_parallel = TRUE and no cluster has been passed, the number of cores to use.

cluster

the cluster to use to run the function in parallel.
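
To run the function in parallel, either let it create its own workers via n_cores, or pass a cluster you manage yourself. A sketch (de and data_ctg as in the Examples below; workers created with parallel::makeCluster()):

# let getDecisionsMetrics() create its own workers
de_met <- getDecisionsMetrics(de, data = data_ctg, target = iris$Species,
            classPos = "setosa", in_parallel = TRUE, n_cores = 2)

# or pass an existing cluster and stop it yourself afterwards
cl <- parallel::makeCluster(2)
de_met <- getDecisionsMetrics(de, data = data_ctg, target = iris$Species,
            classPos = "setosa", in_parallel = TRUE, cluster = cl)
parallel::stopCluster(cl)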

Value

a data.table with the decision rule (column "condition"), its error ("err"), prediction ("pred"), support, and the number of variables in the decision rule ("len"). Columns "gain" and "imp" with the gain and importance of the decision are added if importances were calculated.
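
For instance, the returned data.table can be inspected with the documented column names (a sketch; de and data_ctg as in the Examples below):

de_met <- getDecisionsMetrics(de, data = data_ctg, target = iris$Species,
            classPos = "setosa", importances = TRUE)
de_met[order(-imp), .(condition, pred, err, len, imp)] # most important decisions first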

Examples

library(randomForest)
library(caret)
library(data.table)

# import data and fit model
data(iris)
mod <- randomForest(Species ~ ., data = iris)

# Let's get the decision ensemble. One could use the wrapping function
# model2DE(), but we will run each function separately.

# Get the raw decision ensemble
de <- preCluster(model = mod, model_type = "rf", data = iris[, -5]
        , target = iris$Species, classPos = "setosa"
        , times = 1 # number of bootstraps, here just one
        , discretize = FALSE) # we will discretize outside for the example
summary(de)
# exec = the decision ensemble
# partitions = list of sample indexes for bootstrapping
# if we had done discretization, the new data would be in data_ctg
de <- de$exec

# Discretize variables in 3 categories - optional
de <- discretizeDecisions(rules = de, data = iris[, -5], target = iris$Species
        , K = 3, classPos = "setosa", mode = "data")
data_ctg <- de$data_ctg
de <- de$rules_ctg

# Homogenize the decision ensemble
de <- de[, condition := sapply(condition, function(x) {
  paste(sort(unlist(strsplit(x, split = " & "))), collapse = " & ")
})]
de <- unique(
          as.data.table(de)[, n := as.numeric(n)][, n := sum(n), by = condition]
          )

# Calculate decision metrics; we don't need the importances yet since we will
# do pruning. Otherwise, set importances = TRUE and skip the next 2 steps.
de_met <- getDecisionsMetrics(de, data = data_ctg, target = iris$Species
            , classPos = "setosa", importances = FALSE)
de <- de[de_met, on = "condition"]

# Pruning - optional
de <- pruneDecisions(rules = de, data = data_ctg, target = iris$Species
        , classPos = "setosa")

# Decision importances
de <- decisionImportance(rules = de, data = data_ctg, target = iris$Species
        , classPos = "setosa")

# Filter out decisions with the lowest importance: min_imp = the minimal
# importance to keep, relative to the maximal importance in the decision
# ensemble. E.g., if min_imp = 0.5, then at least all decisions with an
# importance > 0.5*max(importance) will be kept.
# This ensures that we don't throw out too much.
# Since the decision ensemble is quite small, we don't need to filter much...
de <- filterDecisionsImportances(rules = de, min_imp = 0.1)

# Get the network
de_net <- getNetwork(rules = de, data = data_ctg, target = iris$Species
            , classPos = "setosa")

# Plot the feature importance/influence and the network
plotFeatures(de_net, levels_order = c("Low", "Medium", "High"))
plotNetwork(de_net, hide_isolated_nodes = FALSE, layout = "fr")
