benchmarkQCBA: Learn and evaluate QCBA postprocessing on multiple rule learners

View source: R/rMARC.R

Learn and evaluate QCBA postprocessing on multiple rule learners. This can be, for example, used to automatically select the best model for a given use case based on a combined preference for accuracy and model size.

Description

Learn multiple rule models using base rule induction algorithms from arulesCBA and apply QCBA to postprocess them.

Usage

benchmarkQCBA(
  train,
  test,
  classAtt,
  train_disc = NULL,
  test_disc = NULL,
  cutPoints = NULL,
  algs = c("CBA", "CMAR", "CPAR", "PRM", "FOIL2"),
  iterations = 2,
  rounding_places = 3,
  return_models = FALSE,
  debug_prints = FALSE,
  ...
)

Arguments

train

data frame with training data

test

data frame with testing data before postprocessing

classAtt

the name of the class attribute

train_disc

prediscretized training data

test_disc

prediscretized test data

cutPoints

specification of cut points applied to the data (ignored if train_disc is null)

algs

vector with names of baseline rule learning algorithms. Names must correspond to function names from the arulesCBA library

iterations

number of times each base learner is executed; used to obtain a more precise estimate of build time

rounding_places

statistics in the resulting dataframe will be rounded to the specified number of decimal places

return_models

boolean indicating whether the learnt rule lists (baseline and postprocessed) should also be included in the output

debug_prints

boolean indicating whether to print debug information such as rule lists

...

Parameters for the base learners: the name of the argument identifies the base learner (one of the 'algs' values) and its value is a list of parameters to pass to that learner. To specify parameters for QCBA, use the argument name "QCBA". See also Example 3.
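
For instance, the following sketch passes custom settings to a single base learner through '...' (the CBA parameter names follow Example 3; a list named "QCBA" would analogously be forwarded to the postprocessing step):

  # The argument name must match an entry of 'algs'; the reserved name
  # "QCBA" passes settings to the QCBA postprocessing step itself.
  stats <- benchmarkQCBA(trainFold, testFold, classAtt,
                         algs = c("CBA"),
                         CBA = list("support" = 0.05, "confidence" = 0.5))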

Value

Outputs a dataframe with evaluation metrics and, if 'return_models==TRUE', also the induced baseline and QCBA models (see also Example 3). Metrics included in the statistics dataframe:

**accuracy**: percentage of correct predictions in the test set

**rulecount**: number of rules in the rule list. Note that for QCBA the count includes the default rule (rule with an empty antecedent), while for base learners this rule may not be included (depending on the base learner)

**modelsize**: total number of conditions in the antecedents of all rules in the model

**buildtime**: time required to learn the model. In the case of QCBA, this excludes the time needed to induce the base learner
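
As a hedged sketch of selecting a model from the benchmark output based on a combined preference for accuracy and model size: the split into baseline columns (1:5) and QCBA columns (6:10) follows Example 1, but the column and row names used below are assumptions, so check colnames(stats) and rownames(stats) first.

  # Illustrative only: rank the QCBA-postprocessed models by accuracy,
  # penalizing larger models. The names "accuracy" and "modelsize" and the
  # use of row names as algorithm labels are assumptions about the layout.
  qcba_stats <- stats[, 6:10]
  score <- qcba_stats[, "accuracy"] - 0.01 * qcba_stats[, "modelsize"]
  best_alg <- rownames(stats)[which.max(score)]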

See Also

[qcba()] which this function wraps.

Examples

# EXAMPLE 1: pass train and test folds, induce multiple base rule learners,
# postprocess each with QCBA and return benchmarking results.
## Not run: 
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
# Define input dataset and target variable 
df_all <- datasets::iris
classAtt <- "Species"

# Create train/test partition using built-in R functions
tot_rows<-nrow(df_all)  
train_proportion<-2/3
df_all <- df_all[sample(tot_rows),]
trainFold <- df_all[1:(train_proportion*tot_rows),]
testFold <- df_all[(1+train_proportion*tot_rows):tot_rows,]
# learn with default metaparameter values
stats<-benchmarkQCBA(trainFold,testFold,classAtt)
print(stats)
# print relative change of QCBA results over baseline algorithms 
print(stats[,6:10]/stats[,1:5]-1)
}
## End(Not run)
# EXAMPLE 2: As Example 1, but the data are discretized externally
# Discretize numerical predictors using built-in discretization
# This performs supervised, entropy-based discretization (Fayyad and Irani, 1993)
# of all numerical predictor variables with 3 or more distinct numerical values
# This example could run for more than 5 seconds
## Not run: 
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
  discrModel <- discrNumeric(trainFold, classAtt)
  train_disc <- as.data.frame(lapply(discrModel$Disc.data, as.factor))
  test_disc <- applyCuts(testFold, discrModel$cutp, infinite_bounds=TRUE, labels=TRUE)
  stats<-benchmarkQCBA(trainFold,testFold,classAtt,train_disc,test_disc,discrModel$cutp)
  print(stats)
}
## End(Not run)
# EXAMPLE 3: pass custom metaparameters to selected base rule learner,
# then postprocess with QCBA, evaluate, and return both models
# This example could run for more than 5 seconds
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
# use only CBA as a base learner, return rule lists.
## Not run: 
  output<-benchmarkQCBA(trainFold,testFold,classAtt,train_disc,test_disc,discrModel$cutp, 
                     CBA=list("support"=0.05,"confidence"=0.5),algs = c("CBA"),
                     return_models=TRUE)
  message("Evaluation statistics")
  print(output$stats)
  message("CBA model")
  inspect(output$CBA[[1]])
  message("QCBA model")
  print(output$CBA_QCBA[[1]])

## End(Not run)
}
