benchmarkQCBA: Learn and evaluate QCBA postprocessing on multiple rule learners

View source: R/rMARC.R

Learn and evaluate QCBA postprocessing on multiple rule learners. This can be, for example, used to automatically select the best model for a given use case based on a combined preference for accuracy and model size.

Description

Learn multiple rule models using base rule induction algorithms from arulesCBA and apply QCBA to postprocess them.

Usage

benchmarkQCBA(
  train,
  test,
  classAtt,
  train_disc = NULL,
  test_disc = NULL,
  cutPoints = NULL,
  algs = c("CBA", "CMAR", "CPAR", "PRM", "FOIL2"),
  iterations = 2,
  rounding_places = 3,
  return_models = FALSE,
  debug_prints = FALSE,
  ...
)

Arguments

train

data frame with training data

test

data frame with testing data before postprocessing

classAtt

the name of the class attribute

train_disc

prediscretized training data

test_disc

prediscretized test data

cutPoints

specification of cut points applied to the data (ignored if train_disc is null)

algs

vector with names of baseline rule learning algorithms. Names must correspond to function names from the arulesCBA library

iterations

number of times each base learner is executed; used to obtain a more precise estimate of build time

rounding_places

statistics in the resulting dataframe will be rounded to the specified number of decimal places

return_models

boolean indicating whether the learnt rule lists (baseline and postprocessed) should also be included in the output

debug_prints

boolean indicating whether to print debug information such as rule lists

...

Parameters for the base learners: the name of the argument identifies the base learner (one of the 'algs' values) and its value is a list of parameters to pass to that learner. To specify parameters for QCBA, use the argument name "QCBA". See also Example 3.
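
For instance, the following sketch passes custom settings to a single base learner through '...' (the CBA parameter names follow Example 3; a list named "QCBA" would analogously be forwarded to the postprocessing step):

  # The argument name must match an entry of 'algs'; the reserved name
  # "QCBA" passes settings to the QCBA postprocessing step itself.
  stats <- benchmarkQCBA(trainFold, testFold, classAtt,
                         algs = c("CBA"),
                         CBA = list("support" = 0.05, "confidence" = 0.5))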

Value

Outputs a dataframe with evaluation metrics and, if 'return_models==TRUE', also the induced baseline and QCBA models (see also Example 3). Metrics included in the statistics dataframe:

**accuracy**: percentage of correct predictions in the test set

**rulecount**: number of rules in the rule list. Note that for QCBA the count includes the default rule (rule with an empty antecedent), while for base learners this rule may not be included (depending on the base learner)

**modelsize**: total number of conditions in the antecedents of all rules in the model

**buildtime**: time required to learn the model. In the case of QCBA, this excludes the time needed to induce the base learner
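
As a hedged sketch of selecting a model from the benchmark output based on a combined preference for accuracy and model size: the split into baseline columns (1:5) and QCBA columns (6:10) follows Example 1, but the column and row names used below are assumptions, so check colnames(stats) and rownames(stats) first.

  # Illustrative only: rank the QCBA-postprocessed models by accuracy,
  # penalizing larger models. The names "accuracy" and "modelsize" and the
  # use of row names as algorithm labels are assumptions about the layout.
  qcba_stats <- stats[, 6:10]
  score <- qcba_stats[, "accuracy"] - 0.01 * qcba_stats[, "modelsize"]
  best_alg <- rownames(stats)[which.max(score)]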

See Also

[qcba()] which this function wraps.

Examples

# EXAMPLE 1: pass train and test folds, induce multiple base rule learners,
# postprocess each with QCBA and return benchmarking results.
## Not run: 
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
# Define input dataset and target variable 
df_all <- datasets::iris
classAtt <- "Species"

# Create train/test partition using built-in R functions
tot_rows<-nrow(df_all)  
train_proportion<-2/3
df_all <- df_all[sample(tot_rows),]
trainFold <- df_all[1:(train_proportion*tot_rows),]
testFold <- df_all[(1+train_proportion*tot_rows):tot_rows,]
# learn with default metaparameter values
stats<-benchmarkQCBA(trainFold,testFold,classAtt)
print(stats)
# print relative change of QCBA results over baseline algorithms 
print(stats[,6:10]/stats[,1:5]-1)
}
## End(Not run)
# EXAMPLE 2: As Example 1, but the data are discretized externally
# Discretize numerical predictors using built-in discretization
# This performs supervised, entropy-based discretization (Fayyad and Irani, 1993)
# of all numerical predictor variables with 3 or more distinct numerical values
# This example could run for more than 5 seconds
## Not run: 
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
  discrModel <- discrNumeric(trainFold, classAtt)
  train_disc <- as.data.frame(lapply(discrModel$Disc.data, as.factor))
  test_disc <- applyCuts(testFold, discrModel$cutp, infinite_bounds=TRUE, labels=TRUE)
  stats<-benchmarkQCBA(trainFold,testFold,classAtt,train_disc,test_disc,discrModel$cutp)
  print(stats)
}
## End(Not run)
# EXAMPLE 3: pass custom metaparameters to selected base rule learner,
# then postprocess with QCBA, evaluate, and return both models
# This example could run for more than 5 seconds
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
# use only CBA as a base learner, return rule lists.
## Not run: 
  output<-benchmarkQCBA(trainFold,testFold,classAtt,train_disc,test_disc,discrModel$cutp, 
                     CBA=list("support"=0.05,"confidence"=0.5),algs = c("CBA"),
                     return_models=TRUE)
  message("Evaluation statistics")
  print(output$stats)
  message("CBA model")
  inspect(output$CBA[[1]])
  message("QCBA model")
  print(output$CBA_QCBA[[1]])

## End(Not run)
}
