'DecisionTreeTuningAnalysis' is an R project that generates automated graphical analyses for our paper 'Better Trees: An empirical study on hyperparameter tuning of classification decision trees' [01]. The analysis coded here handles data generated by our hyperparameter tuning project (HpTuning), but may be easily extended. Its main features cover the hyperparameter profile of the decision tree induction algorithms.
Installation is done via git clone. Run the following command in your terminal session:

```shell
git clone https://github.com/rgmantovani/DecisionTreeTuningAnalysis
```
The classification algorithms analyzed must follow the 'mlr' R package implementation [02]. A complete list of the available learners may be found here. The code provides results for two decision tree induction algorithms: J48 (classif.J48) and CART (classif.rpart).
Hyperparameter tuning results should be placed in the data/hptuning_full_space/<algorithm.name>/results sub-directory. We did not upload the raw results since they contain more than 50 GB of data (but you can download them from here). Thus, we developed some scripts to extract the useful information from the executed jobs. These scripts are in the scripts folder. The automated analysis will only work if these scripts have been run beforehand; the code also checks this and returns instructions to the user on how to proceed. There are 4 auxiliary scripts, and all extraction scripts require the algorithm's name as a parameter (<algorithm.name>).
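As an illustration of the command-line interface (a hedged sketch, not the project's actual parser), an extraction script could read the --algo flag with base R alone:

```r
# Hypothetical sketch: reading the --algo=<algorithm.name> flag with base R.
# The real scripts may use a dedicated option-parsing package instead.
parseAlgoArg = function(args = commandArgs(trailingOnly = TRUE)) {
  hit = grep("^--algo=", args, value = TRUE)
  if (length(hit) == 0) {
    stop("Usage: Rscript <script>.R --algo=<algorithm.name>")
  }
  # strip the '--algo=' prefix, keeping the mlr learner id (e.g. classif.J48)
  sub("^--algo=", "", hit[1])
}
```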
There is no required order in which to run these scripts, but all of them must be executed. The files they generate will later be read, aggregated into data.frame objects, and used by the automated analysis code.
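The aggregation step can be pictured as follows. This is a hedged sketch under the assumption that each extraction script writes one .RData file per dataset; the file layout and object names are illustrative, not the project's actual ones:

```r
# Hypothetical sketch: read every per-dataset .RData file in a directory
# and row-bind the loaded objects into a single data.frame.
aggregateResults = function(dir.path) {
  files = list.files(path = dir.path, pattern = "\\.RData$", full.names = TRUE)
  dfs = lapply(files, function(f) {
    obj.name = load(f)   # load() returns the name(s) of the restored objects
    get(obj.name[1])
  })
  do.call(rbind, dfs)    # one aggregated data.frame across all datasets
}
```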
```shell
cd scripts
Rscript 01_extractRepResults.R --algo=<algorithm.name> &
# examples:
# Rscript 01_extractRepResults.R --algo="classif.J48" &
# Rscript 01_extractRepResults.R --algo="classif.rpart" &
```
```shell
cd scripts
Rscript 02_extractOptPaths.R --algo=<algorithm.name> &
# examples:
# Rscript 02_extractOptPaths.R --algo="classif.J48" &
# Rscript 02_extractOptPaths.R --algo="classif.rpart" &
```
```shell
cd scripts
Rscript 03_extractModelStats.R --algo=<algorithm.name> &
# examples:
# Rscript 03_extractModelStats.R --algo="classif.J48" &
# Rscript 03_extractModelStats.R --algo="classif.rpart" &
```
FAnova marginal predictions are obtained from an external project [03]. Our script generates input files in the pattern required by the FAnova Python script. To run it:
```shell
cd scripts
Rscript 04_createFanovaInputs.R --algo=<algorithm.name> &
# examples:
# Rscript 04_createFanovaInputs.R --algo="classif.J48" &
# Rscript 04_createFanovaInputs.R --algo="classif.rpart" &
```
The output will be placed in a folder named data/hptuning_full_space/<algorithm.name>/fanova_input, with one file per dataset. Provide these files to the external project, which will generate one corresponding file per dataset. These new files should be placed in the data/hptuning_full_space/<algorithm.name>/fanova_output sub-directory.
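Since each input file must have a matching output file, a small check before running the main analysis can catch missing FAnova results early. This helper is a hypothetical sketch (not part of the project); only the two directory names follow the layout described above:

```r
# Hypothetical helper: verify that every file in fanova_input has a
# same-named counterpart in fanova_output. Returns TRUE when complete.
checkFanovaFiles = function(algo.dir) {
  in.files  = list.files(file.path(algo.dir, "fanova_input"))
  out.files = list.files(file.path(algo.dir, "fanova_output"))
  missing = setdiff(in.files, out.files)
  if (length(missing) > 0) {
    warning("Missing FAnova outputs for: ", paste(missing, collapse = ", "))
  }
  length(missing) == 0
}
```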
To run the project, call it with the following command:

```shell
Rscript 01_mainAnalysis.R --algo=<algorithm.name> &
# examples:
# Rscript 01_mainAnalysis.R --algo="classif.rpart" &
# Rscript 01_mainAnalysis.R --algo="classif.J48" &
```
Meta-level results are independent and can be generated by:
```shell
Rscript 02_metaAnalysis.R &
```
Rafael Gomes Mantovani (rgmantovani@gmail.com / rafaelmantovani@utfpr.edu.br), Federal University of Technology - Paraná (UTFPR), Apucarana - PR, Brazil.
[01] Rafael Gomes Mantovani, Tomas Horvath, André L. D. Rossi, Ricardo Cerri, Sylvio Barbon Junior, Joaquin Vanschoren, André C. P. L. F. Carvalho. Better Trees: An empirical study on hyperparameter tuning of classification decision trees. Data Min Knowl Disc (2024). https://doi.org/10.1007/s10618-024-01002-5.
[02] B. Bischl, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Erich Studerus, Giuseppe Casalicchio, Zachary Jones. mlr: Machine Learning in R. Journal of Machine Learning Research, v.17, n.170, 2016, pp. 1-5.
[03] F. Hutter, H. Hoos, K. Leyton-Brown. An Efficient Approach for Assessing Hyperparameter Importance. In: Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 2014, pp. 754-762.