knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

Fast-and-frugal classification in R (ffcr)

The ffcr package constructs two transparent classification models: fast-and-frugal trees and tallying models. The book Classification in the Wild: The Science and Art of Transparent Decision Making. (Katiskopoulos et al., 2020) describes these models, their applications and the algorithms to construct them in detail.

A fast-and-frugal tree is a decision tree with a simple structure: one branch of each node exits tree, the other continues to the next node until the final node is reached. A tallying model gives pieces of evidence the same weight. This package implements several methods for training these two models, ranging from simple heuristics to computationally more complex cross-entropy optimization (Rubinstein, 1999).

Installation

The package can be installed with the following command:

# install.packages("devtools")
devtools::install_github("marcusbuckmann/ffcr")

Alternatively, Windows users can download the binary file ffcr_1.0.zip and install it from the hard drive.

Training fast-and-frugal trees

We start by training a fast-and-frugal tree on a medical data set, predicting whether patients have liver disease (Ramana, Babu, and Venkateswarlu, 2011; Dua and Graff, 2017). The data set is included in the package.

library(ffcr)
model <- fftree(diagnosis~., data = liver, use_features_once = FALSE, method = "greedy", max_depth = 4)

The structure of the tree and its performance is shown by printing the model.

print(model)

To visualize the tree we use

plot(model)
knitr::include_graphics("man/figures/tree.png")

To make predictions according to the fast-and-frugal tree, we can use the predict function.

pred <- predict(model, newdata = liver[1:10,])
pred

Training tallying models

To train tallying models, we use the tally function. To make predictions we use the same command as for the fast-and-frugal trees.

model <- tally(diagnosis~., data = liver, max_size = 4)
print(model)
pred <- predict(model, newdata = liver[1:10,])

Please consult the vignette and the documentation of the package for more details on how to use it. The book Classification in the Wild provides background information on fast-and-frugal trees and tallying models.

References

Dua, Dheeru, and Casey Graff. 2017. “UCI Machine Learning Repository.” University of California, Irvine, School of Information; Computer Sciences. http://archive.ics.uci.edu/ml.

Katikopoulos, Konstantinos V., Özgür Şimşek, Marcus Buckmann, and Gerd Gigerenzer. 2020. Classification in the Wild: The Science and Art of Transparent Decision Making. MIT Press.

Ramana, Bendi Venkata, M Surendra Prasad Babu, and N. B. Venkateswarlu. 2011. “A Critical Study of Selected Classification Algorithms for Liver Disease Diagnosis.” International Journal of Database Management Systems 3 (2): 101–114.

Rubinstein, Reuven. 1999. “The Cross-Entropy Method for Combinatorial and Continuous Optimization.” Methodology and Computing in Applied Probability 1 (2): 127–190.



marcusbuckmann/ffcr documentation built on Jan. 4, 2024, 3:45 p.m.