```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
This document introduces the functionality of the ffcr package. The package allows the construction of two families of transparent classification models: fast-and-frugal trees and tallying models. A fast-and-frugal tree is a decision tree with a simple structure: one branch of each node exits the tree, while the other continues to the next node, until the final node is reached. A tallying model gives every piece of evidence the same weight. The package contains two main functions: `fftree` to train fast-and-frugal trees and `tally` to train tallying models.
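To build intuition before turning to the package, the two model families can be sketched as plain R functions. The cue names and cutoffs below are purely hypothetical and are not output of the ffcr package:

```r
# Fast-and-frugal tree (sketch): each node either exits with a decision or
# passes the case on; the final node exits on both branches.
fft_classify <- function(x) {
  if (x$albumin < 3)  return("Liver disease")   # first node: one branch exits
  if (x$age > 60)     return("Liver disease")   # second node
  if (x$proteins < 6) return("Liver disease")   # final node: both branches exit
  "No disease"
}

# Tallying (sketch): every cue votes with the same weight; the vote count
# is compared against a threshold.
tally_classify <- function(x, threshold = 2) {
  votes <- (x$albumin < 3) + (x$age > 60) + (x$proteins < 6)
  if (votes >= threshold) "Liver disease" else "No disease"
}

patient <- list(albumin = 2.5, age = 65, proteins = 7)
fft_classify(patient)    # exits already at the first node
tally_classify(patient)  # 2 of 3 cues vote for disease
```

The tree decides as soon as one node exits, whereas tallying always consults all cues; both are transparent in the sense that the decision rule can be written down in a few lines.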
To illustrate the functionality of the package, we use the Liver data set [@ramana2011], which we obtained from the UCI machine learning repository [@dua2017]. It contains 579 patients, of which 414 have a liver condition and the other 165 do not. We predict which patients have a liver condition using medical measurements as well as the patients' age and gender.
We start by loading the package and the data.
```r
library(ffcr)
data(liver)
```
The `fftree` function encompasses three different methods to train fast-and-frugal trees. These are named basic, greedy, and best-fit^[We use the cross-entropy method [@rubinstein1999]. It is not guaranteed to find the best possible tree, but it produces very accurate trees on average.] and are described in the book.
We train our first fast-and-frugal tree on the Liver data set. If the first column of the data set is the class label, we can simply pass the data set as the first argument.
```r
model <- fftree(liver, use_features_once = FALSE, method = "greedy", max_depth = 6)
```
Alternatively, we can call the function using the formula syntax. Here we train the fast-and-frugal tree using only a few selected features.
```r
fftree(diagnosis ~ sex + age + albumin + proteins + aspartate, data = liver)
```
The model object shows the structure of the tree and its performance on the data set.
```r
print(model)
```
To visualize the tree, we use:

```r
plot(model)
```
How does the fast-and-frugal tree perform in cross-validation? By default the model is fitted to the complete data set, but if we set `cv = TRUE`, 10-fold cross-validation is used to estimate the predictive performance of the tree; the model saved in the object is still fitted on the complete training set.

In both fitting and prediction, the sensitivity is very high, while the specificity is low. Because the majority of the patients (71%) have a liver condition, predicting liver disease for most patients already yields a highly accurate tree. To avoid this, we can weight the observations such that both classes receive the same total share: let p be the proportion of patients with liver disease; we weight the patients with liver disease by 1 - p and the patients without the disease by p.
Note how sensitivity and specificity are more similar now:
```r
p <- sum(liver$diagnosis == "Liver disease") / nrow(liver)
model <- fftree(liver, weights = c(1 - p, p), cv = TRUE)
model
```
To make predictions with a fast-and-frugal tree, we can use the `predict` function. It returns either the class labels (`type = "response"`), the predicted probability of belonging to one of the classes (`type = "probability"`), or the performance across the observations (`type = "metric"`). Note that for the latter, the class labels need to be included in the data passed to `predict`.
```r
model <- fftree(diagnosis ~ ., data = liver[1:300, ], weights = c(1 - p, p))
predict(model, newdata = liver[301:310, ], type = "response")
predict(model, newdata = liver[301:310, ], type = "probability")
predict(model, newdata = liver[301:nrow(liver), ], type = "metric")
```
The package implements two different methods to train tallying models, which are also explained in the book. These are named basic and best-fit.^[We again use the cross-entropy method.]
```r
p <- sum(liver$diagnosis == "Liver disease") / nrow(liver)
model <- tally(diagnosis ~ ., data = liver[1:300, ], weights = c(1 - p, p), max_size = 6)
predict(model, newdata = liver[301:nrow(liver), ], type = "metric")
```