```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
This document introduces the functionality of the ffcr package. The package allows the construction of two families of transparent classification models: fast-and-frugal trees and tallying models. A fast-and-frugal tree is a decision tree with a simple structure: one branch of each node exits the tree, while the other continues to the next node, until the final node is reached. A tallying model gives every piece of evidence the same weight. The package contains two main functions: `fftree` to train fast-and-frugal trees and `tally` to train tallying models.
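To build intuition before turning to the package, the two model families can be sketched as plain R functions. The cue names and cutoffs below are purely hypothetical and are not output of the ffcr package:

```r
# Fast-and-frugal tree (sketch): each node either exits with a decision or
# passes the case on; the final node exits on both branches.
fft_classify <- function(x) {
  if (x$albumin < 3)  return("Liver disease")   # first node: one branch exits
  if (x$age > 60)     return("Liver disease")   # second node
  if (x$proteins < 6) return("Liver disease")   # final node: both branches exit
  "No disease"
}

# Tallying (sketch): every cue votes with the same weight; the vote count
# is compared against a threshold.
tally_classify <- function(x, threshold = 2) {
  votes <- (x$albumin < 3) + (x$age > 60) + (x$proteins < 6)
  if (votes >= threshold) "Liver disease" else "No disease"
}

patient <- list(albumin = 2.5, age = 65, proteins = 7)
fft_classify(patient)    # exits already at the first node
tally_classify(patient)  # 2 of 3 cues vote for disease
```

The tree decides as soon as one node exits, whereas tallying always consults all cues; both are transparent in the sense that the decision rule can be written down in a few lines.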
To illustrate the functionality of the package, we use the Liver data set [@ramana2011], which we obtained from the UCI machine learning repository [@dua2017]. It contains 579 patients, of which 414 have a liver condition and the other 165 do not. We predict which patients have a liver condition using medical measurements as well as the patients' age and gender.
We start by loading the package and the data.
```r
library(ffcr)
data(liver)
```
The `fftree` function encompasses three different methods to train fast-and-frugal trees. These are named basic, greedy, and best-fit^[We use the cross-entropy method [@rubinstein1999]. It is not guaranteed to find the best possible tree, but it produces very accurate trees on average.] and are described in the book.
We train our first fast-and-frugal tree on the Liver data set. If the first column of the data set is the class label, we can simply pass the data set as the first argument.
```r
model <- fftree(liver, use_features_once = FALSE, method = "greedy", max_depth = 6)
```
Alternatively, we can call the function using the formula syntax. Here we train the fast-and-frugal tree using only a few selected features.
```r
fftree(diagnosis ~ sex + age + albumin + proteins + aspartate, data = liver)
```
The model object shows the structure of the tree and its performance on the data set.
```r
print(model)
```
To visualize the tree, we use:

```r
plot(model)
```
How does the fast-and-frugal tree perform in cross-validation? By default the model is fitted to the complete data set, but if we set `cv = TRUE`, 10-fold cross-validation is used to estimate the predictive performance of the tree; the model saved in the object is still fitted on the complete training set.

In both fitting and prediction, the sensitivity is very high, while the specificity is low. Because the majority of the patients (71%) have a liver condition, predicting liver disease for most patients already yields a highly accurate tree. To avoid this, we can weight the observations such that both classes receive the same total share: let p be the proportion of patients with liver disease; we weight the patients with liver disease by 1 - p and the patients without the disease by p.
Note how sensitivity and specificity are more similar now:
```r
p <- sum(liver$diagnosis == "Liver disease") / nrow(liver)
model <- fftree(liver, weights = c(1 - p, p), cv = TRUE)
model
```
To make predictions with a fast-and-frugal tree, we can use the `predict` function. It returns either the class labels (`type = "response"`), the predicted probability of belonging to one of the classes (`type = "probability"`), or the performance across the observations (`type = "metric"`). Note that for the latter, the class labels need to be included in the data passed to `predict`.
```r
model <- fftree(diagnosis ~ ., data = liver[1:300, ], weights = c(1 - p, p))
predict(model, newdata = liver[301:310, ], type = "response")
predict(model, newdata = liver[301:310, ], type = "probability")
predict(model, newdata = liver[301:nrow(liver), ], type = "metric")
```
The package implements two different methods to train tallying models, which are also explained in the book. These are named basic and best-fit.^[We again use the cross-entropy method.]
```r
p <- sum(liver$diagnosis == "Liver disease") / nrow(liver)
model <- tally(diagnosis ~ ., data = liver[1:300, ], weights = c(1 - p, p), max_size = 6)
predict(model, newdata = liver[301:nrow(liver), ], type = "metric")
```