Tutorial: Creating FFTs for heart disease


This tutorial on using the FFTrees package follows the examples presented in Phillips, Neth, Woike, and Gaissmaier (2017), which is freely available in HTML and PDF formats.

In the following, we explain how to use FFTrees to create, evaluate and visualize FFTs in four simple steps.

Step 1: Install and load the FFTrees package

We can install FFTrees from CRAN using install.packages(). (We only need to do this once.)

# Install the package from CRAN:
install.packages("FFTrees")

To use the package, we first need to load it into our current R session. We load the package using library():

# Load the package:
library(FFTrees)

The FFTrees package contains several vignettes (like this one) that guide you through the package's functionality. To open the main guide, run FFTrees.guide():

# Open the main package guide: 
FFTrees.guide()

Step 2: Create FFTs from training data (and test on testing data)

In this example, we will create FFTs from a heart disease data set. The training data are in an object called heart.train, and the testing data are in an object called heart.test. For these data, we will predict diagnosis, a binary criterion that indicates whether each patient has or does not have heart disease (i.e., is at high-risk or low-risk).

To create an FFTrees object, we use the function FFTrees() with two main arguments:

  1. formula expects a formula indicating a binary criterion variable as a function of one or more predictor variable(s) to be considered for the tree. The shorthand formula = diagnosis ~ . means to include all predictor variables.

  2. data specifies the training data used to construct the FFTs (which must include the criterion variable).

Here is how we can construct our first FFTs:

# Create an FFTrees object:
heart.fft <- FFTrees(formula = diagnosis ~ .,           # Criterion and (all) predictors
                     data = heart.train,                # Training data
                     data.test = heart.test,            # Testing data
                     main = "Heart Disease",            # General label
                     decision.labels = c("Low-Risk", "High-Risk")  # Decision labels (False/True)
                     )

Evaluating this expression runs code that examines the data, optimizes thresholds for each cue based on our current goals, and creates and evaluates a set of FFTs (their number is stored in heart.fft$trees$n). The resulting FFTrees object, which contains the tree definitions, their decisions, and their performance statistics, is assigned to the heart.fft object.
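The formula does not have to include all predictors. The following is a minimal sketch, assuming we only want to consider three cues; the choice of age, thal, and cp and the object name heart.small.fft are purely illustrative:

```r
library(FFTrees)

# Sketch: build FFTs from three predictors only (an illustrative choice):
heart.small.fft <- FFTrees(formula = diagnosis ~ age + thal + cp,  # Criterion and selected predictors
                           data = heart.train,                     # Training data
                           data.test = heart.test)                 # Testing data

# Number of FFTs that were created:
heart.small.fft$trees$n
```

Restricting the formula in this way can be useful when some cues are costly or unavailable at decision time.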

Other arguments

Additional arguments apply when using the "ifan" or "dfan" algorithms for creating new FFTs. See the vignette on Creating FFTs with FFTrees() for details.

Step 3: Inspect and summarize FFTs

Now we can inspect and summarize the generated decision trees. We will start by printing the FFTrees object to return basic information to the console:

# Print an FFTrees object:
heart.fft

The output tells us several pieces of information about the generated trees, including their definitions and performance statistics.

All statistics to evaluate each tree can be derived from a 2 x 2 confusion table:

[Figure: 2 x 2 confusion table (confusiontable.jpg)]

For definitions of all accuracy statistics, see the accuracy statistics vignette.
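To illustrate how such statistics follow from the four cells of a confusion table, here is a small base R sketch. The counts are hypothetical (not taken from heart.fft), and the abbreviations hi, fa, mi, and cr (hits, false alarms, misses, correct rejections) are used here as labels for the cells:

```r
# Hypothetical cell counts of a 2 x 2 confusion table (illustrative values only):
hi <- 40  # hits: decided "High-Risk", criterion is TRUE
fa <- 10  # false alarms: decided "High-Risk", criterion is FALSE
mi <-  5  # misses: decided "Low-Risk", criterion is TRUE
cr <- 45  # correct rejections: decided "Low-Risk", criterion is FALSE

n <- hi + fa + mi + cr      # total number of cases

sens <- hi / (hi + mi)      # sensitivity: proportion of TRUE cases detected
spec <- cr / (cr + fa)      # specificity: proportion of FALSE cases rejected
acc  <- (hi + cr) / n       # overall accuracy
bacc <- (sens + spec) / 2   # balanced accuracy (mean of sens and spec)

round(c(sens = sens, spec = spec, acc = acc, bacc = bacc), 3)
```

Balanced accuracy weights sensitivity and specificity equally, which makes it less sensitive than overall accuracy to unequal base rates of the two criterion classes.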

Step 4: Visualize the final FFT

We use plot(x) to visualize an FFT (from an FFTrees object x). Using data = "train" evaluates an FFT for training data (fitting), whereas data = "test" predicts the performance of an FFT for a different dataset:

# Plot predictions of the best FFT when applied to test data:
plot(heart.fft,      # An FFTrees object
     data = "test")  # data to use (i.e., either "train" or "test")?

Other arguments

The plot() function for FFTrees objects accepts further arguments. For example, the what argument selects which elements are plotted:

# Plot only the tree, without accuracy statistics:
plot(heart.fft, what = "tree")
# plot(heart.fft, stats = FALSE)  #  The 'stats' argument has been deprecated.
# Plot cue accuracies (for training data) in ROC space:
plot(heart.fft, what = "cues")

See the Plotting FFTrees vignette for details on plotting FFTs.

Advanced functions

Creating sets of FFTs and evaluating them on data by printing and plotting individual FFTs provides the core functionality of FFTrees. However, the package also provides more advanced functions for accessing, defining, using and evaluating FFTs.

Accessing outputs

An FFTrees object contains many different outputs. Basic performance information on the current data and set of FFTs is available via the summary() function. To see and access parts of an FFTrees object, use str() or names():

# Show the names of all outputs in heart.fft:
names(heart.fft)
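As a brief sketch of how these functions fit together, the following code recreates the heart.fft object from Step 2 and then inspects it. Limiting str() to the top level via max.level = 1 is just one convenient choice for keeping the output short:

```r
library(FFTrees)

# Recreate the FFTrees object from Step 2:
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     main = "Heart Disease",
                     decision.labels = c("Low-Risk", "High-Risk"))

# Basic performance information on the current data and set of FFTs:
summary(heart.fft)

# Top-level structure of the object (without nested details):
str(heart.fft, max.level = 1)

# Access an individual element, here the number of FFTs:
heart.fft$trees$n
```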

Key elements of an FFTrees object are explained in the vignette on Creating FFTs with FFTrees().

Predicting for new data

To predict classification outcomes for new data, use the standard predict() function. For example, here's how to predict the classifications for data in the heartdisease object (which actually is just a combination of heart.train and heart.test):

# Predict classifications for a new dataset:
predict(heart.fft, 
        newdata = heartdisease)
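The predicted classifications can be compared with the true criterion values. The following sketch assumes that predict() returns one classification per row of newdata (its default behavior, as far as we can tell) and cross-tabulates these with the diagnosis column; the object name pred is illustrative:

```r
library(FFTrees)

# Recreate the FFTrees object from Step 2:
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test)

# Predicted classifications for all cases in heartdisease:
pred <- predict(heart.fft, newdata = heartdisease)

# Cross-tabulate predictions with the true criterion values:
table(predicted = pred, actual = heartdisease$diagnosis)
```

Because heartdisease contains the training cases, the resulting table mixes fitting and prediction and should not be read as a pure estimate of out-of-sample performance.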

Directly defining FFTs

To define a specific FFT and apply it to data, we can define a tree by providing its verbal description to the my.tree argument. Similarly, we can define sets of FFT definitions (as a data frame) and evaluate them on data by using the tree.definitions argument of FFTrees(). As we often start from an existing set of FFTs, FFTrees provides a set of functions for extracting, converting, and modifying tree definitions.

See the vignette on Manually specifying FFTs for defining FFTs from descriptions and modifying tree definitions.
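As a rough sketch of the first approach, a verbal tree description can be passed to my.tree as a character string. The specific cues, thresholds, and wording below are illustrative assumptions rather than a recommended tree; the vignette on Manually specifying FFTs describes the accepted format:

```r
library(FFTrees)

# Sketch: define one FFT from a verbal description (illustrative cues and thresholds):
my.heart.fft <- FFTrees(formula = diagnosis ~ .,
                        data = heart.train,
                        data.test = heart.test,
                        main = "Custom Heart FFT",
                        my.tree = "If chol > 350, predict True.
                                   If cp != {a}, predict False.
                                   If age <= 35, predict False.
                                   Otherwise, predict True")

# Evaluate the custom tree on the test data:
plot(my.heart.fft, data = "test")
```

Each sentence of the description corresponds to one node of the tree, and the final "Otherwise" clause provides the second exit of the last node.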

Vignettes

Here is a complete list of the vignettes available in the FFTrees package:

|   | Vignette | Description |
|--:|:------------------------------|:-------------------------------------------------|
|   | Main guide: FFTrees overview | An overview of the FFTrees package |
| 1 | Tutorial: FFTs for heart disease | An example of using FFTrees() to model heart disease diagnosis |
| 2 | Accuracy statistics | Definitions of accuracy statistics used throughout the package |
| 3 | Creating FFTs with FFTrees() | Details on the main FFTrees() function |
| 4 | Manually specifying FFTs | How to directly create FFTs without using the built-in algorithms |
| 5 | Visualizing FFTs | Plotting FFTrees objects, from full trees to icon arrays |
| 6 | Examples of FFTs | Examples of FFTs from different datasets contained in the package |

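References

Phillips, N. D., Neth, H., Woike, J. K., & Gaissmaier, W. (2017). FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. Judgment and Decision Making, 12(4), 344–368.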




FFTrees documentation built on June 7, 2023, 5:56 p.m.