The following example follows the tutorial presented in Phillips, Neth, Woike, & Gaissmaier (2017), which is freely available in HTML and PDF formats.
This tutorial explains how to use the FFTrees package to create, evaluate and visualize FFTs in four simple steps.
We can install FFTrees from CRAN using install.packages(). (We only need to do this once.)
# Install the package from CRAN:
install.packages("FFTrees")
To use the package, we first need to load it into our current R session. We load the package using library():
# Load the package:
library(FFTrees)
The FFTrees package contains several vignettes (like this one) that guide users through the package’s functionality. To open the main guide, run FFTrees.guide():
# Open the main package guide:
FFTrees.guide()
In this example, we will create FFTs from a heart disease data set. The training data are in an object called heart.train, and the testing data are in an object called heart.test. For these data, we will predict diagnosis, a binary criterion that indicates whether each patient has or does not have heart disease (i.e., is at high risk or low risk).
To create an FFTrees object, we use the function FFTrees() with two main arguments:
formula: expects a formula indicating a binary criterion variable as a function of one or more predictor variables to be considered for the tree. The shorthand formula = diagnosis ~ . means to include all predictor variables.
data: specifies the training data used to construct the FFTs (which must include the criterion variable).
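To see which columns the . shorthand picks up, base R's formula machinery can expand it against a data frame. The sketch below uses a small hypothetical data frame (toy, with made-up values) instead of heart.train, and it assumes that FFTrees() interprets . the same way standard R modelling functions do:

```r
# Hypothetical toy data (made-up values, not the heart disease data)
toy <- data.frame(
  diagnosis = c(TRUE, FALSE, TRUE),
  age       = c(63, 45, 58),
  chol      = c(233, 180, 290),
  cp        = c("a", "np", "a")
)

f <- diagnosis ~ .

# The left-hand side of the formula is the criterion
criterion <- deparse(f[[2]])

# terms() expands "." to every column of `toy` except the criterion
predictors <- attr(terms(f, data = toy), "term.labels")

print(criterion)   # "diagnosis"
print(predictors)  # "age" "chol" "cp"
```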
Here is how we can construct our first FFTs:
# Create an FFTrees object:
heart.fft <- FFTrees(formula = diagnosis ~ .,           # Criterion and (all) predictors
                     data = heart.train,                # Training data
                     data.test = heart.test,            # Testing data
                     main = "Heart Disease",            # General label
                     decision.labels = c("Low-Risk", "High-Risk")  # Decision labels (False/True)
                     )
The resulting trees, decisions, and accuracy statistics are now stored in an FFTrees object called heart.fft.
FFTrees() has several additional arguments that control how trees are built and selected:
algorithm: There are two different algorithms available to build FFTs: "ifan" (Phillips et al., 2017) and "dfan" (Phillips et al., 2017). ("max" (Martignon, Katsikopoulos, & Woike, 2008) and "zigzag" (Martignon et al., 2008) are no longer supported.)
max.levels: Changes the maximum number of levels allowed in the tree.
The following arguments apply to the “ifan” and “dfan” algorithms only:
goal.chase: Changes which statistic is maximized during tree construction. Possible values include "acc", "bacc", "wacc", "dprime", and "cost". The default is "wacc" with a sensitivity weight of 0.50 (which renders it identical to "bacc").
goal: Changes which statistic is maximized when selecting trees after construction. Possible values include "acc", "bacc", "wacc", "dprime", and "cost".
my.tree: We can define a tree verbally as a sentence using the my.tree argument. See the Defining an FFT verbally vignette for details.
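The statement that "wacc" with a sensitivity weight of 0.50 is identical to "bacc" follows directly from the definitions. Here is a minimal sketch, assuming the usual definition wacc = w * sens + (1 - w) * spec (see the accuracy statistics vignette); the helper functions wacc() and bacc() are hypothetical and not part of FFTrees, and the sensitivity and specificity values are made up:

```r
# Hypothetical helpers (not FFTrees functions) illustrating the definitions.
# Weighted accuracy: sensitivity weighted by sens.w, specificity by (1 - sens.w)
wacc <- function(sens, spec, sens.w = 0.5) {
  sens.w * sens + (1 - sens.w) * spec
}

# Balanced accuracy: the unweighted mean of sensitivity and specificity
bacc <- function(sens, spec) {
  (sens + spec) / 2
}

# With a weight of 0.5, both give the same value (0.75 here)
wacc(sens = 0.80, spec = 0.70)
bacc(sens = 0.80, spec = 0.70)

# A higher sensitivity weight favors trees with high sensitivity
wacc(sens = 0.80, spec = 0.70, sens.w = 0.8)  # 0.78
```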
Now we can inspect and summarize the generated decision trees. We will start by printing the FFTrees object to return basic information to the console:
# Print an FFTrees object:
heart.fft
## Heart Disease
## FFTrees
## - Trees: 7 fast-and-frugal trees predicting diagnosis
## - Outcome costs: [hi = 0, mi = 1, fa = 1, cr = 0]
##
## FFT #1: Definition
## [1] If thal = {rd,fd}, decide High-Risk.
## [2] If cp != {a}, decide Low-Risk.
## [3] If ca > 0, decide High-Risk, otherwise, decide Low-Risk.
##
## FFT #1: Training Accuracy
## Training data: N = 150, Pos (+) = 66 (44%)
##
## | | True + | True - | Totals:
## |----------|--------|--------|
## | Decide + | hi 54 | fa 18 | 72
## | Decide - | mi 12 | cr 66 | 78
## |----------|--------|--------|
## Totals: 66 84 N = 150
##
## acc = 80.0% ppv = 75.0% npv = 84.6%
## bacc = 80.2% sens = 81.8% spec = 78.6%
##
## FFT #1: Training Speed, Frugality, and Cost
## mcu = 1.74, pci = 0.87, E(cost) = 0.200
The output tells us several pieces of information:
The tree with the highest weighted accuracy (wacc), using a sensitivity weight of 0.5, is selected as the best tree.
Here, the best tree, FFT #1, uses three cues: thal, cp, and ca.
Several accuracy, speed, frugality, and cost statistics for this tree are reported (here, for the training data).
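Because an FFT is just a sequence of cue checks with an exit at every node, FFT #1 can be written out by hand. The following sketch is a hypothetical base R re-implementation of the three nodes printed above (apply_fft1() is our own helper, not an FFTrees function). It also records how many cues each case needs, which is the quantity summarized by mcu (mean cues used). The example cue values are made up for illustration:

```r
# Hypothetical re-implementation of FFT #1 (not an FFTrees function).
# Each node either exits with a decision or passes the case on.
apply_fft1 <- function(thal, cp, ca) {
  exit1 <- thal %in% c("rd", "fd")   # Node 1: thal = {rd,fd} -> High-Risk (TRUE)
  exit2 <- !exit1 & cp != "a"        # Node 2: cp != {a}      -> Low-Risk (FALSE)
  # Node 3 (final): ca > 0 -> High-Risk, otherwise Low-Risk
  decision  <- ifelse(exit1, TRUE, ifelse(exit2, FALSE, ca > 0))
  cues_used <- ifelse(exit1, 1L, ifelse(exit2, 2L, 3L))
  data.frame(decision = decision, cues_used = cues_used)
}

# Four made-up cases, one for each exit of the tree
example <- apply_fft1(thal = c("rd", "normal", "normal", "normal"),
                      cp   = c("a", "np", "a", "a"),
                      ca   = c(0, 2, 1, 0))
print(example)
mean(example$cues_used)  # 2.25 cues used on average for these four cases
```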
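All statistics used to evaluate each tree can be derived from a 2 x 2 confusion table of hits (hi), misses (mi), false alarms (fa), and correct rejections (cr), like the one in the output above.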
For definitions of all accuracy statistics, see the accuracy statistics vignette.
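As a check, the training statistics of FFT #1 can be recomputed from its four cell counts (hi = 54, mi = 12, fa = 18, cr = 66) using the standard confusion-table definitions. This base R sketch assumes those textbook definitions and the unit outcome costs shown in the output ([hi = 0, mi = 1, fa = 1, cr = 0]); with them, it reproduces the printed values of acc, bacc, sens, spec, ppv, npv, and E(cost):

```r
# Cell counts of FFT #1's training confusion table (from the printed output)
counts <- c(hi = 54, mi = 12, fa = 18, cr = 66)
hi <- counts[["hi"]]; mi <- counts[["mi"]]; fa <- counts[["fa"]]; cr <- counts[["cr"]]
N  <- sum(counts)

sens <- hi / (hi + mi)     # sensitivity: hits among all true positives
spec <- cr / (cr + fa)     # specificity: correct rejections among all true negatives
ppv  <- hi / (hi + fa)     # positive predictive value
npv  <- cr / (cr + mi)     # negative predictive value
acc  <- (hi + cr) / N      # overall accuracy
bacc <- (sens + spec) / 2  # balanced accuracy

# Expected cost per case, assuming the outcome costs printed above
cost   <- c(hi = 0, mi = 1, fa = 1, cr = 0)
e_cost <- sum(cost * counts) / N

# Percentages rounded to one decimal, as in the printed output
stats_pct <- round(100 * c(acc = acc, bacc = bacc, sens = sens,
                           spec = spec, ppv = ppv, npv = npv), 1)
print(stats_pct)
print(e_cost)
```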
Use plot() to visualize an FFT (i.e., an FFTrees object):
# Plot the best FFT when applied to test data:
plot(heart.fft, # an FFTrees object
data = "test") # data to plot ("train" or "test")?
tree: Which tree in the object should be plotted? To plot a tree other than the best-fitting tree (FFT #1), specify another tree as an integer (e.g., plot(heart.fft, tree = 2)).
data: For which dataset should statistics be shown? Either data = "train" (showing fitting or “Training” performance, the default) or data = "test" (showing prediction or “Testing” performance).
stats: Should accuracy statistics be shown with the tree? To show only the tree, without any performance statistics, include the argument stats = FALSE.
# Plot only the tree, without accuracy statistics:
plot(heart.fft,
stats = FALSE)
comp: Should statistics from competitive algorithms be shown in the ROC curve? To remove the performance statistics of competitive algorithms (e.g., regression, random forests), include the argument comp = FALSE.
what: To show individual cue accuracies (in ROC space), include the argument what = "cues":
# Plot cue accuracies for training data (in ROC space):
plot(heart.fft,
data = "train",
what = "cues")
See the Plotting FFTrees vignette for details on plotting FFTs.
An FFTrees object contains many different outputs. To see them all, run names():
# Show the names of all of the outputs in heart.fft:
names(heart.fft)
## [1] "criterion_name" "cue_names" "formula" "trees"
## [5] "data" "params" "competition" "cues"
To predict classifications for a new dataset, use the standard predict() function. For example, here’s how to predict the classifications for data in the heartdisease object (which is simply a combination of heart.train and heart.test):
# Predict classifications for a new dataset:
predict(heart.fft,
data = heartdisease)
To define a specific FFT and apply it to data, we can provide a verbal description of the tree to the my.tree argument:
# Create an FFT manually (from description):
my.heart.fft <- FFTrees(formula = diagnosis ~ .,
                        data = heart.train,
                        data.test = heart.test,
                        main = "Custom Heart FFT",
                        my.tree = "If chol > 350, predict True.
If cp != {a}, predict False.
If age <= 35, predict False, otherwise, predict True.")
Here is the resulting tree:
plot(my.heart.fft)
It’s actually not too bad, although the first node is nearly worthless: it classifies only 3 cases, all of them false alarms.
See the Defining an FFT verbally vignette for details on defining FFTs from verbal descriptions.
Here is a complete list of the vignettes available in the FFTrees package:
| | Vignette | Description |
|---|---|---|
| | Main guide | An overview of the FFTrees package |
| 1 | Heart Disease Tutorial | An example of using FFTrees() to model heart disease diagnosis |
| 2 | Accuracy statistics | Definitions of accuracy statistics used throughout the package |
| 3 | Creating FFTs with FFTrees() | Details on the main function FFTrees() |
| 4 | Specifying FFTs directly | How to directly create FFTs with my.tree without using the built-in algorithms |
| 5 | Visualizing FFTs with plot() | Plotting FFTrees objects, from full trees to icon arrays |
| 6 | Examples of FFTs | Examples of FFTs from different datasets contained in the package |
Martignon, L., Katsikopoulos, K. V., & Woike, J. K. (2008). Categorization with limited resources: A family of simple heuristics. Journal of Mathematical Psychology, 52(6), 352–361. https://doi.org/10.1016/j.jmp.2008.04.003
Phillips, N. D., Neth, H., Woike, J. K., & Gaissmaier, W. (2017). FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. Judgment and Decision Making, 12(4), 344–368. Retrieved from https://journal.sjdm.org/17/17217/jdm17217.pdf