Manually specifying FFTs

We typically create fast-and-frugal trees (FFTs) from data by using the FFTrees() function (see the Main guide and the vignette on Creating FFTs with FFTrees() for details). However, we sometimes want to design and test a specific FFT (e.g., to check a hypothesis or to use variables selected on theoretical grounds).

There are two ways to define fast-and-frugal trees manually when using the FFTrees() function:

as a sentence using the my.tree argument (the easier way), or
as a data frame using the tree.definitions argument (the harder way).

Both of these methods will bypass the tree construction algorithms built into the FFTrees package.

Using `my.tree`

The first method is to use the my.tree argument, where my.tree is a sentence describing a (single) FFT. When this argument is specified in FFTrees(), the function (specifically, the auxiliary fftrees_wordstofftrees() function) will try to convert the verbal description into the definition of an FFT (of an FFTrees object).

For example, let’s look at the heartdisease data to find out how some predictor variables (e.g., sex and age) predict the criterion variable (diagnosis):

**Table 1**: Five cues and the binary criterion variable `diagnosis` for the first cases of the `heartdisease` data.
sex	age	thal	cp	ca	diagnosis
1	63	fd	ta	0	FALSE
1	67	normal	a	3	TRUE
1	67	rd	a	2	TRUE
1	37	normal	np	0	FALSE
0	41	normal	aa	0	FALSE
1	56	normal	aa	0	FALSE

Here’s how we could verbally describe an FFT by using the first three cues in conditional sentences:

in_words <- "If sex = 1, predict True.
             If age < 45, predict False. 
             If thal = {fd, normal}, predict True. 
             Otherwise, predict False."

As we will see shortly, the FFTrees() function accepts such descriptions (assigned here to a character string in_words) as its my.tree argument, creates a corresponding FFT, and evaluates it on a corresponding dataset.
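To make the logic of the verbal description concrete, the following base R sketch applies the same three nodes by hand to the six cases shown in Table 1. It does not use the FFTrees package; the data frame `cases` and the helper `predict_by_hand()` are hypothetical names introduced only for this illustration.

```r
# First six cases of heartdisease, copied from Table 1 (only the cues used here)
cases <- data.frame(
  sex       = c(1, 1, 1, 1, 0, 1),
  age       = c(63, 67, 67, 37, 41, 56),
  thal      = c("fd", "normal", "rd", "normal", "normal", "normal"),
  diagnosis = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of FFTrees): walk through the nodes in order
predict_by_hand <- function(d) {
  ifelse(d$sex == 1, TRUE,                 # node 1: exit with True
  ifelse(d$age < 45, FALSE,                # node 2: exit with False
         d$thal %in% c("fd", "normal")))   # final node: True if in set, else False
}

cases$prediction <- predict_by_hand(cases)
print(cases)
```

Because the first node exits on sex = 1, five of these six cases are classified as True before any other cue is considered.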

How to define FFTs

Here are some instructions for manually specifying trees:

Each node must start with the word “If” and should correspond to the form: If <CUE> <DIRECTION> <THRESHOLD>, predict <EXIT>.
Numeric thresholds should be specified directly (without brackets), like age > 21.
For categorical variables, factor thresholds must be specified within curly braces, like sex = {male}. For factors with sets of values, categories within a threshold should be separated by commas like eyecolor = {blue,brown}.
To specify cue directions, standard logical comparisons =, !=, <, >= (etc.) are valid. For numeric cues, only use >, >=, <, or <=. For factors, only use = or !=.
Positive exits are indicated by True, while negative exits are specified by False.
The final node of an FFT is always bi-directional (i.e., has both a positive and a negative exit). The description of the final node always mentions its positive (True) exit first. The text Otherwise, predict EXIT that we have included in the example above is actually not necessary (and ignored).
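The node format described above can be illustrated with a small base R sketch that splits a description into its nodes and extracts the cue, direction, threshold, and exit of each. This is only an illustration of the sentence pattern and makes no claim about how fftrees_wordstofftrees() works internally; the helper `parse_fft_nodes()` and its regular expression are assumptions made for this example.

```r
# Example description, copied from the in_words example
in_words <- "If sex = 1, predict True.
             If age < 45, predict False.
             If thal = {fd, normal}, predict True.
             Otherwise, predict False."

# Hypothetical helper (NOT part of FFTrees): extract the parts of each "If" node
parse_fft_nodes <- function(description) {
  # Split at periods that end a sentence (followed by "If", "Otherwise", or the end),
  # so that decimal thresholds such as 2.5 are left intact
  sentences <- strsplit(description, "\\.\\s*(?=If\\b|Otherwise\\b|$)", perl = TRUE)[[1]]
  sentences <- trimws(sentences)
  # Drop the optional "Otherwise" sentence
  sentences <- sentences[startsWith(sentences, "If ")]

  # Pattern: If <CUE> <DIRECTION> <THRESHOLD>, predict <EXIT>
  pattern <- paste0("^If\\s+([A-Za-z][A-Za-z0-9_.]*)\\s*",  # cue name
                    "(!=|>=|<=|=|<|>)\\s*",                 # direction
                    "(.+?)\\s*,\\s*predict\\s+",            # threshold
                    "(True|False)$")                        # exit
  rows <- lapply(sentences, function(s) {
    m <- regmatches(s, regexec(pattern, s, perl = TRUE))[[1]]
    if (length(m) == 0) stop("Cannot parse node: ", s)
    data.frame(cue = m[2], direction = m[3], threshold = m[4], exit = m[5],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

nodes <- parse_fft_nodes(in_words)
print(nodes)
```

The result has one row per node. The "Otherwise" sentence is dropped, which mirrors the remark that this sentence is not needed.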

Example

Now, let’s use our verbal description of an FFT (assigned to in_words above) as the my.tree argument of the FFTrees() function. This creates a corresponding FFT and applies it to the heartdisease data:

# Create FFTrees from a verbal FFT description (as my.tree): 
my_fft <- FFTrees(diagnosis ~.,
                  data = heartdisease,
                  main = "My 1st FFT", 
                  my.tree = in_words)

Let’s see how well our manually constructed FFT (my_fft) did:

# Inspect FFTrees object:
plot(my_fft)

Figure 1: An FFT manually constructed using the my.tree argument of FFTrees().

When manually constructing a tree, the resulting FFTrees object only contains a single FFT. Hence, the ROC plot (in the bottom right panel of Figure 1) cannot show a range of FFTs, but locates the constructed FFT in ROC space.

As it turns out, the performance of our first FFT created from a verbal description is a mixed affair: The tree has a rather high sensitivity (of 91%), but its low specificity (of only 10%) allows for many false alarms. Consequently, its accuracy measures are only around baseline level.
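As a reminder of how such measures are computed, here is a minimal base R sketch using only the six cases of Table 1. The `prediction` vector was worked out by hand from the rules in in_words, so the values are a toy illustration and differ from the figures reported for the full dataset. The variable names are assumptions made for this example.

```r
# Criterion values of the six cases in Table 1
diagnosis  <- c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
# Decisions of the first FFT for the same cases (derived by hand, not by FFTrees)
prediction <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE)

hi <- sum(prediction & diagnosis)      # hits
mi <- sum(!prediction & diagnosis)     # misses
fa <- sum(prediction & !diagnosis)     # false alarms
cr <- sum(!prediction & !diagnosis)    # correct rejections

sens <- hi / (hi + mi)                 # sensitivity: share of TRUE cases detected
spec <- cr / (cr + fa)                 # specificity: share of FALSE cases rejected
acc  <- (hi + cr) / length(diagnosis)  # overall accuracy

print(c(sens = sens, spec = spec, acc = acc))
```

Even in this tiny sample the pattern is visible: every true case is caught, but most false cases are also classified as True.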

Creating an alternative FFT

Let’s see if we can come up with a better FFT. The following example uses the cues thal, cp, and ca in the my.tree argument:

# Create 2nd FFTrees from an alternative FFT description (as my.tree): 
my_fft_2 <- FFTrees(diagnosis ~.,
                    data = heartdisease, 
                    main = "My 2nd FFT", 
                    my.tree = "If thal = {rd,fd}, predict True.
                               If cp != {a}, predict False. 
                               If ca > 1, predict True. 
                               Otherwise, predict False.")

# Inspect FFTrees object:
plot(my_fft_2)

Figure 2: Another FFT manually constructed using the my.tree argument of FFTrees().

This alternative FFT nicely balances sensitivity and specificity and performs much better overall. Nevertheless, it is still far from perfect, so check whether you can create even better ones!
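For a quick sanity check of the second description, the sketch below applies its three nodes by hand to the six cases of Table 1 in base R. It is a toy check on six rows rather than an evaluation on the full data, and the object names are assumptions made for this example.

```r
# First six cases of heartdisease, copied from Table 1 (only the cues used here)
cases <- data.frame(
  thal      = c("fd", "normal", "rd", "normal", "normal", "normal"),
  cp        = c("ta", "a", "a", "np", "aa", "aa"),
  ca        = c(0, 3, 2, 0, 0, 0),
  diagnosis = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Walk through the nodes of the second FFT in order
prediction <- ifelse(cases$thal %in% c("rd", "fd"), TRUE,  # node 1: exit with True
              ifelse(cases$cp != "a", FALSE,               # node 2: exit with False
                     cases$ca > 1))                        # final node: True if ca > 1

# Share of the six cases classified correctly
accuracy <- mean(prediction == cases$diagnosis)
print(accuracy)
```

On these six rows, only the first case (thal = fd, but diagnosis = FALSE) is misclassified.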

For details on understanding and changing tree definitions, see the section on Tree definitions in the Creating FFTs with FFTrees() vignette.

	Vignette	Description
	Main guide	An overview of the FFTrees package
1	Heart Disease Tutorial	An example of using `FFTrees()` to model heart disease diagnosis
2	Accuracy statistics	Definitions of accuracy statistics used throughout the package
3	Creating FFTs with FFTrees()	Details on the main function `FFTrees()`
4	Specifying FFTs directly	How to directly create FFTs with `my.tree` without using the built-in algorithms
5	Visualizing FFTs with plot()	Plotting `FFTrees` objects, from full trees to icon arrays
6	Examples of FFTs	Examples of FFTs from different datasets contained in the package

Nathaniel Phillips and Hansjörg Neth

2022-08-31