Introduction to aorsf

This article covers core features of the aorsf package.

Background: ORSF

The oblique random survival forest (ORSF) is an extension of the axis-based RSF algorithm.
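To make the distinction concrete, the sketch below contrasts an axis-based split (one predictor) with an oblique split (a linear combination of predictors) on the pbc_orsf data. It is only an illustration, not the routine aorsf runs internally: the three predictors, the use of survival::coxph() to obtain coefficients, and the median cut-points are all assumptions made for this example.

```r
library(survival)
library(aorsf) # used here only for the pbc_orsf data

# Axis-based split: partition the data using a single predictor.
left_axis <- pbc_orsf$bili <= median(pbc_orsf$bili)

# Oblique split: partition the data using a linear combination of predictors.
# The coefficients come from a Cox model (an illustrative stand-in).
predictors <- c("bili", "age", "albumin")

cox_fit <- coxph(Surv(time, status) ~ bili + age + albumin,
                 data = pbc_orsf)

lincomb <- as.numeric(as.matrix(pbc_orsf[, predictors]) %*% coef(cox_fit))

left_oblique <- lincomb <= median(lincomb)

# The two rules send some observations to different child nodes.
table(axis = left_axis, oblique = left_oblique)
```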

Accelerated ORSF

The purpose of aorsf (‘a’ is short for accelerated) is to provide routines to fit ORSFs that will scale adequately to large data sets. The fastest algorithm available in the package is the accelerated ORSF model, which is the default method used by orsf():


library(aorsf)

set.seed(329)

orsf_fit <- orsf(data = pbc_orsf, 
                 formula = Surv(time, status) ~ . - id)

orsf_fit
#> ---------- Oblique random survival forest
#> 
#>      Linear combinations: Accelerated
#>           N observations: 276
#>                 N events: 111
#>                  N trees: 500
#>       N predictors total: 17
#>    N predictors per node: 5
#>  Average leaves per tree: 24
#> Min observations in leaf: 5
#>       Min events in leaf: 1
#>           OOB stat value: 0.84
#>            OOB stat type: Harrell's C-statistic
#>      Variable importance: anova
#> 
#> -----------------------------------------

You may notice that the first input of orsf() is data. This design choice makes it easier to use orsf() with pipes (i.e., %>% or |>). For instance,

library(dplyr)

orsf_fit <- pbc_orsf |> 
 select(-id) |> 
 orsf(formula = Surv(time, status) ~ .)

Interpretation

aorsf includes several functions dedicated to interpretation of ORSFs, through estimation of both partial dependence and variable importance.

Variable importance

aorsf provides multiple ways to compute variable importance: ANOVA importance, negation importance, and permutation importance.
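As a minimal sketch, the calls below compute importance values with the orsf_vi_anova(), orsf_vi_negate(), and orsf_vi_permute() functions. The object names (fit, vi_anova, and so on) and the choice of 100 trees (to keep the run short) are assumptions made for this example.

```r
library(aorsf)

set.seed(329)

# A smaller forest than the default, only to keep the example quick.
fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 100)

# Each call returns a named numeric vector, one value per predictor.
vi_anova   <- orsf_vi_anova(fit)
vi_negate  <- orsf_vi_negate(fit)
vi_permute <- orsf_vi_permute(fit)

# Inspect the first few predictors under each method.
head(vi_anova)
head(vi_negate)
head(vi_permute)
```

The three methods measure different things, so their rankings need not agree exactly.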

Partial dependence (PD)

Partial dependence (PD) shows the expected prediction from a model as a function of a single predictor or multiple predictors. The expectation is marginalized over the values of all other predictors, giving something like a multivariable adjusted estimate of the model’s prediction.
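The definition can be sketched by hand with predict(): fix one predictor at a value for every observation, predict, and average. The code below is an illustration of the idea using the training data, not a replacement for the package's PD functions. The predictor (bili), the grid of values, the prediction horizon of 1000 days, and the use of 100 trees are all assumptions made for this example.

```r
library(aorsf)

set.seed(329)

fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 100)

bili_values <- c(1, 2, 3, 4, 5)

pd_manual <- sapply(bili_values, function(value) {
  # Set bili to the same value for everyone; leave other predictors as observed.
  data_modified <- pbc_orsf
  data_modified$bili <- value
  # Average predicted risk over the observed values of all other predictors.
  mean(predict(fit, new_data = data_modified, pred_horizon = 1000))
})

names(pd_manual) <- bili_values

pd_manual
```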

For more on PD, see the vignette.

Individual conditional expectations (ICE)

Unlike partial dependence, which shows the expected prediction as a function of one or multiple predictors, individual conditional expectations (ICE) show the prediction for an individual observation as a function of a predictor.
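A hand-rolled sketch of ICE follows the same recipe as PD but skips the averaging step, so each observation keeps its own curve. As before, this only illustrates the idea: the three observations, the predictor (bili), the grid of values, the prediction horizon of 1000 days, and the use of 100 trees are assumptions made for this example.

```r
library(aorsf)

set.seed(329)

fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 100)

bili_values <- c(1, 2, 3, 4, 5)

# One column per bili value, one row per observation.
ice_manual <- sapply(bili_values, function(value) {
  data_modified <- pbc_orsf[1:3, ]
  data_modified$bili <- value
  # No averaging: keep one predicted risk per observation.
  as.numeric(predict(fit, new_data = data_modified, pred_horizon = 1000))
})

dimnames(ice_manual) <- list(observation = 1:3, bili = bili_values)

ice_manual
```

Each row is one observation's curve. Averaging such curves over all observations gives the PD curve.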

For more on ICE, see the vignette.

What about the original ORSF?

The original ORSF (i.e., obliqueRSF) used glmnet to find linear combinations of inputs. aorsf allows users to implement this approach using the orsf_control_net() function:


orsf_net <- orsf(data = pbc_orsf, 
                 formula = Surv(time, status) ~ . - id, 
                 control = orsf_control_net(),
                 n_tree = 50)

net forests fit a lot faster than the original ORSF function in obliqueRSF. However, net forests are still much slower than cph ones:


# tracking how long it takes to fit 50 glmnet trees
print(
 t1 <- system.time(
  orsf(data = pbc_orsf, 
       formula = Surv(time, status) ~ . - id, 
       control = orsf_control_net(),
       n_tree = 50)
 )
)
#>    user  system elapsed 
#>    2.45    0.00    2.46

# and how long it takes to fit 50 cph trees
print(
 t2 <- system.time(
  orsf(data = pbc_orsf, 
       formula = Surv(time, status) ~ . - id, 
       control = orsf_control_cph(),
       n_tree = 50)
 )
)
#>    user  system elapsed 
#>    0.05    0.00    0.05

t1['elapsed'] / t2['elapsed']
#> elapsed 
#>    49.2

aorsf and other machine learning software

The unique feature of aorsf is its fast algorithms to fit ORSF ensembles. RLT and obliqueRSF both fit oblique random survival forests, but aorsf does so faster. ranger and randomForestSRC fit survival forests, but neither package supports oblique splitting. obliqueRF fits oblique random forests for classification and regression, but not survival. PPforest fits oblique random forests for classification but not survival.

Note: The default prediction behavior for aorsf models is to produce predicted risk at a specific prediction horizon, which is not the default for ranger or randomForestSRC. I think this will change in the future, as computing time-independent predictions with aorsf could be helpful.
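As a small sketch of this behavior, the code below predicts risk for a few observations at chosen horizons. The horizons (500, 1000, and 1500 days), the five rows used as new data, and the use of 100 trees are assumptions made for this example.

```r
library(aorsf)

set.seed(329)

fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 100)

# Predicted risk: one row per observation, one column per prediction horizon.
risk <- predict(fit,
                new_data = pbc_orsf[1:5, ],
                pred_horizon = c(500, 1000, 1500))

dim(risk)

risk
```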