Planning Time-To-Event Trials with gestate

James Bell

2021-11-24

1 Introduction

1.1 Vignette Overview

The gestate package is designed to assist accurate planning and event prediction for time-to-event clinical trials. In this vignette, we introduce the gestate package, and demonstrate how it may be used in trial planning. A separate vignette (“event_prediction”) covers event prediction of ongoing trials. The first section will show how distributions are easily described and stored as Curve and RCurve objects. The second section will demonstrate analytic methods for trial planning, and the third section will show how simulate may be alternatively used.

1.2 Exponential Distributions

The least possible information required to calculate a time-to-event trial’s predicted properties are distributions of events in each arm and an expected distribution of patient recruitment, including the number per arm. However, typically event distributions are over-simplified by implicitly assuming exponential distributions and then defining them by a single number (often the median). This shortcut is not available in gestate since events in many clinical trials do not follow exponential distributions and assuming this without evidence can lead to inaccurate trial planning. Even where no historical information is available and an exponential distribution can be considered to be the best guess, it is important to be clear that a choice has been made in setting a distribution to exponential. Gestate therefore requires explicit selection of distribution types, as well as their parameters, and does not default to exponential.

1.3 Non-Proportional Hazards and Complex Censoring

Aside from simplicity, one of the historical reasons for use of the exponential distribution was a requirement for proportional hazards in many analysis methods, although this should not have prevented use of Weibull distributions. With the rising importance of non-proportional hazards in clinical trials, it is important that trial planning methods can handle both the expected patterns of events and also analytical approaches that are more suited to this situation. To date, this has mainly been done by simulation, with the standard drawbacks of having to account for Monte Carlo error and being relatively slow. Gestate’s numerical integration methods and Curve architecture for handling distributions are designed to handle unusual event distributions as well as complex censoring patterns, and to produce calculations for a range of analytical methods. These approaches are fast in comparison to simulation, and generally quite accurate. Simulation tools have also been provided to assist in validating output, but it is intended that the user arrives at their preferred sample size and then validates it by simulation - this workflow is designed to produce a considerable time-saving.

1.4 Power Calculation vs Sample Size Calculation

The gestate package is designed around calculating power from a given sample size rather than sample size from power. While this is clearly necessary for simulation, the analytic approach also does this for the following reasons.

First, it allows power of a trial to be assessed over time. This is important as there are usually questions around the trade-off between trial duration and sample size. Even where it is reasonable to reversibly relate event numbers to power, the most pertinent questions then relate to how many patients are required to produce those events and how long it will take.

Secondly, moderate-to-large changes of sample size usually require changing the length or shape of recruitment, rather than simply scaling the rate. However, the length and shape of recruitment are also important for calculating the number of events at any given time. Methods that calculate sample size from power generally assume that the rate of recruitment can be scaled freely; the alternative of specifying an algorithm to change other recruitment parameters as patient numbers increase cannot be done generally as trial logistics vary widely. To speed sample size identification up, gestate provides an option to estimate a sample size that would produce a specified power at each time point, using the simplified rate scaling assumption. This provides an indication of patient numbers for the next scenario to calculate power for, but allows the user opportunity to change other recruitment parameters appropriately.

A final complicating factor for direct sample size calculation is that the correspondence between event numbers and power is not absolute, even for log-rank tests. Particularly when randomisation ratios are not 1:1, it can be shown that the ‘maturity’ of the data (and in particular the ratio of events between arms) affects the number of events required for a given power. This is highlighted by the inclusion of the Frontier Power method for power calculation that is included in the package,which is a generalisation of the Schoenfeld formula to handle unequal randomisation ratios appropriately. Since data maturity affects power, even simply scaling the sample size to achieve the desired number of events may not actually give the required power1.

1.5 Shiny App

The gestate package contains a built-in Shiny App to implement most of the functionality covered in this vignette.

It may be run by the command ‘run_gestate()’.

The app contains real-time interactive plots for the currently specified Curves and RCurve. Updating them will also update the plots. This can be useful for checking parameterisation. Most functionality is covered, and both analytic and simulation approaches are included. Most parameters and arguments are shared between the two approaches, so once one is run, it is trivial to also run the other. All plots and tables created are downloadable.

In general, it is recommended to use the app for interactive, low-throughput sessions, perhaps when there is uncertainty about parameters. It also makes a good starting point for new users of gestate. Use of gestate directly in R is recommended for higher throughput workflows, systematic searching of parameters, or where outputs are fed into other programs.

2 Curves

2.1 Event and Censoring Distribution Description and Storage: Curve Objects

The gestate package is built upon Curve objects, named because they describe a Survival curve. These contain all necessary information about probability distributions that may be useful for planning time-to-event trials, including both the parameters and the names of functions governing its behaviour.

Curve objects serve three main purposes; they allow the gestate code to be independent of the distributions it runs, allow the free interchange and combination of different distributions, and provide a convenient format for holding all relevant information about a distribution.

So long as a distribution is supported, a user may create a Curve object by using a (constructor) function corresponding to the distribution of interest, providing its parameters as arguments.

gestate V1.4.x comes with the following distributions supported as Curve objects:

Distribution Function Parameter Arguments S(t)
Exponential Exponential lambda \(e^{- \lambda t}\)
Weibull Weibull alpha, beta \(e^{-(t/\alpha)^\beta)}\)
Log-Normal Lognormal mu, sigma \(0.5(1+erf\left(\frac{\log(t)-\mu }{\sigma\sqrt2}\right))\)
Log-Logistic LogLogistic theta, eta \(\frac{t^{\eta}}{\theta^{\eta}+t^{\eta}}\)
Gompertz Gompertz theta, eta \(e^{\eta - \eta e^{\theta t}}\)
Generalised Gamma GGamma theta, eta, rho \(\frac{\Gamma(\eta,(t/\theta)^\rho)}{\Gamma(\eta)}\)
Piecewise-Exponential PieceExponential start, lambda \(\prod_{x = 1}^{len(\mathbf{\lambda})} e^{- \lambda_x t_x}\) where \(t_x = \min{(start_{x+1},(\max{(0,t-start_x)}))}\) \(start_{x+1} = Inf\) if otherwise undefined.
Mixture-Exponential MixExp props, lambdas \(p_{1}e^{- \lambda_{1} t} + p_{2}e^{- \lambda_{2} t} + ...\)
Mixture-Weibull MixWei props, alphas, betas \(p_{1}e^{-(t/\alpha_{1})^{\beta_{1})}} + p_{1}e^{-(t/\alpha_{2})^{\beta_{2})}}+...\)
Null Blank \(1\)

Event distributions and dropout-censoring distributions must be provided to gestate in Curve format. An example of defining a Curve is as follows:

Where you want to make an event / censoring impossible, you can use define a null Curve by the function Blank(). Typically, this is the starting point for calculations (the ‘practical default’), and is useful where no dropout censoring is required. The null curve is also the default within gestate for the dropout censoring distribution.

2.2 Recruitment distribution description and storage: RCurve

Recruitment-associated distributions use a special form of Curve object; the RCurve. This differs in several ways. Firstly, it also includes details of patient numbers for two arms; an active arm and a control arm. Secondly, it stores the distribution of patient recruitment, rather than a censoring distribution. This is because the ‘censoring distribution’ can only be defined in conjunction with an assessment time.

gestate V1.4.x comes with the following recruitment distributions supported as RCurve objects:

Distribution Function Parameter Arguments Notes
Linear LinearR rlength, Nactive, Ncontrol Standard linear recruitment
Instant InstantR Nactive, Ncontrol All patients in trial for same duration
Piecewise-linear PieceR recruitment, ratio Recruitment is two-column matrix of durations and rates
Piecewise-linear, fixed follow-up PieceRMaxF recruitment, ratio, maxF Piecewise recruitment with patients followed for fixed duration

For most event-driven TTE trials, LinearR or PieceR distributions are appropriate. For trials where patients are instead followed for a fixed duration (e.g. 12 months per patient), the InstantR distribution is typically sufficient, since the trial typically finishes once all patients have been followed for the required time. However, in cases where the end of trial is both event driven and has fixed duration patient follow-up, the PieceRMaxF distribution accounts for this additional complexity.

3 Analytic Planning of Time-to-Event Trials: nph_traj

Common questions in planning time to event trials include what the expected number of events, survival function, power or width of hazard ratio (HR) confidence interval will be at any given time. In non-proportion hazards situations, the expected HR, difference in Survival functions (referred to as landmark analysis) or difference in restricted mean survival time (RMST) at a given assessment time (including power to test them) may also be of interest. nph_traj is designed to analytically answer these questions and to do so in real-time.

nph_traj uses numerical integration techniques to calculate expected event numbers in each arm at each time point and from these derives predicted properties for the log-rank test and Cox regression using an approach based upon the Pike approximation for the HR2. Separate numerical integration approaches are also used to predict standard errors for RMST and landmark analysis in the presence of censoring and hence calculate power for these analyses.

Due to the interchangeability of Curve objects and the flexibility afforded by numerical integration, the nph_traj function can therefore be used to analytically calculate expected properties over time of a time-to-event trial, even under non-proportional hazards or with complex censoring distributions and hence also derive power trajectories for multiple common analysis methods.

nph_traj makes no explicit assumptions about the unit of time, only that all inputs have the same unit. In practice however, it is usually most convenient for clinical trial planning to use units of months. nph_traj creates trajectories with one entry at each integer time, so using units of years will typically lead to overly-granular trajectories. Conversely, using e.g. units of days or weeks is typically impractical (as recruitment data is typically summarised by month), slow (due to the increased number of calculations), and over-precise (time lags and uncertainties typically ensure that timings to the nearest month are of most relevance).

3.1 Getting started: Simple Log-Rank and Hazard-Ratio Calculations

At a minimum, nph_traj requires a Curve object to describe the distribution of each of the active and control arms (‘active_ecurve’ and ‘control_ecurve’ arguments respectively), as well as an RCurve object (‘rcurve’ argument) to describe the recruitment distribution. For more information on Curve and RCurve objects, see section 2. The distributions of the active and control arms should typically be estimated based upon parameters or curve-fitting from previous trials (using e.g. fit_KM or fit_tte_data from the gestate package; see the event_prediction vignette).

Example call:

3.2 Interpreting Output

The output from nph_traj is a list, with the first few entries corresponding to the input Curve and RCurve objects to provide context to the results. The results themselves are found under $Summary and are in the form of a trajectory, providing information about the trial at each integer time point up to the maximum assessment time (‘max_assessment’ argument).

Each row corresponds to a separate time point, given by the first column;‘Time’. The second column, ‘Patients’ gives the expected number of patients entered into the trial by that time. The three ‘Events’ columns give the expected numbers of events in each arm and overall at each time.The next three columns provide the Hazard Ratio, Log-Hazard Ratio and SE of the Log-Hazard Ratio. Finally, the power for the log-rank test / Cox regression are given based upon the Schoenfeld3,4 and Frontier1 methods.

3.3 Censoring

Administrative censoring is handled implicitly based upon the specified RCurve object. However, dropout censoring is also common in time-to-event trials. This can be specified through Curve objects, and separate censoring distributions are supported for each arm. By default, gestate uses no censoring via the Blank() Curve.

It should be noted that the required censoring distributions must represent that of the censoring that would occur in the absence of events or administrative censoring. The observed numbers of censored patients in each arm will therefore also depend partially on the event distributions. The suggested way to use existing data to parameterise the censoring distributions is to create ‘reverse Kaplan Meier’ plots of existing data, whereby dropout censorings are ‘events’, and both events and administrative censorings are ‘censorings’. Curve-fitting techniques (e.g. fit_KM or fit_tte_data from gestate, see event_prediction vignette) may then be used to estimate the parametric distributions.

3.4 Other/Advanced Options

To control the one-sided type I error for power calculations, the ‘alpha1’ argument may be used. This is 0.025 by default.

An additional feature of nph_traj is it can estimate the required number of patients to achieve a given power at a given time point (using Schoenfeld), assuming all parameters remain constant (only the rate of recruitment changes to allow patient number to change in the existing specified time frame). This can be requested by setting the ‘required_power’ argument to the required decimal, e.g. 0.9 for 90% power. For sample size calculations it is strongly recommended to only use this as a guide of the next set of parameters to try, and to rerun the calculation. This is because unless the number is similar to that in the existing RCurve, it will typically require changing the recruitment length to reach the new number, and that will result in changes to the calculation.

It is also possible to get more detailed outputs by specifying ‘detailed_output=TRUE’. This includes several additional quantities required by the calculations or derived from them. Full details may be found in the documentation of the nph_traj function (via e.g. ?nph_traj).

Version 1.4.1 onwards supports non-inferiority trials for power calculations based on hazard ratio (only). The optional ‘HRbound’ argument can be used to specify the HR to be used as the non-inferiority bound. By default this is 1, i.e. a superiority trial. Values greater than 1 should be specified for non-inferiority settings. Values less than 1 could be used for trials looking to show evidence that they are at least a certain amount better than the comparator. Note that any requested power calculations based on a log rank Z test, or for RMST/landmark analyses will still be on the basis of superiority.

# Define exponential distributions for active and control arms, with HR of 0.7
censorCurveA <- Exponential(lambda=0.001)
censorCurveC <- Exponential(lambda=0.002)

# Run nph_traj with default settings to calculate expected properties up to 10 months in a NI setting against a HR bound of 1.3.
output1 <- nph_traj(active_ecurve=activeCurve,control_ecurve=controlCurve,active_dcurve=censorCurveA,control_dcurve=censorCurveC,rcurve=rcurve3,max_assessment=10,HRbound=1.3,detailed_output = TRUE,required_power = 0.9)
# Display output
output1
#> $active_ecurve
#> Curve Object
#> Type: Exponential 
#> Distribution Parameters:
#>    rate :  0.07 
#> 
#> $control_ecurve
#> Curve Object
#> Type: Exponential 
#> Distribution Parameters:
#>    rate :  0.1 
#> 
#> $active_dcurve
#> Curve Object
#> Type: Exponential 
#> Distribution Parameters:
#>    rate :  0.001 
#> 
#> $control_dcurve
#> Curve Object
#> Type: Exponential 
#> Distribution Parameters:
#>    rate :  0.002 
#> 
#> $rcurve
#> RCurve Object
#> Type: LinearR 
#> N Total: 400 
#> N Active: 200 
#> N Control: 200 
#> Recruitment Ratio: 1 
#> Distribution Parameters:
#>    rlength :  12 
#> 
#> $HRbound
#> [1] 1.3
#> 
#> $Summary
#>    Time Patients Events_Active Events_Control E_Events_Active E_Events_Control
#> 1     1    33.33         0.570          0.806           0.691            0.684
#> 2     2    66.67         2.227          3.118           2.699            2.645
#> 3     3   100.00         4.896          6.790           5.930            5.757
#> 4     4   133.33         8.509         11.691          10.297            9.903
#> 5     5   166.67        13.001         17.701          15.719           14.983
#> 6     6   200.00        18.310         24.712          22.121           20.902
#> 7     7   233.33        24.382         32.629          29.434           27.578
#> 8     8   266.67        31.165         41.362          37.592           34.934
#> 9     9   300.00        38.608         50.833          46.536           42.904
#> 10   10   333.33        46.668         60.969          56.210           51.427
#>    Events_Total  HR   LogHR LogHR_SE HR_CI_Upper HR_CI_Lower Peto_LogHR
#> 1         1.375 0.7 -0.3567  1.72184     20.4516      0.0240    -0.3533
#> 2         5.344 0.7 -0.3567  0.87279      3.8728      0.1265    -0.3536
#> 3        11.686 0.7 -0.3567  0.58979      2.2239      0.2203    -0.3539
#> 4        20.200 0.7 -0.3567  0.44829      1.6853      0.2907    -0.3542
#> 5        30.701 0.7 -0.3567  0.36339      1.4270      0.3434    -0.3544
#> 6        43.023 0.7 -0.3567  0.30679      1.2771      0.3837    -0.3547
#> 7        57.011 0.7 -0.3567  0.26636      1.1798      0.4153    -0.3550
#> 8        72.526 0.7 -0.3567  0.23604      1.1118      0.4407    -0.3552
#> 9        89.441 0.7 -0.3567  0.21246      1.0616      0.4616    -0.3555
#> 10      107.637 0.7 -0.3567  0.19359      1.0230      0.4790    -0.3557
#>    Expected_Z Expected_P Log_Rank_Stat Variance V_Pike_Peto Event_Ratio
#> 1     -0.2071     0.4179       -0.1215    0.344       0.344      0.7072
#> 2     -0.4087     0.3414       -0.4723    1.336       1.336      0.7142
#> 3     -0.6048     0.2727       -1.0335    2.921       2.921      0.7211
#> 4     -0.7956     0.2131       -1.7875    5.047       5.048      0.7278
#> 5     -0.9815     0.1632       -2.7181    7.669       7.671      0.7345
#> 6     -1.1626     0.1225       -3.8105   10.742      10.747      0.7409
#> 7     -1.3391     0.0903       -5.0511   14.229      14.238      0.7473
#> 8     -1.5111     0.0654       -6.4275   18.093      18.107      0.7535
#> 9     -1.6788     0.0466       -7.9282   22.302      22.323      0.7595
#> 10    -1.8425     0.0327       -9.5426   26.825      26.856      0.7654
#>    Schoenfeld_Power Event_Prop_Power Z_Power Frontier_Power Estimated_SS
#> 1            0.0551           0.0545  0.0398         0.0396        31896
#> 2            0.1067           0.1048  0.0604         0.0870         8209
#> 3            0.1836           0.1799  0.0877         0.1695         3755
#> 4            0.2847           0.2789  0.1221         0.2724         2172
#> 5            0.4032           0.3954  0.1639         0.3927         1429
#> 6            0.5280           0.5190  0.2126         0.5194         1020
#> 7            0.6469           0.6378  0.2673         0.6404          770
#> 8            0.7505           0.7421  0.3268         0.7458          605
#> 9            0.8333           0.8263  0.3893         0.8302          491
#> 10           0.8946           0.8893  0.4532         0.8926          408

3.5 Planning for Landmark and RMST Analysis

nph_traj can be provided essentially any valid Curve object for each event distribution, and fully supports non-proportional hazards. To take full advantage of this, it can calculate expectations for landmark and RMST quantities, along with the estimated standard errors and power calculations.

To request RMST calculations, include the argument ‘RMST = x’, where x is the restriction time of interest. To request landmark calculations, include the argument ‘landmark = y’, where y is the landmark time of interest. Here is an example of a non-proportional hazards calculation where estimates of power for both RMST and landmark calculations are requested:

# Define exponential distributions for active and control arms, with HR of 0.7
activeCurveNPH <- Weibull(alpha=100,beta=0.8)
controlCurveNPH <-  Weibull(alpha=50,beta=1)

# Run nph_traj with default settings to calculate expected properties up to 30 months
output2 <- nph_traj(active_ecurve=activeCurveNPH,control_ecurve=controlCurveNPH,rcurve=rcurve3,max_assessment=30,RMST=20,landmark=20)
# Display output
output2
#> $active_ecurve
#> Curve Object
#> Type: Weibull 
#> Distribution Parameters:
#>    scale :  100 
#>    shape :  0.8 
#> 
#> $control_ecurve
#> Curve Object
#> Type: Weibull 
#> Distribution Parameters:
#>    scale :  50 
#>    shape :  1 
#> 
#> $active_dcurve
#> Curve Object
#> Type: Blank 
#> Distribution Parameters:
#>    Zero :  0 
#> 
#> $control_dcurve
#> Curve Object
#> Type: Blank 
#> Distribution Parameters:
#>    Zero :  0 
#> 
#> $rcurve
#> RCurve Object
#> Type: LinearR 
#> N Total: 400 
#> N Active: 200 
#> N Control: 200 
#> Recruitment Ratio: 1 
#> Distribution Parameters:
#>    rlength :  12 
#> 
#> $HRbound
#> [1] 1
#> 
#> $Summary
#>    Time Patients Events_Active Events_Control Events_Total     HR   LogHR
#> 1     1    33.33         0.231          0.166        0.396 1.3970  0.3343
#> 2     2    66.67         0.798          0.658        1.456 1.2173  0.1967
#> 3     3   100.00         1.646          1.470        3.116 1.1235  0.1165
#> 4     4   133.33         2.747          2.597        5.344 1.0616  0.0598
#> 5     5   166.67         4.085          4.031        8.116 1.0162  0.0160
#> 6     6   200.00         5.644          5.767       11.411 0.9806 -0.0196
#> 7     7   233.33         7.413          7.799       15.212 0.9516 -0.0496
#> 8     8   266.67         9.385         10.120       19.505 0.9273 -0.0755
#> 9     9   300.00        11.550         12.725       24.275 0.9064 -0.0983
#> 10   10   333.33        13.901         15.609       29.510 0.8882 -0.1185
#> 11   11   366.67        16.433         18.766       35.199 0.8721 -0.1368
#> 12   12   400.00        19.140         22.190       41.330 0.8577 -0.1535
#> 13   13   400.00        21.786         25.711       47.497 0.8412 -0.1729
#> 14   14   400.00        24.260         29.162       53.422 0.8243 -0.1932
#> 15   15   400.00        26.614         32.545       59.159 0.8087 -0.2124
#> 16   16   400.00        28.871         35.861       64.732 0.7943 -0.2303
#> 17   17   400.00        31.045         39.111       70.156 0.7813 -0.2468
#> 18   18   400.00        33.146         42.297       75.443 0.7694 -0.2622
#> 19   19   400.00        35.183         45.419       80.602 0.7585 -0.2764
#> 20   20   400.00        37.160         48.480       85.640 0.7485 -0.2897
#> 21   21   400.00        39.084         51.481       90.564 0.7393 -0.3021
#> 22   22   400.00        40.957         54.421       95.379 0.7308 -0.3137
#> 23   23   400.00        42.784         57.304      100.088 0.7228 -0.3246
#> 24   24   400.00        44.568         60.130      104.698 0.7154 -0.3349
#> 25   25   400.00        46.311         62.899      109.210 0.7085 -0.3446
#> 26   26   400.00        48.016         65.575      113.590 0.7020 -0.3539
#> 27   27   400.00        49.684         68.275      117.959 0.6959 -0.3625
#> 28   28   400.00        51.318         70.883      122.201 0.6902 -0.3708
#> 29   29   400.00        52.919         73.440      126.359 0.6847 -0.3787
#> 30   30   400.00        54.488         75.946      130.434 0.6796 -0.3863
#>    LogHR_SE Schoenfeld_Power Frontier_Power RMST_Restrict RMST_Active
#> 1   3.20639           0.0195         0.0250            20          NA
#> 2   1.66272           0.0188         0.0211            20          NA
#> 3   1.13414           0.0196         0.0216            20          NA
#> 4   0.86535           0.0212         0.0227            20          NA
#> 5   0.70206           0.0237         0.0242            20          NA
#> 6   0.59210           0.0270         0.0262            20          NA
#> 7   0.51290           0.0312         0.0286            20          NA
#> 8   0.45307           0.0365         0.0346            20          NA
#> 9   0.40624           0.0429         0.0407            20          NA
#> 10  0.36856           0.0507         0.0482            20          NA
#> 11  0.33758           0.0601         0.0573            20          NA
#> 12  0.31164           0.0712         0.0680            20          NA
#> 13  0.29083           0.0863         0.0825            20          NA
#> 14  0.27436           0.1049         0.1004            20          NA
#> 15  0.26084           0.1265         0.1212            20          NA
#> 16  0.24948           0.1506         0.1447            20          NA
#> 17  0.23975           0.1771         0.1704            20          NA
#> 18  0.23129           0.2057         0.1983            20          NA
#> 19  0.22386           0.2360         0.2280            20          NA
#> 20  0.21725           0.2677         0.2591            20          NA
#> 21  0.21133           0.3006         0.2915            20     17.2073
#> 22  0.20599           0.3342         0.3248            20     17.2073
#> 23  0.20113           0.3683         0.3586            20     17.2073
#> 24  0.19670           0.4025         0.3926            20     17.2073
#> 25  0.19263           0.4366         0.4266            20     17.2073
#> 26  0.18894           0.4705         0.4604            20     17.2073
#> 27  0.18535           0.5034         0.4935            20     17.2073
#> 28  0.18219           0.5357         0.5259            20     17.2073
#> 29  0.17918           0.5670         0.5573            20     17.2073
#> 30  0.17638           0.5971         0.5877            20     17.2073
#>    RMST_Control RMST_Delta RMST_SE RMST_Z RMST_Power RMST_Failure LM_Time
#> 1            NA         NA      NA     NA         NA            1      20
#> 2            NA         NA      NA     NA         NA            1      20
#> 3            NA         NA      NA     NA         NA            1      20
#> 4            NA         NA      NA     NA         NA            1      20
#> 5            NA         NA      NA     NA         NA            1      20
#> 6            NA         NA      NA     NA         NA            1      20
#> 7            NA         NA      NA     NA         NA            1      20
#> 8            NA         NA      NA     NA         NA            1      20
#> 9            NA         NA      NA     NA         NA            1      20
#> 10           NA         NA      NA     NA         NA            1      20
#> 11           NA         NA      NA     NA         NA            1      20
#> 12           NA         NA      NA     NA         NA            1      20
#> 13           NA         NA      NA     NA         NA            1      20
#> 14           NA         NA      NA     NA         NA            1      20
#> 15           NA         NA      NA     NA         NA            1      20
#> 16           NA         NA      NA     NA         NA            1      20
#> 17           NA         NA      NA     NA         NA            1      20
#> 18           NA         NA      NA     NA         NA            1      20
#> 19           NA         NA      NA     NA         NA            1      20
#> 20           NA         NA      NA     NA         NA            1      20
#> 21       16.484     0.7233  0.6160 1.1742     0.2160            0      20
#> 22       16.484     0.7233  0.6084 1.1888     0.2203            0      20
#> 23       16.484     0.7233  0.6025 1.2006     0.2238            0      20
#> 24       16.484     0.7233  0.5978 1.2099     0.2266            0      20
#> 25       16.484     0.7233  0.5942 1.2172     0.2288            0      20
#> 26       16.484     0.7233  0.5916 1.2226     0.2305            0      20
#> 27       16.484     0.7233  0.5897 1.2265     0.2316            0      20
#> 28       16.484     0.7233  0.5885 1.2291     0.2324            0      20
#> 29       16.484     0.7233  0.5878 1.2306     0.2329            0      20
#> 30       16.484     0.7233  0.5874 1.2314     0.2331            0      20
#>    LM_Active LM_Control LM_Delta LM_A_SE LM_C_SE LM_D_SE   LM_Z LM_Power
#> 1         NA         NA       NA      NA      NA      NA     NA       NA
#> 2         NA         NA       NA      NA      NA      NA     NA       NA
#> 3         NA         NA       NA      NA      NA      NA     NA       NA
#> 4         NA         NA       NA      NA      NA      NA     NA       NA
#> 5         NA         NA       NA      NA      NA      NA     NA       NA
#> 6         NA         NA       NA      NA      NA      NA     NA       NA
#> 7         NA         NA       NA      NA      NA      NA     NA       NA
#> 8         NA         NA       NA      NA      NA      NA     NA       NA
#> 9         NA         NA       NA      NA      NA      NA     NA       NA
#> 10        NA         NA       NA      NA      NA      NA     NA       NA
#> 11        NA         NA       NA      NA      NA      NA     NA       NA
#> 12        NA         NA       NA      NA      NA      NA     NA       NA
#> 13        NA         NA       NA      NA      NA      NA     NA       NA
#> 14        NA         NA       NA      NA      NA      NA     NA       NA
#> 15        NA         NA       NA      NA      NA      NA     NA       NA
#> 16        NA         NA       NA      NA      NA      NA     NA       NA
#> 17        NA         NA       NA      NA      NA      NA     NA       NA
#> 18        NA         NA       NA      NA      NA      NA     NA       NA
#> 19        NA         NA       NA      NA      NA      NA     NA       NA
#> 20        NA         NA       NA      NA      NA      NA     NA       NA
#> 21    0.7589     0.6703   0.0885  0.0413  0.0481  0.0634 1.3971   0.2868
#> 22    0.7589     0.6703   0.0885  0.0374  0.0429  0.0569 1.5562   0.3432
#> 23    0.7589     0.6703   0.0885  0.0351  0.0399  0.0532 1.6647   0.3839
#> 24    0.7589     0.6703   0.0885  0.0336  0.0379  0.0507 1.7465   0.4155
#> 25    0.7589     0.6703   0.0885  0.0326  0.0365  0.0489 1.8104   0.4405
#> 26    0.7589     0.6703   0.0885  0.0318  0.0354  0.0476 1.8595   0.4600
#> 27    0.7589     0.6703   0.0885  0.0312  0.0347  0.0467 1.8976   0.4751
#> 28    0.7589     0.6703   0.0885  0.0308  0.0341  0.0460 1.9262   0.4865
#> 29    0.7589     0.6703   0.0885  0.0306  0.0337  0.0455 1.9467   0.4947
#> 30    0.7589     0.6703   0.0885  0.0304  0.0334  0.0452 1.9601   0.5001

All of the RMST output columns are prefixed RMST_ . The first column, RMST_Restrict reports the restriction time, then RMST_Active, RMST_Control and RMST_Delta provide the RMST estimates for each arm and the difference between them. Following them, RMST_SE provides the SE for the difference, RMST_Z provides the expected Z score, and RMST_Power the power.Finally, RMST_Failure provides the probability that the RMST calculation will fail due to one or both of the arms having an incalculable RMST (caused by the Survival curve being undefined at the time of restriction).

All of the landmark output columns are prefixed LM_. The first column, LM_Time reports the time of the landmark analysis. LM_Active and LM_Control then give the estimates of S(t) for each arm, and LM_Delta that for the difference. LM_A_SE, LM_C_SE and LM_D_SE then give the corresponding standard errors. Finally LM_Z and LM_Power provide the expected Z score and power.

3.6 Plotting nph_traj Output

A plotting function, plot_npht is available to provide some commonly-useful trajectory plots for nph_traj output.

By default, this produces plots of the Survival functions for the event distributions (a KM plot), the CDFs of the censoring distributions, recruited patients over time, expected total events over time, expected HR over time (with predicted CI), and a plot of power over time for all available methods in the data.

The first three of these are useful for checking assumptions, to e.g. check that the distributions are reasonable and what was wanted. The expected event plot is helpful for setting expectations for trial conduct of event numbers. The logHR plot is useful in non-proportional hazards trials for looking at how the expected HR will change over time; the predicted CI for it also allows a straightforward observation of the time at which the HR will become significant. Finally, the power plot allows the evolution of the power over time to be observed, and comparisons to be made between the different analysis methods.

All plots are produced by default, but each individual plot may be disabled via arguments, and likewise arguments may be used to prevent the plotting of individual trajectories from the power plot. Standard plotting arguments may also be passed to it, e.g. regarding text size.

4 Simulating Trial Properties

4.1 Simulating Data: simulate_trials

Simulating trials is an alternative to direct property calculation. As it is much slower, introduces Monte Carlo error and can only simulate one assessment time per run, it is recommended to use it primarily to check analytic results from nph_traj or as a starting point for simulating more complex cases that are beyond the scope of nph_traj.

Syntax for simulate_trials is very similar to that of nph_traj, and the same Curves and RCurve can be passed to it via the same argument names (e.g. ‘active_ecurve’; see section 3.1 for more details). As with nph_traj, it is minimally required to specify distribution Curve objects for the active and control arms, as well as an RCurve object describing the recruitment. For further details on Curve/RCurve objects, please see section 2.

In addition to these similar inputs, two simulation-specific arguments are also required; the number of simulation iterations to run (‘iterations’) and the random seed to use (‘seed’). Finally, it is also necessary to specify either a time (‘assess’), or a fixed number of events (‘fix_events’) to analyse at. If both are specified, the fixed event number will be used.

Four columns are produced in the output matrix; ‘Time’, which gives the time, ‘Censored’, which gives the censoring indicator (where 1 is patient was censored), ‘Trt’, which gives the treatment number, and ‘Iter’, which is the iteration number.

A simple example corresponding to the first example from the nph_traj section is given here:

Using the same syntax as nph_traj, you can also specify censoring distributions that control dropout by providing Curve objects to the two dcurve arguments.

Also similar to nph_traj, more detailed output can be requested (‘detailed_output=TRUE’), which includes the times for events and censorings that were not observed because something else happened first. For most purposes this extra detail is not required, but it can be useful for checking distributions and it is required if you later want to change the time of assessment (or fixed event number). Functions to change assessment times will be covered in section 4.2.

Two formats of simulated data are supported by gestate; matrix and list format. In the default matrix format, all simulated data is present in a single table. In list format, each iteration is separated into its own table within a list and the Iter column is omitted. The ‘output_type’ argument controls which is produced. Both formats are automatically supported by analyse_sim.

When performing large simulations in R, all data is simultaneously held within RAM. Efforts has been taken to reduce the RAM usage of gestate’s simulations, but large simulations (>10,000) will still require a large quantity of available system memory; exceeding this leads to error messages about the inability to allocate sufficient RAM, excessive slowdown and/or system crashes. Considerable RAM savings can be made by shrinking the width of the simulation data sets by using the list format and not using detailed output, so these settings are recommended for large ‘production’ simulations; together these options save about 2/3rds of the RAM footprint compared to detailed output in matrix format.

A more complicated example showing the additional options and based on the second and third examples from the nph_traj section is presented here:

4.2 Simulating Stratified Data: simulate_trials_strata

Sometimes it is believed that there are important covariates in a trial or that there is heterogeneity of treatment effect between different patient strata. simulate_trials_strata is available to simulate this.

simulate_trials_strata acts as a wrapper for simulate_trials; consequently syntax is almost identical, with two main exceptions:

Firstly there are two additional arguments; ‘stratum_probs’ and ‘stratum_name’. The first is a required vector summing to 1 that contains the probability of a patient belonging to each stratum, with entries corresponding to each stratum required. If ‘stratum_probs’ does not sum to 1, an error will be produced. The second is optional and gives the name of the column with the stratum variable in.

The other main syntax difference is that each of the four arguments requiring Curve objects instead now instead takes a list of Curve objects. If a single Curve is supplied to a given argument, it will be shared across all strata (this is commonly done for the censoring arguments). Otherwise, the list should be of the same length as the ‘stratum_probs’. In this case, the Curve objects provide the distributions for events or censorings within their arm for their respective strata only.

A simple example is as follows:

4.3 Modifying Simulated Data: set_event_number and set_assess_time

Once simulations are complete, they may be analysed straight away. However sometimes it is desirable to first modify the data. gestate has two functions available to do this; ‘set_event_number’ and ‘set_assess_time’. These can be used to modify the assessment times to respectively either a fixed event number or a given assessment time. Both are usable on simulations previously using either approach, so long as detailed output is available. By keeping the data’s existing assessment criteria, it is also possible to just use these functions to change the format of the data from matrix to list, or vice versa.

There is one caveat; Patients that have not yet entered the trial at the time of assessment are always excluded from the data set. If the current assessment time for an iteration is before the maximum recruitment time, then any increases in assessment time will lead to errors caused by these missing patients. This is straightforward to check if the original data has a fixed assessment time, but can be awkward if a fixed event number was specified as the assessment time will vary between iterations. It is recommended therefore to only apply it to fixed event simulations if either the pre-existing number of events is sufficiently large that no iteration is still in the recruitment period, or if the number of events is being reduced (rather than a fixed time being set).

Using either function is straightforward; provide the name of the simulated data set as an argument to the relevant function, then the new time / event number. By default the output format will be the same as that entered, and detailed output will remain. Both of these may be changed by the same arguments discussed in the previous section.

Two examples:

4.4 Analysing Simulated Data: analyse_sim

analyse_sim can be used to analyse data simulated by gestate. It is designed to automatically detect the input formats and provide minimal-fuss analysis. To perform a log-rank test and Cox regression is thus simply:

analyse_sim produces a matrix with each row containing the results of one iteration. For Cox regression, output is produced for the HR and log-HR as well as its standard error, Z score and p-value. For the log-rank test the Z score and p-value are returned. The number of events in each arm is also produced,

Stratified analysis for a single variable (covariate adjustment for Cox) is also supported by using the ‘stratum’ argument to provide the column name of the stratification variable:

Landmark and RMST analysis may also be requested by specifying the landmark and restriction times respectively to the ‘landmark’ and ‘RMST’ arguments, (the same syntax as nph_traj). RMST analysis follows the methods from Uno et al and Tian et al5,6,7, while landmark analysis uses the Greenwood standard error and normal approximation approach.

For both landmark and RMST analyses, estimates are returned for each arm and the difference between them. Standard errors for all three quantities are produced, along with the Z score and p-value.

To speed up analysis, gestate supports parallel processing through the doParallel package. If this is installed then the ‘parallel_cores’ argument may be set to the required number of cores. Analysis code has also been optimised in V1.4.x to increase speed, with 4-fold improvement overall and at least 50-fold increase for covariate-adjusted RMST in comparison to v1.3.2.

A final analysis example is:

# Perform log-rank test, Cox regression, landmark analysis and RMST analysis on simulated data using parallel processing.
analysis2 <- analyse_sim(data=simulation2, RMST=10, landmark=10, parallel_cores=1)
head(analysis2)
#>                 HR       LogHR  LogHR_SE       HR_Z         HR_P       LR_Z
#> result.1 0.9741865 -0.02615254 0.2000819 -0.1307092 0.4480026661 -0.1307129
#> result.2 0.8666650 -0.14310281 0.2003264 -0.7143482 0.2375059393 -0.7149549
#> result.3 0.6853549 -0.37781849 0.2024118 -1.8665836 0.0309798814 -1.8775705
#> result.4 0.7639749 -0.26922032 0.2015036 -1.3360573 0.0907652693 -1.3400901
#> result.5 0.5216262 -0.65080409 0.2072831 -3.1396877 0.0008456401 -3.1951239
#> result.6 0.7821452 -0.24571494 0.2010825 -1.2219610 0.1108611870 -1.2250305
#>                  LR_P Events_Active Events_Control RMST_Restrict RMST_Active
#> result.1 0.4480011946            49             51            10    8.894469
#> result.2 0.2373184580            48             52            10    9.115089
#> result.3 0.0302199738            43             57            10    9.467461
#> result.4 0.0901080329            44             56            10    9.024831
#> result.5 0.0006988541            37             63            10    9.478867
#> result.6 0.1102818733            45             55            10    9.157045
#>          RMST_A_SE RMST_Control RMST_C_SE  RMST_Delta RMST_D_SE     RMST_Z
#> result.1 0.1908676     9.200848 0.1544769 -0.30637988 0.2455475 -1.2477419
#> result.2 0.1659930     9.048694 0.1737710  0.06639462 0.2403124  0.2762846
#> result.3 0.1291302     9.223193 0.1568467  0.24426754 0.2031637  1.2023187
#> result.4 0.1765084     8.969905 0.1658831  0.05492567 0.2422239  0.2267558
#> result.5 0.1213474     8.969737 0.1815976  0.50912949 0.2184099  2.3310738
#> result.6 0.1733281     8.924620 0.1764223  0.23242511 0.2473205  0.9397728
#>               RMST_P LM_Time LM_Active    LM_A_SE LM_Control    LM_C_SE
#> result.1 0.893937212      10 0.8288031 0.02673240  0.8385181 0.02615064
#> result.2 0.391164727      10 0.8386908 0.02612384  0.8331164 0.02652844
#> result.3 0.114620035      10 0.8895153 0.02221686  0.8593153 0.02465128
#> result.4 0.410306813      10 0.8196045 0.02722308  0.7917703 0.02896078
#> result.5 0.009874735      10 0.8945876 0.02175786  0.8124251 0.02780837
#> result.6 0.173667053      10 0.8650000 0.02416351  0.8098204 0.02776559
#>              LM_Delta    LM_D_SE       LM_Z        LM_P
#> result.1 -0.009714975 0.03739622 -0.2597850 0.602485182
#> result.2  0.005574430 0.03723188  0.1497220 0.440491988
#> result.3  0.030200029 0.03318546  0.9100380 0.181401230
#> result.4  0.027834189 0.03974698  0.7002843 0.241874875
#> result.5  0.082162579 0.03530878  2.3269728 0.009983358
#> result.6  0.055179641 0.03680765  1.4991353 0.066919273

4.5 Summarising Analysed Data: summarise_analysis

summarise_analysis is a function to summarise the results from multiple simulation iterations. It reads in the output from analyse_sim and automatically detects which analyses have been performed, adjusting its output as necessary. Apart from the name of the input analysis data, the only parameter is ‘alpha1’, which corresponds to the one-sided alpha level for power calculations. By default this is 0.025 (i.e. 2.5% 1-sided alpha)

An example call is:

The output table contains a summary across all iterations of the simulation, with overall values and power. For Cox regression, the HR and logHR and SE of the logHR are produced, along with the average Z score, p-value and power. For the log-rank test, the average Z score, p value and power are returned. Failure rates (probability of analysis failure) for each are also given. If RMST and/or landmark analysis were previously requested, average values and SEs are given for each arm and the difference between arms. The powers and probabilities of analysis failure are also given.

4.6 Comparisons of Simulated Data to Analytic Outputs

Simulation may be used to validate analytic results; A simple workflow to compare outputs from nph_traj and summarise_analysis might look like the following:

# Run analytic planning and select a suitable time
outputX <- nph_traj(active_ecurve=activeCurveNPH, control_ecurve=controlCurveNPH, rcurve=rcurve3, max_assessment=50, RMST=40, landmark=40)
outputX$Summary[41:50,]
#>    Time Patients Events_Active Events_Control Events_Total     HR   LogHR
#> 41   41      400        70.009        100.444      170.453 0.6366 -0.4517
#> 42   42      400        71.284        102.416      173.700 0.6336 -0.4564
#> 43   43      400        72.539        104.348      176.887 0.6307 -0.4610
#> 44   44      400        73.776        106.242      180.018 0.6279 -0.4654
#> 45   45      400        74.994        108.099      183.093 0.6252 -0.4697
#> 46   46      400        76.194        109.919      186.113 0.6226 -0.4739
#> 47   47      400        77.377        111.702      189.079 0.6201 -0.4779
#> 48   48      400        78.543        113.451      191.994 0.6176 -0.4818
#> 49   49      400        79.692        115.164      194.856 0.6153 -0.4856
#> 50   50      400        80.825        116.844      197.669 0.6130 -0.4893
#>    LogHR_SE Schoenfeld_Power Frontier_Power RMST_Restrict RMST_Active
#> 41  0.15424           0.8386         0.8333            40     30.9011
#> 42  0.15277           0.8526         0.8477            40     30.9011
#> 43  0.15137           0.8655         0.8610            40     30.9011
#> 44  0.15003           0.8774         0.8733            40     30.9011
#> 45  0.14874           0.8884         0.8845            40     30.9011
#> 46  0.14751           0.8984         0.8948            40     30.9011
#> 47  0.14632           0.9076         0.9043            40     30.9011
#> 48  0.14518           0.9159         0.9129            40     30.9011
#> 49  0.14408           0.9236         0.9208            40     30.9011
#> 50  0.14303           0.9306         0.9280            40     30.9011
#>    RMST_Control RMST_Delta RMST_SE RMST_Z RMST_Power RMST_Failure LM_Time
#> 41      27.5336     3.3675  1.3992 2.4067     0.6725        5e-04      40
#> 42      27.5336     3.3675  1.3959 2.4124     0.6745        0e+00      40
#> 43      27.5336     3.3675  1.3933 2.4170     0.6762        0e+00      40
#> 44      27.5336     3.3675  1.3912 2.4206     0.6775        0e+00      40
#> 45      27.5336     3.3675  1.3896 2.4234     0.6785        0e+00      40
#> 46      27.5336     3.3675  1.3884 2.4255     0.6792        0e+00      40
#> 47      27.5336     3.3675  1.3875 2.4270     0.6798        0e+00      40
#> 48      27.5336     3.3675  1.3869 2.4280     0.6801        0e+00      40
#> 49      27.5336     3.3675  1.3866 2.4286     0.6803        0e+00      40
#> 50      27.5336     3.3675  1.3864 2.4289     0.6805        0e+00      40
#>    LM_Active LM_Control LM_Delta LM_A_SE LM_C_SE LM_D_SE   LM_Z LM_Power
#> 41    0.6185     0.4493   0.1692  0.0416  0.0452  0.0615 2.7516   0.7857
#> 42    0.6185     0.4493   0.1692  0.0390  0.0416  0.0570 2.9679   0.8432
#> 43    0.6185     0.4493   0.1692  0.0375  0.0396  0.0545 3.1047   0.8738
#> 44    0.6185     0.4493   0.1692  0.0365  0.0382  0.0528 3.2014   0.8928
#> 45    0.6185     0.4493   0.1692  0.0358  0.0373  0.0517 3.2731   0.9054
#> 46    0.6185     0.4493   0.1692  0.0353  0.0366  0.0509 3.3269   0.9142
#> 47    0.6185     0.4493   0.1692  0.0350  0.0361  0.0502 3.3671   0.9203
#> 48    0.6185     0.4493   0.1692  0.0347  0.0357  0.0498 3.3968   0.9246
#> 49    0.6185     0.4493   0.1692  0.0345  0.0355  0.0495 3.4177   0.9275
#> 50    0.6185     0.4493   0.1692  0.0344  0.0353  0.0493 3.4312   0.9294

We select 47 months as it gives 90% power for LR/Cox and specify 2000 simulations (this is due to technical limitations in vignette creation; typically 10,000+ are recommended):

We can now pair results up and directly compare them between methods:

As expected, all properties align closely. For testing whether the two values differ by more than would be expected by random, Monte Carlo errors should be calculated for each simulated quantity; in general this is also good practice for analysis and reporting of simulation results.

5 References

1 Bell J (2019) Power Calculations for Time-to-Event Trials Using Predicted Event Proportions. Manuscript submitted for publication.

2 Pike MC (1972). Contribution to the Discussion on the Paper “Asymptotically efficient rank invariant test procedures” by R. Peto and J. Peto. Journal of the Royal Statistical Society Series A; 135(2): 201-203.

3 Schoenfeld D (1981) The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika; 68: 316-319.

4 Schoenfeld DA (1983) Sample-size formula for the proportional-hazards regression model. Biometrics; 39(2): 499-503.

5 Uno H, Tian L, Cronin A, Battioui C, Horiguchi M. (2017) survRM2: Comparing Restricted Mean Survival Time. R package version 1.0-2. https://CRAN.R-project.org/package=survRM2

6 Uno H, Claggett B, Tian L, Inoue E, Gallo P, Miyata T, Schrag D, Takeuchi M, Uyama Y, Zhao L, Skali H, Solomon S, Jacobus S, Hughes M, Packer M, Wei LJ. (2014) Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. Journal of clinical Oncology; 32: 2380-2385.

7 Tian L, Zhao L, Wei LJ. (2014) Predicting the restricted mean event time with the subject’s baseline covariates in survival analysis. Biostatistics; 15: 222-233.