Fitting the Highly Adaptive Lasso with hal9001

Nima Hejazi, Jeremy Coyle, Rachael Phillips, Lars van der Laan

2022-02-09

Introduction

The highly adaptive Lasso (HAL) is a flexible machine learning algorithm that nonparametrically estimates a function based on available data by embedding a set of input observations and covariates in an extremely high-dimensional space (i.e., generating basis functions from the available data). For an input data matrix of \(n\) observations and \(d\) covariates, the maximum number of zero-order basis functions generated is approximately \(n \cdot 2^{d - 1}\). To select a sparse set of basis functions from among the (possibly reduced/screened) set that is generated, the lasso is employed. The hal9001 R package (Hejazi, Coyle, and van der Laan 2020; Coyle, Hejazi, and van der Laan, n.d.) provides an efficient implementation of this routine, relying on the glmnet R package (Friedman, Hastie, and Tibshirani 2010) for the canonical lasso implementation, applied to a design matrix composed of the generated basis functions. Consult Benkeser and van der Laan (2016) and van der Laan (2017) for detailed theoretical descriptions of HAL and its various optimality properties.


Preliminaries

library(data.table)
library(ggplot2)
# simulation constants
set.seed(467392)
n_obs <- 500
n_covars <- 3

# make some training data
x <- replicate(n_covars, rnorm(n_obs))
y <- sin(x[, 1]) + sin(x[, 2]) + rnorm(n_obs, mean = 0, sd = 0.2)

# make some testing data
test_x <- replicate(n_covars, rnorm(n_obs))
test_y <- sin(test_x[, 1]) + sin(test_x[, 2]) + rnorm(n_obs, mean = 0, sd = 0.2)

Let’s take a look at the simulated data:

head(x)
##             [,1]       [,2]       [,3]
## [1,]  2.44102981 -0.4337909  0.4670282
## [2,] -1.21932335  0.3336395  0.8894277
## [3,] -0.40613567 -0.3869374  0.3474353
## [4,] -1.09760477 -1.4663219 -0.1173214
## [5,]  0.23710498  1.2565812  1.8049389
## [6,]  0.06810091 -0.7020905  0.9301941
head(y)
## [1]  0.2372289 -0.6023415 -0.7569124 -1.8021339  1.0589707 -0.3373555

Using the Highly Adaptive Lasso

library(hal9001)
## Loading required package: Rcpp
## hal9001 v0.4.3: The Scalable Highly Adaptive Lasso
## note: fit_hal defaults have changed. See ?fit_hal for details

Fitting the model

HAL uses the popular glmnet R package for the lasso step.
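
The timing breakdown shown below comes from a call along the following lines (a minimal sketch, since the code chunk itself is not echoed; the object name hal_fit is a placeholder, and the per-step timings are stored in the fit’s times element):

# fit HAL with default settings and inspect the timing of each step
hal_fit <- fit_hal(X = x, Y = y)
hal_fit$times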

##                   user.self sys.self elapsed user.child sys.child
## enumerate_basis       0.156    0.000   0.156          0         0
## design_matrix         0.037    0.000   0.037          0         0
## reduce_basis          0.000    0.000   0.000          0         0
## remove_duplicates     0.000    0.000   0.000          0         0
## lasso                 3.265    0.007   3.272          0         0
## total                 3.458    0.007   3.465          0         0

Summarizing the model

While the raw fit object can be examined directly, its (usually large) slots make quick inspection challenging. The summary method provides an interpretable table of the basis functions with non-zero lasso coefficients. All terms (i.e., including those with zero coefficients) can be included by setting only_nonzero_coefs to FALSE when calling summary on a hal9001 model object.
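
For example, assuming the fitted object from above is named hal_fit, the table below is produced by a call like the following (a sketch showing only the defaults):

summary(hal_fit)
# to also display terms with zero coefficients:
# summary(hal_fit, only_nonzero_coefs = FALSE)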

## 
## 
## Summary of non-zero coefficients is based on lambda of 0.00487 
## 
##           coef
##  -1.424003e+00
##   3.654157e-01
##  -2.358064e-01
##   2.182822e-01
##  -1.713943e-01
##   1.648782e-01
##   1.622979e-01
##  -1.205285e-01
##  -9.526696e-02
##  -9.382067e-02
##   5.468026e-02
##   5.273315e-02
##  -5.056465e-02
##   4.527346e-02
##  -3.735277e-02
##  -3.543529e-02
##   2.380514e-02
##  -2.255508e-02
##  -2.176114e-02
##  -1.612916e-02
##   1.531317e-02
##  -1.233111e-02
##   1.172920e-02
##   8.162029e-03
##  -7.160000e-03
##  -5.500155e-03
##   5.083293e-03
##  -3.625005e-03
##   1.488567e-03
##   1.350348e-03
##   3.224244e-04
##  -3.021403e-04
##  -2.126168e-04
##  -6.625025e-05
##   1.927162e-06
##           coef
##                                                                                                             term
##                                                                                                      (Intercept)
##                                                                              [ I(x2 >= -1.583)*(x2 - -1.583)^1 ]
##                                          [ I(x2 >= 1.595)*(x2 - 1.595)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                        [ I(x1 >= -0.962)*(x1 - -0.962)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                          [ I(x1 >= 1.606)*(x1 - 1.606)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ]
##                                          [ I(x2 >= -1.11)*(x2 - -1.11)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                        [ I(x1 >= -1.403)*(x1 - -1.403)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ]
##                                          [ I(x1 >= 0.941)*(x1 - 0.941)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                          [ I(x2 >= 1.017)*(x2 - 1.017)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                          [ I(x1 >= 1.368)*(x1 - 1.368)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                        [ I(x2 >= -1.565)*(x2 - -1.565)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                                                              [ I(x1 >= -1.844)*(x1 - -1.844)^1 ]
##                                          [ I(x2 >= 0.696)*(x2 - 0.696)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##    [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= 1.156)*(x2 - 1.156)^1 ] * [ I(x3 >= -0.497)*(x3 - -0.497)^1 ]
##  [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                        [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                        [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= -0.699)*(x2 - -0.699)^1 ]
##                                          [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= 1.595)*(x2 - 1.595)^1 ]
##                                          [ I(x1 >= 1.135)*(x1 - 1.135)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ]
##    [ I(x1 >= 0.307)*(x1 - 0.307)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                        [ I(x1 >= -0.422)*(x1 - -0.422)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##  [ I(x1 >= -0.901)*(x1 - -0.901)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= -0.047)*(x3 - -0.047)^1 ]
##    [ I(x1 >= -0.313)*(x1 - -0.313)^1 ] * [ I(x2 >= 0.118)*(x2 - 0.118)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##    [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= 0.489)*(x2 - 0.489)^1 ] * [ I(x3 >= -0.257)*(x3 - -0.257)^1 ]
##    [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= 0.489)*(x2 - 0.489)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                                                              [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##    [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= 0.595)*(x3 - 0.595)^1 ]
##  [ I(x1 >= -0.901)*(x1 - -0.901)^1 ] * [ I(x2 >= -0.375)*(x2 - -0.375)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##    [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= 0.374)*(x3 - 0.374)^1 ]
##  [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= -0.375)*(x2 - -0.375)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                        [ I(x2 >= -0.699)*(x2 - -0.699)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##  [ I(x1 >= -0.901)*(x1 - -0.901)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                          [ I(x1 >= 0.739)*(x1 - 0.739)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                          [ I(x1 >= 0.594)*(x1 - 0.594)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                        [ I(x1 >= -0.685)*(x1 - -0.685)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##                                                                                                             term

Note the length and width of these tables! The R environment might not be the optimal location to view the summary. Tip: tables can be exported from R to LaTeX with the xtable R package. Here’s an example: print(xtable(summary(fit)$table), type = "latex", file = "haltbl_meow.tex").

Obtaining model predictions
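
Predictions from a fitted HAL model are obtained with the predict method, with new data supplied via the new_data argument. The values below come from an unechoed chunk that evaluates the predictions; a minimal sketch of how such quantities can be computed (the mse helper and the object names are illustrative):

# hypothetical helper computing the mean squared error
mse <- function(preds, y) mean((preds - y)^2)

# in-sample error
preds <- predict(hal_fit, new_data = x)
mse(preds, y)

# error on the held-out test set
oob_preds <- predict(hal_fit, new_data = test_x)
mse(oob_preds, test_y)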

## [1] 0.04967736
## [1] 1.778584

Reducing basis functions

As described in Benkeser and van der Laan (2016), the HAL algorithm operates by first constructing a set of basis functions and subsequently fitting a lasso model with this set of basis functions as the design matrix. Several approaches are considered for reducing this set of basis functions:

1. Removing duplicated basis functions (done by default in the fit_hal function).
2. Removing basis functions that correspond to only a small set of observations; a good rule of thumb is to scale this cutoff with \(\frac{1}{\sqrt{n}}\), which is the default.

The second of these two options may be modified by specifying the reduce_basis argument to the fit_hal function.
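
For instance, to remove all basis functions that are non-zero for fewer than 10% of observations before the lasso step (a sketch; hal_fit_reduced is a placeholder name):

hal_fit_reduced <- fit_hal(X = x, Y = y, reduce_basis = 0.1)
hal_fit_reduced$times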

##                   user.self sys.self elapsed user.child sys.child
## enumerate_basis       0.008    0.000   0.008          0         0
## design_matrix         0.038    0.000   0.039          0         0
## reduce_basis          0.000    0.000   0.000          0         0
## remove_duplicates     0.001    0.000   0.000          0         0
## lasso                 3.072    0.016   3.088          0         0
## total                 3.119    0.016   3.135          0         0

In the above, all basis functions that are non-zero for fewer than 10% of observations are automatically removed prior to the lasso step of fitting the HAL regression. The results appear below.

##              coef
##  1: -1.4371363869
##  2:  0.3753894167
##  3: -0.2355902545
##  4:  0.2166287283
##  5: -0.1851691193
##  6:  0.1631844732
##  7:  0.1597858903
##  8: -0.1234431109
##  9: -0.0946988403
## 10: -0.0884063929
## 11:  0.0584079820
## 12:  0.0563691217
## 13: -0.0551713872
## 14:  0.0533243296
## 15: -0.0382683652
## 16: -0.0372833807
## 17: -0.0309200484
## 18:  0.0230878908
## 19:  0.0179840202
## 20: -0.0156428631
## 21:  0.0149240584
## 22: -0.0144903662
## 23: -0.0132097499
## 24: -0.0065506011
## 25: -0.0062280674
## 26:  0.0051538711
## 27:  0.0043861580
## 28: -0.0024858177
## 29:  0.0023100499
## 30: -0.0022703275
## 31:  0.0021266202
## 32:  0.0017535038
## 33:  0.0016041990
## 34: -0.0010032801
## 35: -0.0002829866
##              coef
##                                                                                                                term
##  1:                                                                                                     (Intercept)
##  2:                                                                             [ I(x2 >= -1.583)*(x2 - -1.583)^1 ]
##  3:                                         [ I(x2 >= 1.595)*(x2 - 1.595)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##  4:                                       [ I(x1 >= -0.962)*(x1 - -0.962)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##  5:                                         [ I(x1 >= 1.606)*(x1 - 1.606)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ]
##  6:                                         [ I(x2 >= -1.11)*(x2 - -1.11)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##  7:                                       [ I(x1 >= -1.403)*(x1 - -1.403)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ]
##  8:                                         [ I(x1 >= 0.941)*(x1 - 0.941)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
##  9:                                         [ I(x2 >= 1.017)*(x2 - 1.017)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 10:                                         [ I(x1 >= 1.368)*(x1 - 1.368)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 11:                                                                             [ I(x1 >= -1.844)*(x1 - -1.844)^1 ]
## 12:                                       [ I(x2 >= -1.565)*(x2 - -1.565)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 13:                                         [ I(x2 >= 0.696)*(x2 - 0.696)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 14:   [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= 1.156)*(x2 - 1.156)^1 ] * [ I(x3 >= -0.497)*(x3 - -0.497)^1 ]
## 15:                                       [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 16: [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 17:                                         [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= 1.595)*(x2 - 1.595)^1 ]
## 18:                                       [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= -0.699)*(x2 - -0.699)^1 ]
## 19:                                       [ I(x1 >= -0.422)*(x1 - -0.422)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 20:   [ I(x1 >= 0.307)*(x1 - 0.307)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 21:   [ I(x1 >= -0.313)*(x1 - -0.313)^1 ] * [ I(x2 >= 0.118)*(x2 - 0.118)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 22:                                         [ I(x1 >= 1.135)*(x1 - 1.135)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ]
## 23: [ I(x1 >= -0.901)*(x1 - -0.901)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= -0.047)*(x3 - -0.047)^1 ]
## 24: [ I(x1 >= -0.901)*(x1 - -0.901)^1 ] * [ I(x2 >= -0.375)*(x2 - -0.375)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 25:   [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= 0.489)*(x2 - 0.489)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 26:   [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= 0.595)*(x3 - 0.595)^1 ]
## 27:   [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= 0.489)*(x2 - 0.489)^1 ] * [ I(x3 >= -0.257)*(x3 - -0.257)^1 ]
## 28:                                         [ I(x1 >= 0.739)*(x1 - 0.739)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 29:                                       [ I(x2 >= -0.699)*(x2 - -0.699)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 30:                                         [ I(x1 >= 0.594)*(x1 - 0.594)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 31:   [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= 0.374)*(x3 - 0.374)^1 ]
## 32: [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= -0.375)*(x2 - -0.375)^1 ] * [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 33:                                                                             [ I(x1 >= -1.593)*(x1 - -1.593)^1 ]
## 34:                                                                             [ I(x3 >= -3.289)*(x3 - -3.289)^1 ]
## 35:   [ I(x1 >= -3.224)*(x1 - -3.224)^1 ] * [ I(x2 >= -3.038)*(x2 - -3.038)^1 ] * [ I(x3 >= 1.202)*(x3 - 1.202)^1 ]
##                                                                                                                term

Other approaches exist for reducing the set of basis functions before they are actually created, which is essential for most real-world applications of HAL. Currently, we provide this “pre-screening” via the num_knots argument of fit_hal. The num_knots argument is akin to binning: it increases the coarseness of the approximation by limiting the number of knot points used to generate the basis functions for each interaction degree. This reduces the total number of basis functions generated, and thus the size of the optimization problem, which can dramatically decrease runtime. One can pass a vector of length max_degree to num_knots, specifying the number of knot points to use for each interaction degree; in this way, basis functions for higher-degree interactions (e.g., two- or three-way interactions) can be made coarser. Increasing the coarseness of the more complex basis functions helps prevent the combinatorial explosion of basis functions that can easily occur when basis functions are generated for all possible knot points. We will show a fuller example with num_knots in the section that follows.
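
In the meantime, here is a minimal sketch of the vector form of num_knots (the knot counts are arbitrary and chosen only for illustration):

# fewer knot points for higher-degree interactions: 20 knots per covariate for
# main terms, 10 for two-way interactions, 5 for three-way interactions
hal_fit_knots <- fit_hal(X = x, Y = y, max_degree = 3, num_knots = c(20, 10, 5))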

Specifying smoothness of the HAL model

One might wish to enforce smoothness on the functional form of the HAL fit. This can be done using the smoothness_orders argument. Setting smoothness_orders = 0 gives a piecewise constant fit (via zero-order basis functions), allowing for discontinuous jumps in the function. This is useful if one does not want to assume any smoothness or continuity of the “true” function. Setting smoothness_orders = 1 gives a piecewise linear fit (via first-order basis functions), which is continuous and differentiable everywhere except at the knot points. In general, smoothness_orders = k corresponds to a piecewise polynomial fit of degree \(k\). Mathematically, smoothness_orders = k corresponds to finding the best fit under the constraint that the total variation of the function’s \(k^{\text{th}}\) derivative is bounded by some constant, which is selected by cross-validation.

Let’s see this in action.
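
Below is a sketch of the comparison described next (the exact chunk is not echoed; the object names are placeholders, a deliberately small number of knot points is used, and the MSE is taken against the denoised outcome on the test covariates):

# zero-, first-, and second-order smoothness with a small number of knot points
hal_fit_0 <- fit_hal(X = x, Y = y, smoothness_orders = 0, num_knots = 5)
hal_fit_1 <- fit_hal(X = x, Y = y, smoothness_orders = 1, num_knots = 5)
hal_fit_2 <- fit_hal(X = x, Y = y, smoothness_orders = 2, num_knots = 5)

# MSE of each fit against the true (denoised) outcome
y_true <- sin(test_x[, 1]) + sin(test_x[, 2])
mean((predict(hal_fit_0, new_data = test_x) - y_true)^2)
mean((predict(hal_fit_1, new_data = test_x) - y_true)^2)
mean((predict(hal_fit_2, new_data = test_x) - y_true)^2)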

Comparing the mean squared error (MSE) between the predictions and the true (denoised) outcome, the first- and second-order smoothed HAL fits are able to recover from the coarseness of the basis functions caused by the small num_knots argument. Moreover, the HAL fit with second-order smoothness recovers the true function very well (as expected, since sin(x) is a very smooth function). The main benefit of imposing higher-order smoothness is that fewer knot points are required for a near-optimal fit, so one can safely pass a smaller value to num_knots for a large decrease in runtime without sacrificing performance.

## [1] 0.00732315
## [1] 0.002432486
## [1] 0.001848927

In general, if the basis functions are not coarse, then performance is similar across smoothness orders. Notice that the runtime is a fair bit slower when more knot points are considered. In general, we recommend either zero- or first-order smoothness. Second-order smoothness tends to be less robust and is prone to extrapolation issues on new data. One can also use cross-validation to data-adaptively choose the optimal smoothness (invoked in fit_hal by setting adaptive_smoothing = TRUE). Comparing the following simulation with the previous one, the HAL fit with second-order smoothness performed better when there were fewer knot points.
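
A sketch of this follow-up comparison (the unechoed chunk presumably refits the three models with a larger number of knot points; the value 100 below is purely illustrative, and y_true is reused from the previous sketch):

# the same three smoothness orders, now with many more knot points
hal_fit_0_dense <- fit_hal(X = x, Y = y, smoothness_orders = 0, num_knots = 100)
hal_fit_1_dense <- fit_hal(X = x, Y = y, smoothness_orders = 1, num_knots = 100)
hal_fit_2_dense <- fit_hal(X = x, Y = y, smoothness_orders = 2, num_knots = 100)

mean((predict(hal_fit_0_dense, new_data = test_x) - y_true)^2)
mean((predict(hal_fit_1_dense, new_data = test_x) - y_true)^2)
mean((predict(hal_fit_2_dense, new_data = test_x) - y_true)^2)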

## [1] 0.00732315
## [1] 0.002432486
## [1] 0.001834611

Formula interface

One might wish to further restrict the functional form of the HAL fit. This can be done using the formula interface. Specifically, the formula interface allows one to specify monotonicity constraints on components of the HAL fit, as well as exactly which basis functions (e.g., interactions) one wishes to model. The formula_hal function generates a formula object from a user-supplied character string, and this formula object contains the specification information needed by fit_hal and glmnet. The formula_hal function is intended for use within fit_hal, where the user-supplied character string is passed to fit_hal directly. Here, we call formula_hal on its own for illustrative purposes.

We can specify an additive model in a number of ways.

The formula below includes the outcome, but formula_hal doesn’t fit a HAL model and doesn’t need the outcome (in fact, everything before the “~” is ignored by formula_hal). This is why formula_hal takes the input X matrix of covariates, and not X and Y. In what follows, we include formulas with and without “y” in the character string.
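
A sketch of such a call (the covariate matrix used in this part of the vignette, containing columns x1, x2, and A, is simulated in an unechoed chunk and is referred to as x_form here; the smoothness_orders and num_knots settings are illustrative, so consult ?formula_hal for the exact interface):

# an additive HAL model, specified as a character string; the "y ~" is optional
form_add <- formula_hal(
  "y ~ h(x1) + h(x2) + h(A)",
  X = x_form, smoothness_orders = 0, num_knots = 1
)

The output below displays one of the basis functions encoded by such a specification.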

## $cols
## [1] 1
## 
## $cutoffs
## [1] -3.971502
## 
## $orders
## [1] 0
## [1] TRUE

The . argument. We can generate an additive model for all or a subset of variables using the . variable and . argument of h. By default, . in h(.) is treated as a wildcard and basis functions are generated by replacing the . with all variables in X.

## [1] "x1" "x2" "A"
## [1] TRUE
## [1] TRUE

We can specify interactions as follows.

## [1] TRUE
## [1] TRUE
## [1] TRUE

Sometimes, one might want to build an additive model, but include all two-way interactions with one variable (e.g., treatment “A”). This can be done in a variety of ways. The . argument allows you to specify a subset of variables.

## [1] FALSE

A key feature of the HAL formula is monotonicity constraints. These are specified through the monotone argument of h. Note that if smoothness_orders = 0, this is a monotonicity constraint on the function itself, whereas if smoothness_orders = 1, it is a monotonicity constraint on the function’s derivative (e.g., a convexity constraint). We can also specify that certain terms are not penalized in the lasso/glmnet fit using the pf argument of h (which stands for “penalty factor”).

The penalization feature can be used to reproduce a glm fit.
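
A sketch of how this might look (assuming, per the description above, that pf = 0 inside h removes the penalty on that term; with first-order basis functions, a single knot point per covariate, and no penalization, the fit reduces to an ordinary glm-style linear fit):

# unpenalized, linear (first-order, single-knot) terms for each covariate
form_glm <- formula_hal(
  "y ~ h(x1, pf = 0) + h(x2, pf = 0) + h(A, pf = 0)",
  X = x_form, smoothness_orders = 1, num_knots = 1
)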

Now that we’ve illustrated the options available with formula_hal, let’s show how to fit a HAL model with the specified formula.
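
A sketch of such a fit (assuming the covariates for this section are stored in x_form and the outcome in y_form; the additive specification below is one plausible choice behind the summary that follows, with the formula string passed directly to fit_hal):

# fit HAL using a formula; h(.) expands to an additive model in all covariates
fit <- fit_hal(
  X = x_form, Y = y_form,
  formula = "~ h(.)", smoothness_orders = 1
)
summary(fit)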

## 
## Summary of top 10 non-zero coefficients is based on lambda of 0.0005900481 
## 
##        coef                                term
##   0.7395438                         (Intercept)
##  -0.3026891   [ I(A >= -3.978)*(A - -3.978)^1 ]
##  -0.2948108 [ I(x2 >= -3.992)*(x2 - -3.992)^1 ]
##  -0.2763138   [ I(x1 >= 1.172)*(x1 - 1.172)^1 ]
##  -0.2493828   [ I(x2 >= 1.638)*(x2 - 1.638)^1 ]
##  -0.2475319 [ I(x1 >= -3.972)*(x1 - -3.972)^1 ]
##   0.2098793   [ I(x1 >= -1.45)*(x1 - -1.45)^1 ]
##   0.2079505   [ I(A >= -1.356)*(A - -1.356)^1 ]
##  -0.2075157     [ I(A >= 1.384)*(A - 1.384)^1 ]
##   0.2035278 [ I(x2 >= -1.293)*(x2 - -1.293)^1 ]

References

Benkeser, David, and Mark J van der Laan. 2016. “The Highly Adaptive Lasso Estimator.” In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE. https://doi.org/10.1109/dsaa.2016.93.

Coyle, Jeremy R, Nima S Hejazi, and Mark J van der Laan. n.d. hal9001: The Scalable Highly Adaptive Lasso (version 0.2.7). https://doi.org/10.5281/zenodo.3558313.

Friedman, Jerome, Trevor Hastie, and Rob Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33 (1): 1.

Hejazi, Nima S, Jeremy R Coyle, and Mark J van der Laan. 2020. “hal9001: Scalable Highly Adaptive Lasso Regression in R.” Journal of Open Source Software 5 (53): 2526. https://doi.org/10.21105/joss.02526.

van der Laan, Mark J. 2017. “Finite Sample Inference for Targeted Learning.” https://arxiv.org/abs/1708.09502.