Average Treatment Effects

Klaus Kähler Holst

2021-10-25

Introduction

Let \(Y\) be a binary response, \(A\) a categorical treatment, and \(W\) a vector of confounders. Assume that we have observed \(n\) i.i.d. observations \((Y_i, W_i, A_i) \sim P, i=1,\ldots,n\). In the following we are interested in estimating the target parameter \(\psi(P) = E[Y(a)]\), where \(Y(a)\) is the potential outcome we would have observed if treatment \(a\) had been administered, possibly contrary to the treatment actually received. The observed outcome is the potential outcome under the observed treatment, \(Y = Y(A)\).

Under the following assumptions

  1. Stable Unit Treatment Value Assumption (the treatment of one subject does not affect the potential outcomes of other subjects)
  2. Positivity: \(P(A=a\mid W)>\epsilon\) for all \(a\) and some \(\epsilon>0\)
  3. No unmeasured confounders: \(Y(a)\perp\!\!\!\perp A\mid W\)

the target parameter can be identified from the observed data distribution as \[E(E[Y|W,A=a]) = E(E[Y(a)|W]) = E[Y(a)]\] or \[E[Y I(A=a)/P(A=a|W)] = E[Y(a)].\]

This suggests estimation via either outcome regression (OR, g-computation) or inverse probability weighting (IPW). The two approaches can also be combined into a doubly-robust augmented inverse probability weighted (AIPW) estimator.
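In terms of the outcome regression \(Q(W,a)=E[Y\mid W,A=a]\) and the propensity score \(\pi_a(W)=P(A=a\mid W)\) (notation introduced here for convenience), with fitted versions \(\widehat Q\) and \(\widehat\pi_a\), the standard sample versions of the two identification formulas and their combination are

\[\widehat\psi_{\mathrm{OR}}(a) = \frac{1}{n}\sum_{i=1}^n \widehat Q(W_i,a)\]
\[\widehat\psi_{\mathrm{IPW}}(a) = \frac{1}{n}\sum_{i=1}^n \frac{I(A_i=a)\,Y_i}{\widehat\pi_a(W_i)}\]
\[\widehat\psi_{\mathrm{AIPW}}(a) = \frac{1}{n}\sum_{i=1}^n \left[\frac{I(A_i=a)\{Y_i-\widehat Q(W_i,a)\}}{\widehat\pi_a(W_i)} + \widehat Q(W_i,a)\right]\]

The AIPW estimator adds an inverse-weighted residual term to the OR estimator. This term has mean zero when \(\widehat Q\) is correct, and it removes the bias of \(\widehat Q\) when \(\widehat\pi_a\) is correct, which is the source of the double robustness.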

Simulation

As an illustration we simulate from the following model \[Y \sim Bernoulli(\operatorname{expit}\{A+X+Z\})\] \[A \sim Bernoulli(\operatorname{expit}\{X+Z\})\] \[Z \sim \mathcal{N}(X,1)\] where \(X \sim \mathcal{N}(0,1)\) (the lava default for exogenous variables).

library(lava)      # lvm(), distribution(), sim()
library(targeted)  # ate()
m <- lvm(Y ~ A+X+Z, A~X+Z, Z~X)
m <- distribution(m, ~A+Y, binomial.lvm())
d <- sim(m, 1e3, seed=1)
head(d)
#>   Y A          X          Z
#> 1 0 0 -0.6264538  0.5085113
#> 2 1 1  0.1836433  1.2955752
#> 3 0 0 -0.8356286 -1.7064062
#> 4 1 1  1.5952808  1.8060124
#> 5 1 0  0.3295078  0.3989034
#> 6 0 0 -0.8204684 -2.4831172
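For readers who want to see what the lvm specification encodes, the same data-generating process can be written directly in base R. This is a sketch assuming lava's default parameter values for this model (standard normal X, regression coefficients equal to 1, zero intercepts, unit residual variance for Z); the helper name sim_ate_data is hypothetical, and it will not reproduce the exact draws of sim(m, 1e3, seed=1).

```r
# Base-R sketch of the simulation model (assumes lava defaults:
# X ~ N(0,1), all coefficients 1, intercepts 0, Var(Z | X) = 1)
sim_ate_data <- function(n) {
  X <- rnorm(n)
  Z <- rnorm(n, mean = X, sd = 1)
  A <- rbinom(n, 1, plogis(X + Z))      # treatment depends on both confounders
  Y <- rbinom(n, 1, plogis(A + X + Z))  # outcome depends on treatment and confounders
  data.frame(Y = Y, A = A, X = X, Z = Z)
}

set.seed(1)
d0 <- sim_ate_data(1000)
head(d0)
```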

Estimation

The target parameter \(E[Y(a)]\) can be estimated with the targeted::ate function:

args(ate)
#> function (formula, data = parent.frame(), weights, binary = TRUE, 
#>     nuisance = NULL, propensity = nuisance, all, missing = FALSE, 
#>     labels = NULL, ...) 
#> NULL

The formula should be specified as formula = response ~ treatment, with the outcome regression given by nuisance = ~ covariates and the propensity model by propensity = ~ covariates. Alternatively, all three parts can be given in a single formula using the notation formula = response ~ treatment | OR-covariates | propensity-covariates. Parametric models are assumed for both the outcome regression and the propensity model, and both default to logistic regression models (as in the simulation). A linear model can be used for the outcome regression part by setting binary=FALSE.

To illustrate this, we can estimate the (non-causal) mean response within each treatment group, \(E[Y\mid A=a]\), by using intercept-only outcome and propensity models

ate(Y ~ A, data=d, nuisance=~1, propensity=~1)
#>     Estimate Std.Err   2.5%  97.5%   P-value
#> A=0   0.2937  0.0264 0.2419 0.3454 9.852e-29
#> A=1   0.8367  0.0166 0.8042 0.8692 0.000e+00

or, equivalently, with the formula notation

ate(Y ~ A | 1 | 1, data=d)
#>     Estimate Std.Err   2.5%  97.5%   P-value
#> A=0   0.2937  0.0264 0.2419 0.3454 9.852e-29
#> A=1   0.8367  0.0166 0.8042 0.8692 0.000e+00
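With intercept-only models, \(\widehat Q(W,a)=\bar Y\) and \(\widehat\pi_a(W)=n_a/n\), and the AIPW formula collapses algebraically to the sample mean of \(Y\) in group \(a\) (the same holds if the outcome model also contains \(A\), since the weighted residuals then sum to zero within each group). A minimal base-R check of this identity on its own simulated data; aipw_const is a hypothetical helper, not part of targeted:

```r
set.seed(2)
n <- 1000
X <- rnorm(n); Z <- rnorm(n, X)
A <- rbinom(n, 1, plogis(X + Z))
Y <- rbinom(n, 1, plogis(A + X + Z))

# AIPW with intercept-only models: Q(W, a) = mean(Y), pi_a(W) = mean(A == a)
aipw_const <- function(a) {
  Q <- mean(Y)
  p <- mean(A == a)
  mean((A == a) * (Y - Q) / p + Q)
}
est <- sapply(c(0, 1), aipw_const)
grp <- tapply(Y, A, mean)  # naive group means E[Y | A = a]
rbind(aipw = est, group_mean = grp)
```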

In this simulation we can compare the estimates with the true expected potential outcomes. These can be approximated by Monte Carlo integration, simulating from the model in which we intervene on \(A\) (i.e., break its dependence on \(X\) and \(Z\)):

mean(sim(intervention(m, "A", 0), 2e5)$Y)
#> [1] 0.501065
mean(sim(intervention(m, "A", 1), 2e5)$Y)
#> [1] 0.63743
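Under the simulation model, \(X+Z = 2X + (Z-X)\) is \(\mathcal{N}(0,5)\) (assuming \(X\sim\mathcal{N}(0,1)\)), so the potential-outcome means reduce to a one-dimensional integral, \(E[Y(a)] = E[\operatorname{expit}(a+\sqrt{5}\,U)]\) with \(U\sim\mathcal{N}(0,1)\). A base-R sketch using numerical integration instead of Monte Carlo:

```r
# E[Y(a)] = E[expit(a + X + Z)] where X + Z ~ N(0, 5)
psi_true <- function(a) {
  integrate(function(u) plogis(a + sqrt(5) * u) * dnorm(u),
            lower = -Inf, upper = Inf)$value
}
psi <- sapply(c(0, 1), psi_true)
psi
```

By symmetry of the logistic function, \(E[Y(0)] = 1/2\) exactly, in line with the Monte Carlo value above.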

The IPW estimator is obtained by combining an intercept-only outcome model with the propensity covariates:

ate(Y ~ A | 1 | X+Z, data=d)
#>     Estimate Std.Err   2.5%  97.5%   P-value
#> A=0   0.4793 0.06056 0.3606 0.5980 2.474e-15
#> A=1   0.6669 0.03263 0.6029 0.7308 7.528e-93
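The second identification formula can be coded directly. The base-R sketch below computes the plain Horvitz-Thompson form \(n^{-1}\sum_i Y_i I(A_i=a)/\widehat\pi_a(W_i)\) on a large freshly simulated sample; it is an illustration of the technique, not the targeted implementation, and it need not reproduce the ate() numbers above exactly.

```r
set.seed(3)
n <- 5e5
X <- rnorm(n); Z <- rnorm(n, X)
A <- rbinom(n, 1, plogis(X + Z))
Y <- rbinom(n, 1, plogis(A + X + Z))

# Propensity model P(A = 1 | X, Z) (correctly specified logistic regression)
ps <- fitted(glm(A ~ X + Z, family = binomial))

# Horvitz-Thompson IPW estimator of E[Y(a)]
ipw <- function(a) {
  pa <- if (a == 1) ps else 1 - ps
  mean(Y * (A == a) / pa)
}
est_ipw <- sapply(c(0, 1), ipw)
est_ipw
```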

Similarly, the OR estimator is obtained by

ate(Y ~ A | A*(X+Z) | 1, data=d)
#>     Estimate Std.Err   2.5%  97.5%    P-value
#> A=0   0.5115 0.02106 0.4703 0.5528 2.367e-130
#> A=1   0.6585 0.02518 0.6092 0.7079 1.008e-150
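Likewise, the g-computation (OR) estimator fits the outcome regression, predicts every subject's outcome with \(A\) set to \(a\), and averages. A base-R sketch with the same A*(X+Z) structure, on its own simulated data (names are hypothetical):

```r
set.seed(4)
n <- 1e5
d1 <- data.frame(X = rnorm(n))
d1$Z <- rnorm(n, d1$X)
d1$A <- rbinom(n, 1, plogis(d1$X + d1$Z))
d1$Y <- rbinom(n, 1, plogis(d1$A + d1$X + d1$Z))

# Outcome regression with the same structure as A*(X+Z) in the ate() call
fit <- glm(Y ~ A * (X + Z), family = binomial, data = d1)

# g-computation: set A = a for everyone, predict, and average
gcomp <- function(a) {
  nd <- d1
  nd$A <- a
  mean(predict(fit, newdata = nd, type = "response"))
}
est_or <- sapply(c(0, 1), gcomp)
est_or
```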

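Both estimators are consistent here, since the models are correctly specified, but the OR estimator is considerably more efficient than the IPW estimator (standard errors of 0.021 and 0.025 versus 0.061 and 0.033). However, both rely on a correctly specified model; omitting the confounder \(Z\) biases both the IPW and the OR estimates: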

ate(Y ~ A | 1 | X, data=d)
#>     Estimate Std.Err   2.5%  97.5%   P-value
#> A=0   0.4208 0.03623 0.3498 0.4918 3.391e-31
#> A=1   0.7072 0.03366 0.6412 0.7731 5.204e-98
ate(Y ~ A | A*X | 1, data=d)
#>     Estimate Std.Err   2.5%  97.5%   P-value
#> A=0   0.4413 0.02395 0.3944 0.4882  7.88e-76
#> A=1   0.7084 0.02491 0.6596 0.7572 5.84e-178
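The bias is not a small-sample artifact. In the base-R sketch below (own simulated data, hypothetical names), g-computation with the misspecified Y ~ A*X model stays away from the true values \(E[Y(0)]=0.5\) and \(E[Y(1)]\approx 0.637\) even with \(n=10^5\), because \(Z\) confounds the treatment-outcome relation conditional on \(X\).

```r
set.seed(5)
n <- 1e5
d2 <- data.frame(X = rnorm(n))
d2$Z <- rnorm(n, d2$X)
d2$A <- rbinom(n, 1, plogis(d2$X + d2$Z))
d2$Y <- rbinom(n, 1, plogis(d2$A + d2$X + d2$Z))

# Misspecified outcome regression: the confounder Z is omitted
fit_x <- glm(Y ~ A * X, family = binomial, data = d2)
gcomp_x <- function(a) {
  nd <- d2
  nd$A <- a
  mean(predict(fit_x, newdata = nd, type = "response"))
}
est_mis <- sapply(c(0, 1), gcomp_x)
est_mis  # compare with E[Y(0)] = 0.5 and E[Y(1)] of about 0.637
```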

In contrast, the doubly-robust AIPW estimator is consistent in the union model, i.e., whenever either the propensity model or the outcome regression model is correctly specified. Below, the outcome model omits \(Z\) while the propensity model is correct:

a <- ate(Y ~ A | A*X | X+Z, data=d)
summary(a)
#> 
#> Augmented Inverse Probability Weighting estimator
#>   Response Y (Outcome model: logistic regression):
#>   Y ~ A * X 
#>   Exposure A (Propensity model: logistic regression):
#>   A ~ X + Z 
#> 
#>                    Estimate Std.Err    2.5%    97.5%    P-value
#>  A=0               0.493714 0.03582  0.4235  0.56391  3.153e-43
#>  A=1               0.654050 0.02839  0.5984  0.70970 1.995e-117
#> Outcome model:                                                 
#>  (Intercept)      -0.320268 0.12263 -0.5606 -0.07991  9.012e-03
#>  A                 1.578135 0.18083  1.2237  1.93255  2.609e-18
#>  X                 1.464900 0.15842  1.1544  1.77539  2.305e-20
#>  A:X              -0.010495 0.25101 -0.5025  0.48147  9.666e-01
#> Propensity model:                                              
#>  (Intercept)      -0.006421 0.16345 -0.3268  0.31393  9.687e-01
#>  X                -0.863004 0.30452 -1.4599 -0.26615  4.598e-03
#>  Z                -1.005211 0.30499 -1.6030 -0.40744  9.811e-04
#> 
#> Average Treatment Effect (constrast: 'A=0' vs. 'A=1'):
#> 
#>    Estimate Std.Err    2.5%    97.5%   P-value
#> RR   0.7549 0.06201  0.6333  0.87640 4.353e-34
#> OR   0.5158 0.09511  0.3294  0.70222 5.860e-08
#> RD  -0.1603 0.04428 -0.2471 -0.07356 2.931e-04
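A hand-rolled base-R sketch of the AIPW formula mirroring ate(Y ~ A | A*X | X+Z): a misspecified logistic outcome model (Z omitted) combined with a correctly specified propensity model, on freshly simulated data. It illustrates the estimator only; it is not the targeted implementation, and the helper names are hypothetical.

```r
set.seed(6)
n <- 1e5
d3 <- data.frame(X = rnorm(n))
d3$Z <- rnorm(n, d3$X)
d3$A <- rbinom(n, 1, plogis(d3$X + d3$Z))
d3$Y <- rbinom(n, 1, plogis(d3$A + d3$X + d3$Z))

ps  <- fitted(glm(A ~ X + Z, family = binomial, data = d3))  # correct propensity model
fit <- glm(Y ~ A * X, family = binomial, data = d3)          # misspecified: omits Z

aipw <- function(a) {
  nd <- d3
  nd$A <- a
  Q  <- predict(fit, newdata = nd, type = "response")  # Q(W, a)
  pa <- if (a == 1) ps else 1 - ps                     # pi_a(W)
  mean((d3$A == a) * (d3$Y - Q) / pa + Q)
}
est_aipw <- sapply(c(0, 1), aipw)
est_aipw
```

Despite the misspecified outcome model, the estimates land close to the true values, whereas plain g-computation with the same outcome model is biased.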

The summary output also reports the average treatment effect of 'A=0' versus 'A=1' on three scales: the causal relative risk (RR), the causal odds ratio (OR; not to be confused with the outcome regression abbreviation used above), and the causal risk difference (RD), each with confidence limits.
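All three measures are smooth functions of the two estimated means, so their standard errors can be obtained with the delta method from coef(a) and vcov(a). The base-R sketch below copies the printed values; that it reproduces the summary table up to rounding, with symmetric Wald intervals on the untransformed scale, is our reading of the output rather than documented package behavior.

```r
# Point estimates and variance matrix as printed by coef(a) and vcov(a)
p0 <- 0.4937138
p1 <- 0.6540498
V <- matrix(c(1.282802e-03, 6.426918e-05,
              6.426918e-05, 8.060789e-04), 2, 2)

odds <- function(p) p / (1 - p)
# Each measure with its gradient with respect to (p0, p1)
measures <- list(
  RR = list(est = p0 / p1, grad = c(1 / p1, -p0 / p1^2)),
  OR = list(est = odds(p0) / odds(p1),
            grad = odds(p0) / odds(p1) * c(1 / (p0 * (1 - p0)), -1 / (p1 * (1 - p1)))),
  RD = list(est = p0 - p1, grad = c(1, -1))
)
res <- t(sapply(measures, function(k) {
  se <- sqrt(drop(t(k$grad) %*% V %*% k$grad))  # delta-method standard error
  z <- qnorm(0.975)
  c(Estimate = k$est, Std.Err = se, `2.5%` = k$est - z * se, `97.5%` = k$est + z * se)
}))
round(res, 4)
```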

From the model object a we can extract the estimated coefficients (expected potential outcomes) and the corresponding asymptotic variance matrix with the coef and vcov methods. The estimated influence function can be extracted with the iid method:

coef(a)
#>       A=0       A=1 
#> 0.4937138 0.6540498
vcov(a)
#>              A=0          A=1
#> A=0 1.282802e-03 6.426918e-05
#> A=1 6.426918e-05 8.060789e-04
head(iid(a))
#>                A=0           A=1
#> [1,] -0.0002716694 -6.835682e-05
#> [2,] -0.0009673112  2.750537e-04
#> [3,] -0.0010063185 -2.792536e-04
#> [4,] -0.0006965647  2.878354e-04
#> [5,]  0.0017159609  1.537040e-04
#> [6,] -0.0015263986 -2.268427e-04
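The magnitudes above (influence values of order \(10^{-3}\) with \(n=1000\), variances of order \(10^{-3}\)) appear consistent with influence functions scaled by \(1/n\), so that vcov(a) equals crossprod(iid(a)); this convention is an assumption based on the printed values. A base-R sketch of it for the simplest estimator, a sample mean, whose influence function is \(Y_i-\bar Y\):

```r
set.seed(7)
y <- rbinom(1000, 1, 0.3)
n <- length(y)

# Influence function of the sample mean, scaled by 1/n (assumed convention)
ic <- cbind(mean = (y - mean(y)) / n)

v_iid <- crossprod(ic)              # sum of squared scaled influence values
v_direct <- var(y) * (n - 1) / n^2  # plug-in variance of the sample mean
c(v_iid, v_direct)
```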

Multiple treatments

As an example with multiple treatment levels, we simulate from a new model where the outcome is continuous and the treatment follows a proportional odds model with four levels:

m <- lvm(y ~ a+x, a~x)
m <- ordinal(m, K=4, ~a)
d <- transform(sim(m, 1e4), a=factor(a))

The AIPW estimator is obtained by fitting a logistic regression model for each treatment level (versus all others) in the propensity part; here, correctly specified models are used for both the OR and the IPW part:

summary(a2 <- ate(y ~ a | a*x | x, data=d, binary=FALSE))
#> 
#> Augmented Inverse Probability Weighting estimator
#>   Response y (Outcome model: linear regression):
#>   y ~ a * x 
#>   Exposure a (Propensity model: logistic regression):
#>   a ~ x 
#> 
#>     Estimate Std.Err    2.5%   97.5%   P-value
#> a=0  -1.6103 0.13569 -1.8762 -1.3444 1.740e-32
#> a=1  -0.8223 0.04070 -0.9021 -0.7426 8.783e-91
#> a=2  -0.4258 0.03512 -0.4946 -0.3570 7.909e-34
#> a=3   0.7492 0.05391  0.6436  0.8549 6.484e-44
#> 
#> Average Treatment Effect (constrast: 'a=0' vs. 'a=1'):
#> 
#>     Estimate Std.Err   2.5%   97.5%   P-value
#> ATE   -0.788  0.1369 -1.056 -0.5197 8.591e-09
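The multi-level estimator can be sketched in base R by running the binary AIPW formula once per level, with a one-versus-rest logistic propensity model for each level and a linear outcome model, as with binary=FALSE. The data-generating process here is a hypothetical stand-in (a latent logistic variable cut at -1, 0, 1, and an outcome with \(E[Y(k)]=k\)), not the lava model above.

```r
set.seed(8)
n <- 5e4
x <- rnorm(n)
# Hypothetical proportional-odds treatment: cut a latent logistic variable at -1, 0, 1
a <- cut(x + rlogis(n), breaks = c(-Inf, -1, 0, 1, Inf), labels = 0:3)
# Outcome with E[Y(k)] = k, since E[x] = 0
y <- as.numeric(as.character(a)) + x + rnorm(n)
dd <- data.frame(y = y, a = a, x = x)

fit <- lm(y ~ a * x, data = dd)  # linear outcome regression
aipw_k <- function(k) {
  # Propensity for level k versus all other levels
  pk <- fitted(glm(as.numeric(a == k) ~ x, family = binomial, data = dd))
  nd <- dd
  nd$a <- factor(rep(k, nrow(dd)), levels = levels(dd$a))
  Q <- predict(fit, newdata = nd)
  mean((dd$a == k) * (dd$y - Q) / pk + Q)
}
est_k <- sapply(levels(dd$a), aipw_k)
est_k
```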

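Choosing a different contrast for the association measures: the contrast argument indexes the treatment levels by position, so contrast=c(2,4) compares the second and fourth levels, 'a=1' versus 'a=3':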

summary(a2, contrast=c(2,4))
#> 
#> Augmented Inverse Probability Weighting estimator
#>   Response y (Outcome model: linear regression):
#>   y ~ a * x 
#>   Exposure a (Propensity model: logistic regression):
#>   a ~ x 
#> 
#>     Estimate Std.Err    2.5%   97.5%   P-value
#> a=0  -1.6103 0.13569 -1.8762 -1.3444 1.740e-32
#> a=1  -0.8223 0.04070 -0.9021 -0.7426 8.783e-91
#> a=2  -0.4258 0.03512 -0.4946 -0.3570 7.909e-34
#> a=3   0.7492 0.05391  0.6436  0.8549 6.484e-44
#> 
#> Average Treatment Effect (constrast: 'a=1' vs. 'a=3'):
#> 
#>     Estimate Std.Err   2.5%  97.5%    P-value
#> ATE   -1.572 0.06163 -1.692 -1.451 1.947e-143
head(iid(a2))
#>               a=0           a=1           a=2           a=3
#> [1,] 1.699885e-04 -1.925939e-05 -2.047830e-05 -6.118097e-05
#> [2,] 4.385390e-04 -2.168497e-05 -3.800845e-04 -1.615337e-04
#> [3,] 2.247419e-03  8.783938e-05  1.301313e-04  7.277154e-04
#> [4,] 8.039927e-04 -1.337514e-05 -1.349754e-05 -4.522864e-05
#> [5,] 9.853475e-05 -5.410753e-05  1.656541e-03 -3.895242e-04
#> [6,] 6.935215e-04  1.005663e-03  4.839236e-05  1.215091e-04
estimate(a2, function(x) x[2]-x[4])
#>     Estimate Std.Err   2.5%  97.5%    P-value
#> a=1   -1.572 0.06163 -1.692 -1.451 1.947e-143
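The last call reproduces the 'a=1' versus 'a=3' contrast from summary(a2, contrast=c(2,4)); the row label 'a=1' is simply inherited from the first element. For a linear contrast the influence function is the same contrast of the iid columns, so its standard error equals \(\sqrt{c^\top V c}\). A base-R sketch with a hypothetical, randomly generated iid matrix, assuming the \(1/n\)-scaled convention where \(V\) is the cross-product:

```r
set.seed(9)
n <- 500
# Hypothetical iid matrix: n scaled influence values for four means
IC <- matrix(rnorm(n * 4), n, 4) / n
colnames(IC) <- paste0("a=", 0:3)
V <- crossprod(IC)  # variance matrix implied by the iid decomposition

cvec <- c(0, 1, 0, -1)                       # contrast psi_1 - psi_3, as in x[2] - x[4]
se_iid <- sqrt(sum((IC %*% cvec)^2))         # SE from the contrast's influence function
se_V <- sqrt(drop(t(cvec) %*% V %*% cvec))   # same SE from the variance matrix
c(se_iid, se_V)
```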

SessionInfo

sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-apple-darwin20.6.0 (64-bit)
#> Running under: macOS Big Sur 11.6
#> 
#> Matrix products: default
#> BLAS:   /usr/local/Cellar/openblas/0.3.18/lib/libopenblasp-r0.3.18.dylib
#> LAPACK: /usr/local/Cellar/r/4.1.1_1/lib/R/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] targeted_0.2.0 lava_1.6.11   
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.7           bslib_0.2.5.1        compiler_4.1.1      
#>  [4] formatR_1.11         jquerylib_0.1.4      futile.logger_1.4.3 
#>  [7] mets_1.2.9           futile.options_1.0.1 tools_4.1.1         
#> [10] digest_0.6.27        jsonlite_1.7.2       evaluate_0.14       
#> [13] lattice_0.20-44      rlang_0.4.11         Matrix_1.3-4        
#> [16] yaml_2.2.1           parallel_4.1.1       mvtnorm_1.1-2       
#> [19] xfun_0.25            fastmap_1.1.0        stringr_1.4.0       
#> [22] knitr_1.33           sass_0.4.0           globals_0.14.0      
#> [25] grid_4.1.1           data.table_1.14.0    listenv_0.8.0       
#> [28] R6_2.5.1             timereg_2.0.0        future.apply_1.8.1  
#> [31] parallelly_1.27.0    survival_3.2-11      rmarkdown_2.10      
#> [34] lambda.r_1.2.4       magrittr_2.0.1       codetools_0.2-18    
#> [37] htmltools_0.5.2      splines_4.1.1        future_1.22.1       
#> [40] numDeriv_2016.8-1.1  optimx_2021-6.12     stringi_1.7.4