Mediation Analysis by Multiple Regression

Shu Fai Cheung & Sing-Hang Cheung

2022-09-08

1 Introduction

This article is a brief illustration of how to use indirect_effect() to estimate the indirect effects when the model parameters are estimated by ordinary least squares (OLS) multiple regression using lm().

2 Data Set and Model

This is the sample dataset used for illustration:

library(manymome)
dat <- data_med_complicated
print(round(head(dat), 2))
#>      x1   x2  m11  m12   m2   y1    y2   c1    c2
#> 1 10.16 4.00 5.84 6.78 7.11 6.06 10.38 5.01  8.20
#> 2 10.89 4.79 4.95 5.81 7.92 5.15  8.52 3.92  9.90
#> 3  9.99 5.79 4.95 4.47 8.21 3.25  7.81 5.91 11.36
#> 4 12.36 4.80 5.56 6.21 8.88 6.27  9.41 6.06 10.49
#> 5 10.85 6.39 6.19 5.39 8.76 5.36  9.84 4.28  9.81
#> 6 10.28 4.58 4.88 4.28 8.50 4.57 10.42 4.71 11.38

This dataset has 9 variables: 2 predictors (x1 and x2), three mediators (m11, m12, and m2), two outcome variables (y1 and y2), and two control variables (c1 and c2).

Suppose this is the model to be fitted:

Despite the apparent complexity, the path parameters can be estimated by five multiple regression models:

lm_m11 <- lm(m11 ~ x1 + x2 + c1 + c2, dat)
lm_m12 <- lm(m12 ~ m11 + x1 + x2 + c1 + c2, dat)
lm_m2 <- lm(m2 ~ x1 + x2 + c1 + c2, dat)
lm_y1 <- lm(y1 ~ m12 + m2 + m11 + x1 + x2 + c1 + c2, dat)
lm_y2 <- lm(y2 ~ m12 + m2 + m11 + x1 + x2 + c1 + c2, dat)

These are the regression coefficient estimates of the paths (those of control variables omitted):

#>        m11    m12    m2     y1     y2
#> x1   0.352 -0.212 0.022 -0.078  0.115
#> x2  -0.045 -0.072 0.289  0.003  0.062
#> m11         0.454        0.147  0.024
#> m12                      0.234  0.135
#> m2                      -0.433 -0.436

Although not mandatory, it is recommended to combine these five models into one object (a system of regression models) using lm2list():

fit_lm <- lm2list(lm_m11, lm_m12, lm_m2, lm_y1, lm_y2)
fit_lm
#> 
#> The models:
#> m11 ~ x1 + x2 + c1 + c2
#> m12 ~ m11 + x1 + x2 + c1 + c2
#> m2 ~ x1 + x2 + c1 + c2
#> y1 ~ m12 + m2 + m11 + x1 + x2 + c1 + c2
#> y2 ~ m12 + m2 + m11 + x1 + x2 + c1 + c2

Simply use the lm() outputs as arguments. Order does not matter. To ensure that the regression outputs can be validly combined, lm2list() will also check:

  1. whether the same sample is used in all regression analysis (not just same sample size, but the same set of cases), and

  2. whether the models are “connected”, to ensure that the regression outputs can be validly combined.

3 Generating Bootstrap Estimates

To form nonparametric bootstrap confidence interval for indirect effects to be computed, do_boot() can be used to generate bootstrap estimates for all regression coefficients first. These estimates can be reused for any indirect effects to be estimated.

boot_out_lm <- do_boot(fit_lm,
                       R = 100,
                       seed = 54532,
                       ncores = 1)

Please see vignette("do_boot") or the help page of do_boot() on how to use this function. In real research, R, the number of bootstrap samples, should be set to 2000 or even 5000. The argument ncores can usually be omitted unless users want to manually control the number of CPU cores used in parallel processing.

4 Indirect Effects

We can now use indirect_effect() to estimate the indirect effect and form its bootstrap confidence interval for any path in the model. By reusing the generated bootstrap estimates, there is no need to repeat the resampling.

Suppose we want to estimate the indirect effect from x1 to y1 through m11 and m12:

(Refer to vignette("manymome") and the help page of indirect_effect() on the arguments.)

out_x1m11m12y1 <- indirect_effect(x = "x1",
                                  y = "y1",
                                  m = c("m11", "m12"),
                                  fit = fit_lm,
                                  boot_ci = TRUE,
                                  boot_out = boot_out_lm)
out_x1m11m12y1
#> 
#> == Indirect Effect ==
#>                                            
#>  Path:               x1 -> m11 -> m12 -> y1
#>  Indirect Effect     0.037                 
#>  95.0% Bootstrap CI: [0.003 to 0.077]      
#> 
#> Computation Formula:
#>   (b.m11~x1)*(b.m12~m11)*(b.y1~m12)
#> Computation:
#>   (0.35204)*(0.45408)*(0.23402)
#> 
#> Percentile confidence interval formed by nonparametric bootstrapping
#> with 100 bootstrap samples.
#> 
#> Coefficients of Component Paths:
#>     Path Coefficient
#>   m11~x1       0.352
#>  m12~m11       0.454
#>   y1~m12       0.234

The indirect effect is 0.037, with 95% confidence interval [0.003, 0.077].

Similarly, we can estimate the indirect effect from x2 to y2 through m2:

out_x2m2y2 <- indirect_effect(x = "x2",
                              y = "y2",
                              m = "m2",
                              fit = fit_lm,
                              boot_ci = TRUE,
                              boot_out = boot_out_lm)
out_x2m2y2
#> 
#> == Indirect Effect ==
#>                                        
#>  Path:               x2 -> m2 -> y2    
#>  Indirect Effect     -0.126            
#>  95.0% Bootstrap CI: [-0.233 to -0.043]
#> 
#> Computation Formula:
#>   (b.m2~x2)*(b.y2~m2)
#> Computation:
#>   (0.28901)*(-0.43598)
#> 
#> Percentile confidence interval formed by nonparametric bootstrapping
#> with 100 bootstrap samples.
#> 
#> Coefficients of Component Paths:
#>   Path Coefficient
#>  m2~x2       0.289
#>  y2~m2      -0.436

The indirect effect is -0.126, with 95% confidence interval [-0.233, -0.043].

Note that any indirect path in the model can be estimated this way. Suppose, after doing the regression analysis, we want to estimate the indirect effect from x2 to m12 through m11, we just call indirect_effect():

out_x2m11m12 <- indirect_effect(x = "x2",
                                y = "m12",
                                m = "m11",
                                fit = fit_lm,
                                boot_ci = TRUE,
                                boot_out = boot_out_lm)
out_x2m11m12
#> 
#> == Indirect Effect ==
#>                                       
#>  Path:               x2 -> m11 -> m12 
#>  Indirect Effect     -0.020           
#>  95.0% Bootstrap CI: [-0.139 to 0.110]
#> 
#> Computation Formula:
#>   (b.m11~x2)*(b.m12~m11)
#> Computation:
#>   (-0.04471)*(0.45408)
#> 
#> Percentile confidence interval formed by nonparametric bootstrapping
#> with 100 bootstrap samples.
#> 
#> Coefficients of Component Paths:
#>     Path Coefficient
#>   m11~x2     -0.0447
#>  m12~m11      0.4541

The indirect effect is -0.020, with 95% confidence interval [-0.139, 0.110].

There is no limit on the path to be estimated, as long as all required path coefficients are in the model. indirect_effect() will also check whether a path is valid. Therefore, estimating the effect from x1 to m2 through m11 will result in an error because this path does not exist in the model defined by the regression outputs.

5 Standardized Indirect effects

The standardized indirect effect from x1 to y1 through m11 and m12 can be estimated by setting standardized_x and standardized_y to `TRUE:

std_x1m11m12y1 <- indirect_effect(x = "x1",
                                  y = "y1",
                                  m = c("m11", "m12"),
                                  fit = fit_lm,
                                  boot_ci = TRUE,
                                  boot_out = boot_out_lm,
                                  standardized_x = TRUE,
                                  standardized_y = TRUE)
std_x1m11m12y1
#> 
#> == Indirect Effect ==
#>                                            
#>  Path:               x1 -> m11 -> m12 -> y1
#>  Indirect Effect     0.039                 
#>  95.0% Bootstrap CI: [0.004 to 0.085]      
#> 
#> Computation Formula:
#>   (b.m11~x1)*(b.m12~m11)*(b.y1~m12)*sd_x1/sd_y1
#> Computation:
#>   (0.35204)*(0.45408)*(0.23402)*(1.11605)/(1.06579)
#> 
#> Percentile confidence interval formed by nonparametric bootstrapping
#> with 100 bootstrap samples.
#> 
#> Coefficients of Component Paths:
#>     Path Coefficient
#>   m11~x1       0.352
#>  m12~m11       0.454
#>   y1~m12       0.234
#> 
#> NOTE: The effects of the component paths are from the model, not standardized.

The standardized indirect effect is 0.039, with 95% confidence interval [0.004,0.085].

Similarly, we can estimate the standardized indirect effect from x1 to y1 through m2:

std_x1m2y1 <- indirect_effect(x = "x1",
                              y = "y1",
                              m = "m2",
                              fit = fit_lm,
                              boot_ci = TRUE,
                              boot_out = boot_out_lm,
                              standardized_x = TRUE,
                              standardized_y = TRUE)
std_x1m2y1
#> 
#> == Indirect Effect ==
#>                                       
#>  Path:               x1 -> m2 -> y1   
#>  Indirect Effect     -0.010           
#>  95.0% Bootstrap CI: [-0.069 to 0.067]
#> 
#> Computation Formula:
#>   (b.m2~x1)*(b.y1~m2)*sd_x1/sd_y1
#> Computation:
#>   (0.02233)*(-0.43300)*(1.11605)/(1.06579)
#> 
#> Percentile confidence interval formed by nonparametric bootstrapping
#> with 100 bootstrap samples.
#> 
#> Coefficients of Component Paths:
#>   Path Coefficient
#>  m2~x1      0.0223
#>  y1~m2     -0.4330
#> 
#> NOTE: The effects of the component paths are from the model, not standardized.

The standardized indirect effect is -0.010, with 95% confidence interval [-0.069, 0.067].

6 Adding Effects

Note that the results of indirect_effect() can be added using +.

For example, to find the total indirect effect of x1 on y1, we need to compute the indirect effects along the following paths:

  1. x1 to m11 to m12 to y1
  2. x1 to m11 to y1
  3. x1 to m12 to y1
  4. x1 to m2 to y1

The indirect effects along Path a has already been computed. We compute the indirect effects along Paths b, c, and d below:

out_x1m11y1 <- indirect_effect(x = "x1",
                               y = "y1",
                               m = "m11",
                               fit = fit_lm,
                               boot_ci = TRUE,
                               boot_out = boot_out_lm)
out_x1m11y1
#> 
#> == Indirect Effect ==
#>                                       
#>  Path:               x1 -> m11 -> y1  
#>  Indirect Effect     0.052            
#>  95.0% Bootstrap CI: [-0.036 to 0.103]
#> 
#> Computation Formula:
#>   (b.m11~x1)*(b.y1~m11)
#> Computation:
#>   (0.35204)*(0.14694)
#> 
#> Percentile confidence interval formed by nonparametric bootstrapping
#> with 100 bootstrap samples.
#> 
#> Coefficients of Component Paths:
#>    Path Coefficient
#>  m11~x1       0.352
#>  y1~m11       0.147
out_x1m12y1 <- indirect_effect(x = "x1",
                               y = "y1",
                               m = "m12",
                               fit = fit_lm,
                               boot_ci = TRUE,
                               boot_out = boot_out_lm)
out_x1m12y1
#> 
#> == Indirect Effect ==
#>                                        
#>  Path:               x1 -> m12 -> y1   
#>  Indirect Effect     -0.050            
#>  95.0% Bootstrap CI: [-0.125 to -0.000]
#> 
#> Computation Formula:
#>   (b.m12~x1)*(b.y1~m12)
#> Computation:
#>   (-0.21182)*(0.23402)
#> 
#> Percentile confidence interval formed by nonparametric bootstrapping
#> with 100 bootstrap samples.
#> 
#> Coefficients of Component Paths:
#>    Path Coefficient
#>  m12~x1      -0.212
#>  y1~m12       0.234
out_x1m2y1 <- indirect_effect(x = "x1",
                               y = "y1",
                               m = "m2",
                               fit = fit_lm,
                               boot_ci = TRUE,
                               boot_out = boot_out_lm)
out_x1m2y1
#> 
#> == Indirect Effect ==
#>                                       
#>  Path:               x1 -> m2 -> y1   
#>  Indirect Effect     -0.010           
#>  95.0% Bootstrap CI: [-0.077 to 0.064]
#> 
#> Computation Formula:
#>   (b.m2~x1)*(b.y1~m2)
#> Computation:
#>   (0.02233)*(-0.43300)
#> 
#> Percentile confidence interval formed by nonparametric bootstrapping
#> with 100 bootstrap samples.
#> 
#> Coefficients of Component Paths:
#>   Path Coefficient
#>  m2~x1      0.0223
#>  y1~m2     -0.4330

We can now compute the total indirect effect:

out_x1y1_total <- out_x1m11m12y1 + out_x1m11y1 + out_x1m12y1 + out_x1m2y1
out_x1y1_total
#> 
#> == Indirect Effect ==
#>                                             
#>  Path:                x1 -> m11 -> m12 -> y1
#>  Path:                x1 -> m11 -> y1       
#>  Path:                x1 -> m12 -> y1       
#>  Path:                x1 -> m2 -> y1        
#>  Function of Effects: 0.030                 
#>  95.0% Bootstrap CI:  [-0.088 to 0.132]     
#> 
#> Computation of the Function of Effects:
#>  (((x1->m11->m12->y1)
#> +(x1->m11->y1))
#> +(x1->m12->y1))
#> +(x1->m2->y1) 
#> 
#> 
#> Percentile confidence interval formed by nonparametric bootstrapping
#> with 100 bootstrap samples.

The total effect of f1 on f4 is 0.030, with 95% confidence interval [-0.088, 0.132].

See help("math_indirect") for further details on addition for indirect_effect() outputs.

7 Differences in Effects

Subtraction can also be conducted using -. For example, we can compute the difference between the indirect effect of x1 on y1 through m11 and m12 and the indirect effect of x1 on y1 through m2:

out_x1_diff <- out_x1m11m12y1 - out_x1m2y1
out_x1_diff
#> 
#> == Indirect Effect ==
#>                                             
#>  Path:                x1 -> m11 -> m12 -> y1
#>  Path:                x1 -> m2 -> y1        
#>  Function of Effects: 0.047                 
#>  95.0% Bootstrap CI:  [-0.038 to 0.118]     
#> 
#> Computation of the Function of Effects:
#>  (x1->m11->m12->y1)
#> -(x1->m2->y1) 
#> 
#> 
#> Percentile confidence interval formed by nonparametric bootstrapping
#> with 100 bootstrap samples.

The difference in effects is 0.047, with 95% confidence interval [-0.038, 0.118].

See help("math_indirect") for further details on subtraction for indirect_effect() outputs.

8 Further Information

For further information on do_boot() and indirect_effect(), please refer to their help pages.