Pooling and Selection of Logistic Regression Models

Martijn W Heymans

2021-09-23

Introduction

With the psfmi package you can pool logistic regression models by using
the following pooling methods: RR (Rubin’s Rules), D1, D2, D3 and MPR (Median P Rule).

You can also use forward or backward selection from the pooled model.

This vignette shows examples of how to apply these procedures.
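
As background for the RR output shown in the examples, the following is a minimal sketch of how Rubin’s Rules combine one coefficient over imputed datasets. The estimates and standard errors are hypothetical values, not taken from lbpmilr, and the degrees of freedom calculation is left out.

```r
# Hypothetical estimates and standard errors of one coefficient,
# obtained in m = 5 imputed datasets (illustration only)
est <- c(-0.15, -0.13, -0.14, -0.16, -0.12)
se  <- c(0.042, 0.044, 0.043, 0.041, 0.045)
m   <- length(est)

pooled_est <- mean(est)                    # pooled coefficient
within     <- mean(se^2)                   # within-imputation variance
between    <- var(est)                     # between-imputation variance
total_var  <- within + (1 + 1/m) * between # total variance
pooled_se  <- sqrt(total_var)              # pooled standard error

c(estimate = pooled_est, std.error = pooled_se)
```

The pooled standard error is always at least as large as the square root of the average within-imputation variance, because the between-imputation variance reflects the extra uncertainty due to the missing data.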

Examples

Pooling without backward selection (BS) and method D1

  library(psfmi)
  pool_lr <- psfmi_lr(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Chronic ~ Gender + Smoking + 
                      Function + JobControl + JobDemands + SocialSupport, 
                      method="D1")
  
  pool_lr$RR_model
#> $`Step 1 - no variables removed -`
#>            term    estimate  std.error    statistic        df     p.value
#> 1   (Intercept) -0.02145084 2.49485297 -0.008598036 104.09644 0.993156301
#> 2        Gender -0.35445151 0.41807427 -0.847819477 141.28927 0.397972465
#> 3       Smoking  0.07565036 0.34084592  0.221948835 147.74179 0.824660215
#> 4      Function -0.14188458 0.04337897 -3.270815252 132.02927 0.001368147
#> 5    JobControl  0.00690354 0.02053384  0.336203110  88.93815 0.737509628
#> 6    JobDemands  0.00227508 0.03872846  0.058744401 103.72259 0.953268722
#> 7 SocialSupport  0.04434046 0.05750883  0.771019941 126.70867 0.442130487
#>          OR   lower.EXP   upper.EXP
#> 1 0.9787776 0.006951596 137.8108760
#> 2 0.7015581 0.306989710   1.6032584
#> 3 1.0785854 0.549958398   2.1153353
#> 4 0.8677214 0.796369271   0.9454664
#> 5 1.0069274 0.966670925   1.0488604
#> 6 1.0022777 0.928182101   1.0822882
#> 7 1.0453382 0.932895897   1.1713332
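
The OR, lower.EXP and upper.EXP columns appear to be the exponentiated estimate and the exponentiated limits of a 95% confidence interval based on the t-distribution with the pooled df. The sketch below recomputes them for the Function row from the values printed above; it is a check on the printed numbers, not the package’s internal code.

```r
# Values copied from the Function row of the output above
estimate  <- -0.14188458
std_error <-  0.04337897
df        <-  132.02927

statistic  <- estimate / std_error          # pooled test statistic
t_crit     <- qt(0.975, df)                 # 97.5% quantile of the t-distribution
odds_ratio <- exp(estimate)                 # OR
lower      <- exp(estimate - t_crit * std_error)  # lower.EXP
upper      <- exp(estimate + t_crit * std_error)  # upper.EXP

round(c(statistic = statistic, OR = odds_ratio,
        lower.EXP = lower, upper.EXP = upper), 4)
```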


Pooling with BS and method D1

Pooling logistic regression models over 5 imputed datasets with backward selection (BS), using a p-value criterion of 0.05 and method D1. The predictor “Smoking” is forced to stay in the model during backward selection.

  library(psfmi)
  pool_lr <- psfmi_lr(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Chronic ~ Gender + Smoking + 
                      Function + JobControl + JobDemands + SocialSupport, 
                      keep.predictors = "Smoking", method="D1", p.crit=0.05, 
                      direction="BW")
#> Removed at Step 1 is - JobDemands
#> Removed at Step 2 is - JobControl
#> Removed at Step 3 is - SocialSupport
#> Removed at Step 4 is - Gender
#> 
#> Selection correctly terminated, 
#> No more variables removed from the model
  pool_lr$RR_model_final
#> $`Step 5`
#>          term    estimate  std.error  statistic       df    p.value        OR
#> 1 (Intercept)  1.20696975 0.48230894  2.5024827 138.9717 0.01349142 3.3433381
#> 2     Smoking  0.06427314 0.33804675  0.1901309 151.8086 0.84946055 1.0663836
#> 3    Function -0.14058914 0.04225212 -3.3273866 121.6357 0.00115993 0.8688462
#>   lower.EXP upper.EXP
#> 1 1.2883439 8.6761852
#> 2 0.5468337 2.0795610
#> 3 0.7991282 0.9446467
  pool_lr$multiparm_final
#> $`Step 5`
#>           p-values D1 F-statistic
#> Smoking  0.8492075865  0.03614977
#> Function 0.0009162863 11.07150131
  pool_lr$predictors_out
#>         Gender Smoking Function JobControl JobDemands SocialSupport
#> Step 1       0       0        0          0          1             0
#> Step 2       0       0        0          1          0             0
#> Step 3       0       0        0          0          0             1
#> Step 4       1       0        0          0          0             0
#> Removed      1       0        0          1          1             1
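
The predictors_out table can also be read programmatically. The sketch below rebuilds the table by hand from the output printed above (so it runs without the package) and extracts the removed predictors, the retained predictors and the order of removal. With a fitted object you would use pool_lr$predictors_out instead of the hand-built matrix; that it has the same row and column names as printed is an assumption.

```r
# Table rebuilt by hand from the printed output above
predictors <- c("Gender", "Smoking", "Function",
                "JobControl", "JobDemands", "SocialSupport")
po <- matrix(c(0, 0, 0, 0, 1, 0,
               0, 0, 0, 1, 0, 0,
               0, 0, 0, 0, 0, 1,
               1, 0, 0, 0, 0, 0,
               1, 0, 0, 1, 1, 1),
             nrow = 5, byrow = TRUE,
             dimnames = list(c(paste("Step", 1:4), "Removed"), predictors))

# Predictors removed at any step, and predictors still in the final model
removed  <- colnames(po)[po["Removed", ] == 1]
retained <- colnames(po)[po["Removed", ] == 0]

# Order of removal: the column holding the 1 in each step row
steps         <- po[rownames(po) != "Removed", , drop = FALSE]
removal_order <- colnames(steps)[apply(steps, 1, which.max)]

removed
retained
removal_order
```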


Pooling with BS and method MPR

Pooling logistic regression models over 5 imputed datasets with backward selection, using a p-value criterion of 0.05 and method MPR. The predictor “Smoking” is forced to stay in the model during backward selection.

  library(psfmi)
  pool_lr <- psfmi_lr(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Chronic ~ Gender + Smoking + 
                      Function + JobControl + JobDemands + SocialSupport, 
                      keep.predictors = "Smoking", method="MPR", p.crit=0.05, 
                      direction="BW")
#> Removed at Step 1 is - JobDemands
#> Removed at Step 2 is - JobControl
#> Removed at Step 3 is - SocialSupport
#> Removed at Step 4 is - Gender
#> 
#> Selection correctly terminated, 
#> No more variables removed from the model
  pool_lr$RR_model_final
#> $`Step 5`
#>          term    estimate  std.error  statistic       df    p.value        OR
#> 1 (Intercept)  1.20696975 0.48230894  2.5024827 138.9717 0.01349142 3.3433381
#> 2     Smoking  0.06427314 0.33804675  0.1901309 151.8086 0.84946055 1.0663836
#> 3    Function -0.14058914 0.04225212 -3.3273866 121.6357 0.00115993 0.8688462
#>   lower.EXP upper.EXP
#> 1 1.2883439 8.6761852
#> 2 0.5468337 2.0795610
#> 3 0.7991282 0.9446467
  pool_lr$multiparm_final
#> $`Step 5`
#>           p-value MPR
#> Smoking  0.8734388519
#> Function 0.0003502948
  pool_lr$predictors_out  
#>         Gender Smoking Function JobControl JobDemands SocialSupport
#> Step 1       0       0        0          0          1             0
#> Step 2       0       0        0          1          0             0
#> Step 3       0       0        0          0          0             1
#> Step 4       1       0        0          0          0             0
#> Removed      1       0        0          1          1             1
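
The idea behind the Median P Rule is simple: a predictor is tested in each imputed dataset separately and the median of the resulting p-values is used as the pooled p-value. A minimal sketch with hypothetical p-values follows; which test produces the per-dataset p-values inside the package is not shown here.

```r
# Hypothetical p-values of one predictor in 5 imputed datasets
p_values <- c(0.82, 0.91, 0.87, 0.79, 0.95)

# Median P Rule: the pooled p-value is the median over the imputed datasets
p_mpr <- median(p_values)

# Decision during backward selection with p.crit = 0.05
p_crit <- 0.05
remove_candidate <- p_mpr > p_crit

p_mpr
remove_candidate
```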


Pooling with BS including several interaction terms and method D2

Pooling logistic regression models over 5 imputed datasets with BS, using a p-value criterion of 0.05 and method D2. Several interaction terms, including one with a categorical predictor, are part of the selection procedure.

  library(psfmi)
  pool_lr <- psfmi_lr(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Chronic ~ Gender + Smoking + 
                        Function + JobControl + factor(Carrying) + 
                        factor(Satisfaction) +
                        factor(Carrying):Smoking + Gender:Smoking, 
                      method="D2", p.crit=0.05, 
                      direction="BW")
#> Removed at Step 1 is - JobControl
#> Removed at Step 2 is - factor(Satisfaction)
#> Removed at Step 3 is - Gender*Smoking
#> Removed at Step 4 is - Gender
#> Removed at Step 5 is - Smoking*factor(Carrying)
#> Removed at Step 6 is - Smoking
#> Removed at Step 7 is - Function
#> 
#> Selection correctly terminated, 
#> No more variables removed from the model
  pool_lr$RR_model_final
#> $`Step 8`
#>                term  estimate std.error statistic       df      p.value
#> 1       (Intercept) -1.582393 0.3773100 -4.193880 151.2668 4.652717e-05
#> 2 factor(Carrying)2  1.391554 0.4709708  2.954650 144.1330 3.657304e-03
#> 3 factor(Carrying)3  2.248897 0.4750324  4.734198 151.4269 5.010441e-06
#>          OR  lower.EXP  upper.EXP
#> 1 0.2054828 0.09750312  0.4330445
#> 2 4.0210931 1.58510710 10.2006922
#> 3 9.4772792 3.70747175 24.2264342
  pool_lr$multiparm_final
#> $`Step 8`
#>                   p-values D2 F-statistic
#> factor(Carrying) 1.251841e-05    11.29016
  pool_lr$predictors_out 
#>         Gender Smoking Function JobControl factor(Carrying)
#> Step 1       0       0        0          1                0
#> Step 2       0       0        0          0                0
#> Step 3       0       0        0          0                0
#> Step 4       1       0        0          0                0
#> Step 5       0       0        0          0                0
#> Step 6       0       1        0          0                0
#> Step 7       0       0        1          0                0
#> Removed      1       1        1          1                0
#>         factor(Satisfaction) Smoking*factor(Carrying) Gender*Smoking
#> Step 1                     0                        0              0
#> Step 2                     1                        0              0
#> Step 3                     0                        0              1
#> Step 4                     0                        0              0
#> Step 5                     0                        1              0
#> Step 6                     0                        0              0
#> Step 7                     0                        0              0
#> Removed                    1                        1              1
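
Method D2 pools the test statistics obtained in each imputed dataset rather than the coefficients. The sketch below implements the commonly published form of the D2 combination rule for chi-square statistics with k degrees of freedom. The function name pool_d2 and the chi-square values are hypothetical, and the package’s own implementation may differ in detail.

```r
# pool_d2 is a hypothetical helper, not a psfmi function.
# d: chi-square statistics from the m imputed datasets
# k: degrees of freedom of the test (number of parameters tested)
pool_d2 <- function(d, k) {
  m   <- length(d)
  r   <- (1 + 1/m) * var(sqrt(d))              # relative increase in variance
  D2  <- (mean(d) / k - (m + 1) / (m - 1) * r) / (1 + r)
  df2 <- k^(-3 / m) * (m - 1) * (1 + 1/r)^2    # denominator degrees of freedom
  p   <- pf(D2, k, df2, lower.tail = FALSE)    # p-value from the F-distribution
  list(D2 = D2, df1 = k, df2 = df2, p.value = p)
}

# Hypothetical chi-square values for a 3-level factor (k = 2)
chisq <- c(21.5, 24.1, 19.8, 23.0, 22.4)
pool_d2(chisq, k = 2)
```

D2 only needs the test statistics of the separate analyses, which makes it easy to apply, but it uses less information than D1.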


Pooling with BS and forcing interaction terms and method D1

The same model as in the previous example, but now pooled with method D1 and with several predictors, including an interaction term, forced to stay in the model during BS.

  library(psfmi)
  pool_lr <- psfmi_lr(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Chronic ~ Gender + Smoking + 
                      Function + JobControl + factor(Carrying) + factor(Satisfaction) +
                        factor(Carrying):Smoking + Gender:Smoking, 
                      keep.predictors = c("Smoking*Carrying", "JobControl"), method="D1", 
                      p.crit=0.05, direction="BW")
#> Removed at Step 1 is - Gender*Smoking
#> Removed at Step 2 is - Gender
#> Removed at Step 3 is - factor(Satisfaction)
#> Removed at Step 4 is - Function
  pool_lr$RR_model_final
#> $`Step 5`
#>                        term     estimate std.error  statistic        df
#> 1               (Intercept) -0.810522255 1.3650185 -0.5937812  55.33263
#> 2                   Smoking -1.796541680 1.1699026 -1.5356336  65.48650
#> 3                JobControl -0.004625312 0.0216596 -0.2135455  58.16427
#> 4         factor(Carrying)2  0.723452199 0.6214600  1.1641171 107.17663
#> 5         factor(Carrying)3  1.534813529 0.5908820  2.5974958 107.09227
#> 6 Smoking:factor(Carrying)2  2.093737680 1.3149790  1.5922214  66.01171
#> 7 Smoking:factor(Carrying)3  2.370029492 1.3934064  1.7008889  51.83267
#>      p.value         OR  lower.EXP  upper.EXP
#> 1 0.55507827  0.4446258 0.02884811   6.852861
#> 2 0.12944599  0.1658715 0.01604000   1.715297
#> 3 0.83164826  0.9953854 0.95315370   1.039488
#> 4 0.24696141  2.0615378 0.60139911   7.066751
#> 5 0.01071096  4.6404601 1.43831227  14.971624
#> 6 0.11611271  8.1151905 0.58760877 112.075110
#> 7 0.09495675 10.6977078 0.65294658 175.268475
  pool_lr$multiparm_final
#> $`Step 5`
#>                          p-values D1 F-statistic
#> Smoking                  0.313723002   1.1877289
#> JobControl               0.831291534   0.0456017
#> factor(Carrying)         0.000903194   4.7060389
#> Smoking*factor(Carrying) 0.188926654   1.6841757
  pool_lr$predictors_out 
#>         Gender Smoking Function JobControl factor(Carrying)
#> Step 1       0       0        0          0                0
#> Step 2       1       0        0          0                0
#> Step 3       0       0        0          0                0
#> Step 4       0       0        1          0                0
#> Removed      1       0        1          0                0
#>         factor(Satisfaction) Smoking*factor(Carrying) Gender*Smoking
#> Step 1                     0                        0              1
#> Step 2                     0                        0              0
#> Step 3                     1                        0              0
#> Step 4                     0                        0              0
#> Removed                    1                        0              1
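
A categorical predictor such as factor(Carrying) and its interaction with Smoking each contribute more than one coefficient to the model, which is why they are tested as a whole in multiparm_final. The base R sketch below shows this expansion on a small hypothetical data frame; the variable names mimic those in lbpmilr but the values are made up.

```r
# Small hypothetical data frame (values are made up)
dat <- data.frame(
  Smoking  = c(0, 1, 0, 1, 1, 0),
  Carrying = c(1, 2, 3, 2, 1, 3)
)

# Design matrix for the main effects and their interaction
mm <- model.matrix(~ Smoking + factor(Carrying) + factor(Carrying):Smoking,
                   data = dat)

# A 3-level factor gives 2 dummy columns and 2 interaction columns
colnames(mm)
n_interaction_columns <- sum(grepl(":", colnames(mm)))
n_interaction_columns
```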


Pooling with BS including spline coefficient and method D1

Pooling logistic regression models over 5 imputed datasets with BS, using a p-value criterion of 0.05 and method D1. A spline predictor and an interaction term are part of the selection procedure.

  library(psfmi)
  pool_lr <- psfmi_lr(data=lbpmilr, nimp=5, impvar="Impnr", 
                      formula = Chronic ~ Gender + Smoking + 
                      JobControl + factor(Carrying) + factor(Satisfaction) +
                      factor(Carrying):Smoking + rcs(Function, 3), 
                      method="D1", 
                      p.crit=0.05, direction="BW")
#> Removed at Step 1 is - JobControl
#> Removed at Step 2 is - Gender
#> Removed at Step 3 is - rcs(Function,3)
#> Removed at Step 4 is - factor(Satisfaction)
#> Removed at Step 5 is - Smoking*factor(Carrying)
#> Removed at Step 6 is - Smoking
#> 
#> Selection correctly terminated, 
#> No more variables removed from the model
  pool_lr$RR_model_final
#> $`Step 7`
#>                term  estimate std.error statistic       df      p.value
#> 1       (Intercept) -1.582393 0.3773100 -4.193880 151.2668 4.652717e-05
#> 2 factor(Carrying)2  1.391554 0.4709708  2.954650 144.1330 3.657304e-03
#> 3 factor(Carrying)3  2.248897 0.4750324  4.734198 151.4269 5.010441e-06
#>          OR  lower.EXP  upper.EXP
#> 1 0.2054828 0.09750312  0.4330445
#> 2 4.0210931 1.58510710 10.2006922
#> 3 9.4772792 3.70747175 24.2264342
  pool_lr$multiparm_final
#> $`Step 7`
#>                   p-values D1 F-statistic
#> factor(Carrying) 2.257078e-05     10.8085
  pool_lr$predictors_out 
#>         Gender Smoking JobControl factor(Carrying) factor(Satisfaction)
#> Step 1       0       0          1                0                    0
#> Step 2       1       0          0                0                    0
#> Step 3       0       0          0                0                    0
#> Step 4       0       0          0                0                    1
#> Step 5       0       0          0                0                    0
#> Step 6       0       1          0                0                    0
#> Removed      1       1          1                0                    1
#>         rcs(Function,3) Smoking*factor(Carrying)
#> Step 1                0                        0
#> Step 2                0                        0
#> Step 3                1                        0
#> Step 4                0                        0
#> Step 5                0                        1
#> Step 6                0                        0
#> Removed               1                        1
