The goal of CausalModels is to provide a survey of fundamental causal inference models in a single package. While many packages implement these types of models individually, CausalModels brings them together behind a simple user experience.
You can install the development version of CausalModels from GitHub with:
# install.packages("devtools")
devtools::install_github("ander428/CausalModels")
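This basic example uses standardization to estimate the average causal effect of quitting smoking (`qsmk`) on the change in body weight between 1971 and 1982 (`wt82_71`) in the NHEFS data from the `causaldata` package: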
library(CausalModels)
library(causaldata)
data(nhefs)
# keep participants with a recorded 1982 weight and treat qsmk as a factor
nhefs.nmv <- nhefs[which(!is.na(nhefs$wt82)),]
nhefs.nmv$qsmk <- as.factor(nhefs.nmv$qsmk)
confounders <- c("sex", "race", "age", "education", "smokeintensity",
                 "smokeyrs", "exercise", "active", "wt71")
# initialize package
init_params(wt82_71, qsmk,
            covariates = confounders,
            data = nhefs.nmv, simple = FALSE)
#> Successfully initialized!
#>
#> Summary:
#>
#> Outcome - wt82_71
#> Treatment - qsmk
#> Covariates - [ sex, race, age, education, smokeintensity, smokeyrs, exercise, active, wt71 ]
#>
#> Size - 1566 x 67
#>
#> Default formula for outcome models:
#> wt82_71 ~ qsmk + sex + race + education + exercise + active + age + (qsmk * age) + I(age * age) + smokeintensity + (qsmk * smokeintensity) + I(smokeintensity * smokeintensity) + smokeyrs + (qsmk * smokeyrs) + I(smokeyrs * smokeyrs) + wt71 + (qsmk * wt71) + I(wt71 * wt71)
#>
#> Default formula for propensity models:
#> qsmk ~ sex + race + education + exercise + active + age + I(age * age) + smokeintensity + I(smokeintensity * smokeintensity) + smokeyrs + I(smokeyrs * smokeyrs) + wt71 + I(wt71 * wt71)
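The default formulas printed above follow a visible pattern: categorical covariates enter as main effects, and each continuous covariate enters with a squared term, plus an interaction with the treatment in the outcome model only. The sketch below reproduces that pattern with a hypothetical helper, `default_formula_sketch()`, which is not part of CausalModels. The split of the confounders into categorical and continuous is an assumption inferred from the printed output, not from the package's documentation.

```r
# Hypothetical helper, NOT part of CausalModels: it only reproduces the
# pattern visible in the default formulas printed by init_params().
default_formula_sketch <- function(lhs, categorical, continuous, treatment = NULL) {
  # each continuous covariate: main effect, optional treatment interaction, square
  cont_terms <- unlist(lapply(continuous, function(v) {
    c(v,
      if (!is.null(treatment)) sprintf("(%s * %s)", treatment, v),
      sprintf("I(%s * %s)", v, v))
  }))
  # treatment (outcome model only), then categorical main effects, then the rest
  rhs <- c(treatment, categorical, cont_terms)
  paste(lhs, "~", paste(rhs, collapse = " + "))
}

# Assumed split of the confounders used above
categorical <- c("sex", "race", "education", "exercise", "active")
continuous  <- c("age", "smokeintensity", "smokeyrs", "wt71")

outcome_f    <- default_formula_sketch("wt82_71", categorical, continuous,
                                       treatment = "qsmk")
propensity_f <- default_formula_sketch("qsmk", categorical, continuous)
```

Both strings match the formulas printed by `init_params()`, and `as.formula(outcome_f)` turns the string into a formula object that `glm()` accepts.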
# model the causal effect of qsmk on wt82_71
model <- standardization(nhefs.nmv)
print(model)
#>
#> Call: glm(formula = wt82_71 ~ qsmk + sex + race + education + exercise +
#> active + age + (qsmk * age) + I(age * age) + smokeintensity +
#> (qsmk * smokeintensity) + I(smokeintensity * smokeintensity) +
#> smokeyrs + (qsmk * smokeyrs) + I(smokeyrs * smokeyrs) + wt71 +
#> (qsmk * wt71) + I(wt71 * wt71), family = family, data = combined_data)
#>
#> Coefficients:
#> (Intercept) qsmk1
#> -0.9699812 0.5509460
#> sex1 race1
#> -1.4371844 0.5868376
#> education2 education3
#> 0.8174769 0.5824119
#> education4 education5
#> 1.5240890 -0.1792422
#> exercise1 exercise2
#> 0.3063727 0.3550789
#> active1 active2
#> -0.9460683 -0.2707615
#> age I(age * age)
#> 0.3495673 -0.0060652
#> smokeintensity I(smokeintensity * smokeintensity)
#> 0.0482197 -0.0009597
#> smokeyrs I(smokeyrs * smokeyrs)
#> 0.1418662 -0.0018076
#> wt71 I(wt71 * wt71)
#> 0.0393011 -0.0009787
#> qsmk1:age qsmk1:smokeintensity
#> 0.0123138 0.0448028
#> qsmk1:smokeyrs qsmk1:wt71
#> -0.0235529 0.0291350
#>
#> Degrees of Freedom: 1565 Total (i.e. Null); 1542 Residual
#> (3132 observations deleted due to missingness)
#> Null Deviance: 97180
#> Residual Deviance: 82690 AIC: 10710
#>
#> Average treatment effect of qsmk:
#> 3.4927
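The average treatment effect printed above comes from standardization (the parametric g-formula): fit an outcome model, predict every participant's outcome once with treatment set to 1 and once with it set to 0, then contrast the two averages. The `data = combined_data` call and the 3132 (2 × 1566) rows dropped for missingness suggest that the package appends two counterfactual copies of the data with the outcome left missing, but that is a reading of the output, not documented behavior. The sketch below illustrates the technique with base R `glm()` and `predict()` on simulated data; `trt`, `age` and `y` are hypothetical stand-ins for `qsmk`, the confounders and `wt82_71`, and this is not the package's own code.

```r
# Standardization (parametric g-formula) sketch on simulated data.
# trt, age and y are hypothetical stand-ins, not NHEFS variables.
set.seed(42)
n   <- 2000
sim <- data.frame(age = rnorm(n, mean = 45, sd = 10))
# older participants are more likely to be treated (confounding by age)
sim$trt <- factor(rbinom(n, 1, plogis(-2 + 0.04 * sim$age)), levels = c(0, 1))
# the true treatment effect is 3
sim$y <- 1 + 3 * (sim$trt == "1") + 0.1 * sim$age + rnorm(n)

# 1. Outcome model with treatment, confounder, interaction and squared term
fit <- glm(y ~ trt + age + (trt * age) + I(age * age), data = sim)

# 2. Predict every row under treatment and under no treatment
treated   <- sim
untreated <- sim
treated$trt   <- factor(rep(1, n), levels = c(0, 1))
untreated$trt <- factor(rep(0, n), levels = c(0, 1))
ey1 <- mean(predict(fit, newdata = treated))
ey0 <- mean(predict(fit, newdata = untreated))

# 3. Contrast the standardized means (named as in the package's summary)
ate <- c(risk_difference = ey1 - ey0, risk_ratio = ey1 / ey0)
ate
```

Because the interaction term lets the treatment effect vary with `age`, the standardized difference is an average over the observed `age` distribution rather than a single regression coefficient, which mirrors why the package's `qsmk1` coefficient (0.55) differs from its reported effect (3.49).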
summary(model)
#>
#> Call:
#> glm(formula = wt82_71 ~ qsmk + sex + race + education + exercise +
#> active + age + (qsmk * age) + I(age * age) + smokeintensity +
#> (qsmk * smokeintensity) + I(smokeintensity * smokeintensity) +
#> smokeyrs + (qsmk * smokeyrs) + I(smokeyrs * smokeyrs) + wt71 +
#> (qsmk * wt71) + I(wt71 * wt71), family = family, data = combined_data)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -41.913 -4.168 -0.314 3.869 44.573
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.9699812 4.3673208 -0.222 0.824266
#> qsmk1 0.5509460 2.8229123 0.195 0.845286
#> sex1 -1.4371844 0.4693195 -3.062 0.002235 **
#> race1 0.5868376 0.5828368 1.007 0.314158
#> education2 0.8174769 0.6085125 1.343 0.179339
#> education3 0.5824119 0.5575569 1.045 0.296382
#> education4 1.5240890 0.8351981 1.825 0.068221 .
#> education5 -0.1792422 0.7462118 -0.240 0.810205
#> exercise1 0.3063727 0.5360193 0.572 0.567697
#> exercise2 0.3550789 0.5592886 0.635 0.525603
#> active1 -0.9460683 0.4105673 -2.304 0.021338 *
#> active2 -0.2707615 0.6851128 -0.395 0.692745
#> age 0.3495673 0.1648157 2.121 0.034084 *
#> I(age * age) -0.0060652 0.0017347 -3.496 0.000485 ***
#> smokeintensity 0.0482197 0.0518339 0.930 0.352375
#> I(smokeintensity * smokeintensity) -0.0009597 0.0009409 -1.020 0.307878
#> smokeyrs 0.1418662 0.0943836 1.503 0.133023
#> I(smokeyrs * smokeyrs) -0.0018076 0.0015458 -1.169 0.242437
#> wt71 0.0393011 0.0836422 0.470 0.638514
#> I(wt71 * wt71) -0.0009787 0.0005255 -1.862 0.062751 .
#> qsmk1:age 0.0123138 0.0670159 0.184 0.854238
#> qsmk1:smokeintensity 0.0448028 0.0360169 1.244 0.213712
#> qsmk1:smokeyrs -0.0235529 0.0654333 -0.360 0.718931
#> qsmk1:wt71 0.0291350 0.0276439 1.054 0.292075
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for gaussian family taken to be 53.62833)
#>
#> Null deviance: 97176 on 1565 degrees of freedom
#> Residual deviance: 82695 on 1542 degrees of freedom
#> (3132 observations deleted due to missingness)
#> AIC: 10706
#>
#> Number of Fisher Scoring iterations: 2
#>
#> Average treatment effect of qsmk:
#> Estimate
#> Observed effect 2.638300
#> Counterfactual (treated) 5.243703
#> Counterfactual (untreated) 1.751003
#> Risk difference 3.492700
#> Risk ratio 2.994686
#>
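The summary's effect measures are contrasts of the two counterfactual (standardized) means. Assuming the "Risk difference" and "Risk ratio" rows are the difference and ratio of the treated and untreated counterfactual means, the printed values check out:

$$
\begin{aligned}
\widehat{\mathrm{RD}} &= \hat{E}[Y^{a=1}] - \hat{E}[Y^{a=0}] = 5.243703 - 1.751003 = 3.4927, \\
\widehat{\mathrm{RR}} &= \hat{E}[Y^{a=1}] \,/\, \hat{E}[Y^{a=0}] = 5.243703 \,/\, 1.751003 \approx 2.9947.
\end{aligned}
$$

The risk difference is the same 3.4927 reported as the average treatment effect by `print(model)`.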