ggmice
The ggmice package provides plotting functions for the evaluation of incomplete data, mice imputation models, and multiply imputed data sets (mice::mids). The functions in ggmice adhere to the ‘grammar of graphics’ philosophy, popularized by the ggplot2 package. With that, ggmice enhances imputation workflows and provides plotting objects that are easily extended and manipulated by each individual ‘imputer’.
This vignette gives an overview of the core plotting functions in ggmice. Experienced mice users may already be familiar with the lattice-style plotting functions in mice. These ‘old friends’, such as mice::bwplot(), can be re-created with ggmice; see this vignette for advice.
The ggmice package can be installed from GitHub as follows:
install.packages("devtools")
devtools::install_github("amices/ggmice")
In this vignette we’ll use ggmice in combination with the imputation package mice and the plotting package ggplot2. It is recommended to load mice and ggplot2 into your workspace as well, but in your own workflow you could choose to call their functions directly instead (e.g., mice::mice() or ggplot2::aes()). In this vignette, we assume that all three packages are loaded, as well as an incomplete and imputed version of the mice::boys dataset.
# load packages
library(mice)
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
library(ggplot2)
library(ggmice)
#>
#> Attaching package: 'ggmice'
#> The following objects are masked from 'package:mice':
#>
#> bwplot, densityplot, stripplot, xyplot
# load incomplete dataset
dat <- boys
# generate imputations
imp <- mice(dat, method = "pmm", printFlag = FALSE)
The ggmice package contains functions to explore incomplete data.
The plot_pattern() function displays the missing data pattern in an incomplete dataset. The argument data (the incomplete dataset) is required. The optional argument square determines whether the missing data pattern has square or rectangular tiles, and the optional argument rotate rotates the variable names by 90 degrees.
# create missing data pattern plot
plot_pattern(dat)
# specify optional arguments
plot_pattern(dat, square = TRUE, rotate = TRUE)
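For a numeric counterpart to the plot, the same pattern can be tabulated with mice::md.pattern(). The following is a minimal sketch, assuming the mice package is installed; the object name pat is only illustrative.

```r
library(mice)

# tabulate the missing data pattern without drawing the lattice-style plot
pat <- md.pattern(boys, plot = FALSE)

# each row is a pattern (1 = observed, 0 = missing); the last column counts
# the number of variables that are missing in that pattern
head(pat)

# number of missing values per variable, for comparison with the plot
colSums(is.na(boys))
```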
The plot_flux() function produces an influx-outflux plot. The influx of a variable quantifies how well its missing data connect to the observed data on other variables. The outflux of a variable quantifies how well its observed data connect to the missing data on other variables. In general, higher influx and outflux values are preferred when building imputation models. The plotting function requires an incomplete dataset (argument data), and takes optional arguments to adjust the legend and axis labels.
# create influx-outflux plot
plot_flux(dat)
# specify optional arguments
plot_flux(
  dat,
  label = FALSE,
  caption = FALSE
)
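The influx and outflux coefficients themselves can also be inspected as numbers with mice::flux(). This is a small sketch under the assumption that mice is installed; fx is an illustrative object name, and the values are not necessarily computed the same way inside plot_flux().

```r
library(mice)

# compute influx and outflux per variable for the incomplete boys data
fx <- flux(boys)

# pobs is the proportion of observed values per variable;
# influx and outflux are the coefficients described above
fx[, c("pobs", "influx", "outflux")]
```

Variables with both low influx and low outflux are natural candidates for closer inspection before they enter an imputation model.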
The function plot_corr() can be used to investigate relations between variables, for the development of imputation models. Only one of the arguments (data, the incomplete dataset) is required; all other arguments are optional.
# create correlation plot
plot_corr(dat)
# specify optional arguments
plot_corr(
  dat,
  vrb = c("hgt", "wgt", "bmi"),
  label = TRUE,
  square = FALSE,
  diagonal = TRUE
)
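To see the kind of numbers behind such a plot, correlations can be computed directly with base R. The sketch below assumes pairwise complete observations, which may differ from what plot_corr() does internally; the object names vrb and cors are illustrative.

```r
library(mice)

# keep the three variables used in the example above
vrb <- c("hgt", "wgt", "bmi")

# Pearson correlations based on all pairwise complete observations
cors <- cor(boys[, vrb], use = "pairwise.complete.obs")
round(cors, 2)
```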
The function plot_pred() displays mice predictor matrices. A predictor matrix is typically created using mice::make.predictorMatrix(), mice::quickpred(), or by using the default in mice::mice() and extracting the predictorMatrix from the resulting mids object. The plot_pred() function requires a predictor matrix (the data argument), but other arguments can be provided too.
# create predictor matrix
pred <- quickpred(dat)
# create predictor matrix plot
plot_pred(pred)
# specify optional arguments
plot_pred(
  pred,
  label = TRUE,
  square = FALSE
)
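The three ways of obtaining a predictor matrix mentioned above can be sketched as follows. This assumes the mice package is installed; the object names (pred_full, pred_quick, imp0, pred_default) are illustrative, and any of the resulting matrices could be passed to plot_pred().

```r
library(mice)

# 1. full predictor matrix: every variable predicts every other variable
pred_full <- make.predictorMatrix(boys)

# 2. data-driven selection of predictors with the default thresholds
pred_quick <- quickpred(boys)

# 3. default used by mice(), extracted from a mids object;
#    maxit = 0 sets up the imputation model without running iterations
imp0 <- mice(boys, maxit = 0, printFlag = FALSE)
pred_default <- imp0$predictorMatrix

# rows are the variables to be imputed, columns are their predictors,
# so the row sums give the number of predictors per variable
rowSums(pred_quick)
```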
ggmice() function
The ggmice function processes incomplete data in such a way that it can be displayed with ggplot2. The missing values are displayed on the axes (i.e., a missing value for the x-variable is plotted on top of the y-axis, and vice versa). Note that, in contrast to the ggplot() function, ggmice() requires an aesthetic mapping (argument mapping).
# create scatter plot with continuous variables
ggmice(dat, aes(age, bmi)) +
  geom_point()
# create scatter plot with a categorical variable
ggmice(dat, aes(gen, bmi)) +
  geom_point()
The ggmice package contains two functions to evaluate observed and imputed data.
The function plot_trace() plots the trace lines of the MICE algorithm for convergence evaluation. The only required argument is data (to supply a mice::mids object). The optional argument vrb defaults to "all", which displays trace plots for all variables.
# create traceplot for one variable
plot_trace(imp, "bmi")
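The quantities that trace lines are usually based on, the per-iteration means and variances of the imputed values, are stored in the mids object itself. The sketch below only inspects those stored chain means; it assumes mice is installed and is not meant to reproduce what plot_trace() does internally.

```r
library(mice)

# impute with the same settings as in this vignette
imp <- mice(boys, method = "pmm", printFlag = FALSE)

# chain means are stored as variables x iterations x imputations
dim(imp$chainMean)

# mean of the imputed bmi values per iteration (rows) and imputation (columns)
imp$chainMean["bmi", , ]
```

Chains that mix freely without a clear trend across iterations are generally taken as a sign of healthy convergence.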
ggmice() function
The ggmice function is versatile. It produces a ggplot object that can be extended to mimic every type of plot for observed and imputed data in mice; see this vignette for advice. Below are some examples of plots produced with ggmice(). Note that, in contrast to the ggplot() function, ggmice() requires an aesthetic mapping (argument mapping).
# create scatter plot with continuous variables
ggmice(imp, aes(age, bmi)) +
  geom_point()
# create scatter plot with a categorical variable
ggmice(imp, aes(gen, bmi)) +
  geom_point()
# create scatter plot with a transformed variable
ggmice(imp, aes(log(wgt), hgt)) +
  geom_point()
# create stripplot with boxplot overlay
ggmice(imp, aes(x = .imp, y = bmi)) +
  geom_jitter(height = 0) +
  geom_boxplot(fill = "white", alpha = 0.75, outlier.shape = NA) +
  labs(x = "Imputation number")
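The .imp variable mapped to the x-axis in the stripplot has a direct counterpart in the long format returned by mice::complete(). The following sketch, which assumes mice is installed and uses the illustrative object name long, shows that representation; whether ggmice() builds on it internally is not claimed here.

```r
library(mice)

# impute with the same settings as in this vignette
imp <- mice(boys, method = "pmm", printFlag = FALSE)

# stack the original incomplete data (.imp = 0) and the m completed
# datasets (.imp = 1, ..., m) into one long data frame
long <- complete(imp, action = "long", include = TRUE)

# .imp identifies the imputation number, .id the row in the original data
table(long$.imp)
```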
# this vignette was generated with R session
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19042)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=C LC_CTYPE=Dutch_Netherlands.1252
#> [3] LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
#> [5] LC_TIME=Dutch_Netherlands.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggmice_0.0.1 ggplot2_3.3.5.9000 mice_3.14.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.8.2 highr_0.9 bslib_0.3.1 compiler_4.1.2
#> [5] pillar_1.7.0 jquerylib_0.1.4 tools_4.1.2 digest_0.6.29
#> [9] gtable_0.3.0 lattice_0.20-45 jsonlite_1.8.0 evaluate_0.15
#> [13] lifecycle_1.0.1 tibble_3.1.6 pkgconfig_2.0.3 rlang_1.0.2
#> [17] cli_3.2.0 DBI_1.1.2 rstudioapi_0.13 yaml_2.3.5
#> [21] xfun_0.30 fastmap_1.1.0 withr_2.5.0 dplyr_1.0.8
#> [25] stringr_1.4.0 knitr_1.37 generics_0.1.2 vctrs_0.3.8
#> [29] sass_0.4.0 grid_4.1.2 tidyselect_1.1.2 glue_1.6.2
#> [33] R6_2.5.1 fansi_1.0.2 rmarkdown_2.13 farver_2.1.0
#> [37] tidyr_1.2.0 purrr_0.3.4 magrittr_2.0.2 scales_1.1.1
#> [41] backports_1.4.1 htmltools_0.5.2 ellipsis_0.3.2 assertthat_0.2.1
#> [45] colorspace_2.0-3 labeling_0.4.2 utf8_1.2.2 stringi_1.7.6
#> [49] munsell_0.5.0 broom_0.7.12 crayon_1.5.0