This vignette exemplifies different ways to plot the specification curve. For most cases, the function plot_specs()
takes care of the overall process. However, more specific customization is possible if we use the more specific functions plot_curve()
and plot_choices
. Furthermore, we can extend the overall plot with the additional function plot_samplesizes()
. All of these functions produce objects of the class ggplot that can be customized further using the grammar of graphics provided by the package ggplot2.
In order to have some data to work with, we run the minimal example included in the package.
library(specr)
library(dplyr)
library(ggplot2)
library(cowplot)
# run spec analysis
results <- run_specs(example_data,
y = c("y1", "y2"),
x = c("x1", "x2"),
model = "lm",
controls = c("c1", "c2"),
subset = list(group1 = unique(example_data$group1),
group2 = unique(example_data$group2)))
Let’s quickly get some ideas about the specification curve by using summarise_specs()
## # A tibble: 1 x 7
## median mad min max q25 q75 obs
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 3.59 4.56 -2.05 9.58 1.03 7.63 123
## # A tibble: 2 x 8
## x median mad min max q25 q75 obs
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 x1 3.59 4.42 -2.05 9.28 0.510 6.51 123
## 2 x2 4.40 4.84 -0.258 9.58 1.31 7.79 123
We see that it makes quite a difference whether x1
or x2
is used as independent variable.
The simplest way to visualize most of the information contained in the results data frame is by using the plot_specs()
function.
We can further customize that function, e.g., by removing unnecessary information (in this case we only specified one model, this analytical choice is hence useless) or by reordering/transforming the analytical choices (and thereby visualize specific contrasts).
The overall function plot_specs()
is not very flexible. Alternatively, we can plot the specification curve and the choice panel individually and bind them together afterwards. This is useful as it allows us to customize and change both individual plots.
# Plot specification curve
p1 <- plot_curve(results,
ci = FALSE,
ribbon = TRUE) +
geom_hline(yintercept = 0,
linetype = "dashed",
color = "black") +
ylim(-8, 12) +
labs(x = "", y = "unstandarized regression coefficient") +
theme_half_open()
# Plot choices
p2 <- plot_choices(results,
choices = c("x", "y", "controls", "subsets")) +
labs(x = "specifications (ranked)") +
theme_half_open() +
theme(strip.text.x = element_blank())
# Combine plots
plot_specs(plot_a = p1,
plot_b = p2,
labels = c("", ""), # remove plot labels
rel_height = c(2, 2.5)) # adjust relative heights
By default, we do not know how many participants were included in each specification. If you remove missing values listwise, this may not be a big problem as all models are based on the same subsample. If you have missing values in your dataset and you did not impute them or delete them listwise, we should investigate how many participants were included in each specification. The function plot_samplesizes()
provides an additional panel that can then be added to the overall plot using cowplot::plot_grid()