I have made a lot of tables and, if I’m being honest, they are not much fun. The primary reason I wrote `table.glue` was to abstract away some of the really tedious parts of dealing with tables, but there is also a broader workflow that `table.glue` fits into, and it makes tables far less painful. I have boiled that workflow down into three main ideas:
1. Structuring data for a table
2. Making table values
3. Making an inline object
If you find yourself working on a manuscript with lots of tables, I think you will find this framework to be extremely helpful.
```r
library(dplyr)
library(gt)
library(table.glue)

# Summarize height and mass by sex and eye color using the
# median (est) and the 25th (lwr) and 75th (upr) percentiles
starwars_smry <- na.omit(starwars) %>%
  filter(eye_color %in% c('blue', 'brown', 'hazel')) %>%
  group_by(sex, eye_color) %>%
  summarize(
    across(
      c(height, mass),
      .fns = list(
        lwr = ~quantile(.x, probs = 0.25),
        est = ~quantile(.x, probs = 0.50),
        upr = ~quantile(.x, probs = 0.75)
      )
    )
  )
#> `summarise()` regrouping output by 'sex' (override with `.groups` argument)
```
Tabulated data should provide both a point estimate and a measure of the uncertainty around it, e.g. a 95% confidence interval or, as in this example, the 25th and 75th percentiles. `table_glue()` is a convenient way to combine a point estimate and its interval into one string. I use a magnitude-based rounding specification so that height and mass are presented consistently.
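```r
# Round half to even, choosing the number of decimals by magnitude:
# values under 1 get 2 decimals, values under 10 or under 100 get 1,
# and larger values get 0
rspec <- round_spec() %>%
  round_half_even() %>%
  round_using_magnitude(breaks = c(1, 10, 100, Inf),
                        digits = c(2, 1, 1, 0))

# Register the specification as the global default for table.glue
names(rspec) <- paste('table.glue', names(rspec), sep = '.')
options(rspec)
```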
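```r
# Combine each median and its percentiles into one table value,
# rounded with the rounding specification set above
starwars_tbl <- starwars_smry %>%
  transmute(
    sex,
    eye_color,
    tbv_height = table_glue("{height_est} ({height_lwr} - {height_upr})"),
    tbv_mass   = table_glue("{mass_est} ({mass_lwr} - {mass_upr})")
  )
```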
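Magnitude-based rounding applies to each number separately, not to the whole string. That is why a single table value can mix precisions, as in "84.0 (79.0 - 112)". Below is a minimal base-R sketch of that idea. The helpers `round_magnitude()` and `fmt_interval()` are hypothetical illustrations and are not `table.glue`'s implementation. In particular, `sprintf()` rounds the binary value and does not guarantee the half-to-even rule that `round_half_even()` provides. The inputs 84, 79 and 112 are illustrative.

```r
# Hypothetical helper: pick decimals from the magnitude of each value.
# |x| in [0, 1) -> 2, [1, 10) -> 1, [10, 100) -> 1, [100, Inf) -> 0
round_magnitude <- function(x, breaks = c(1, 10, 100, Inf),
                            digits = c(2, 1, 1, 0)) {
  bucket <- findInterval(abs(x), c(0, head(breaks, -1)))
  # sprintf() is vectorized over its format strings, one per value
  sprintf(paste0("%.", digits[bucket], "f"), x)
}

# Hypothetical helper: "estimate (lower - upper)", each part rounded on its own
fmt_interval <- function(est, lwr, upr) {
  sprintf("%s (%s - %s)",
          round_magnitude(est), round_magnitude(lwr), round_magnitude(upr))
}

round_magnitude(c(0.1234, 5.678, 56.21, 166.4))
fmt_interval(84, 79, 112)
```

Because 112 falls in the largest magnitude bucket, it loses its decimal while 84 and 79 keep one.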
Keeping your data in a tidy table format makes it fairly straightforward to build a publication-ready `gt` table. Most of the code goes toward renaming and labeling, which is how it should be.
```r
starwars_tbl %>%
  mutate(
    sex = recode(sex,
                 'female' = 'Female characters',
                 'male' = 'Male characters'),
    eye_color = recode(eye_color,
                       blue = 'Blue',
                       brown = 'Brown',
                       hazel = 'Hazel')
  ) %>%
  gt(groupname_col = 'sex', rowname_col = 'eye_color') %>%
  cols_label(tbv_height = 'Height, cm', tbv_mass = 'Mass, kg') %>%
  cols_align('center') %>%
  tab_stubhead(label = 'Eye color') %>%
  tab_spanner(label = 'Median (25th, 75th percentile)',
              columns = starts_with('tbv')) %>%
  tab_source_note(md('For more on these data, see `?dplyr::starwars`'))
```
Median (25th, 75th percentile)

| Eye color | Height, cm | Mass, kg |
|---|---|---|
| **Female characters** | | |
| Blue | 166 (166 - 168) | 56.2 (53.1 - 65.6) |
| Brown | 158 (154 - 161) | 47.0 (46.0 - 48.0) |
| Hazel | 178 (178 - 178) | 55.0 (55.0 - 55.0) |
| **Male characters** | | |
| Blue | 178 (175 - 188) | 84.0 (79.0 - 112) |
| Brown | 183 (179 - 184) | 79.5 (78.8 - 81.0) |
| Hazel | 170 (170 - 170) | 77.0 (77.0 - 77.0) |

For more on these data, see `?dplyr::starwars`
It isn’t always enough to make the table. Sometimes (okay, probably always), you need to interpret it too. If you are writing a report in R Markdown, it may feel natural to use inline code to write out your interpretation, but this can become tedious and discouraging for a few reasons:

- You may need to copy and paste a lot of code to get the numbers you want.
- You may find this code hard to maintain if your co-authors suggest reporting different numbers than the ones in your draft.
Let’s first take the approach of accessing the table data directly, to highlight how it can get tedious. I start by pulling out the values I am interested in: the heights of female characters with brown, blue, and hazel eyes.
```r
starwars_inline_female_brown_height <- starwars_tbl %>%
  filter(sex == 'female', eye_color == 'brown') %>%
  pull(tbv_height)

starwars_inline_female_blue_height <- starwars_tbl %>%
  filter(sex == 'female', eye_color == 'blue') %>%
  pull(tbv_height)

starwars_inline_female_hazel_height <- starwars_tbl %>%
  filter(sex == 'female', eye_color == 'hazel') %>%
  pull(tbv_height)
```
Now I have all I need to write my sentence:
“Female characters in Star Wars with hazel eyes were, on average, taller than their counterparts with blue or brown eyes. The median (25th, 75th percentile) heights of these groups were 178 (178 - 178) cm for hazel-eyed, 166 (166 - 168) cm for blue-eyed, and 158 (154 - 161) cm for brown-eyed female characters.”
The great thing about writing results inline is that it drastically reduces the chance of a copy/paste error, which can get your paper rejected before it even goes out to reviewers. However, this approach also has limitations:
- It still relies pretty heavily on copying and pasting your code, and you could make a mistake there (I often do).
- When you send this draft to your colleagues, one of them might suggest it would be more interesting to compare heights of male versus female characters with the same eye color, which means you will have to go back and create new inline objects.
- Naming the inline objects gets extremely tedious in complicated tables. Even for the relatively simple table above, it takes quite a lot of text to name things clearly.
Keeping your table data in a nested list solves almost all of the limitations above (nothing will solve colleagues who like to change your paper, sorry):

- A list can store all of your table data programmatically, so you never have to copy/paste code.
- Hierarchical lists support names for each nested list, so your inline object will actually name itself!
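To make the "name itself" idea concrete, here is a small hand-built nested list that mirrors part of the table, written in base R. The object names `inline` and `heights` are hypothetical. Values can be reached by name with `$`, or with `[[` and a character vector path, which R treats as recursive indexing. The path form is convenient when the categories are themselves stored in variables.

```r
# Hypothetical nested list mirroring part of the table above
inline <- list(
  female = list(
    blue  = list(tbv_height = "166 (166 - 168)"),
    brown = list(tbv_height = "158 (154 - 161)"),
    hazel = list(tbv_height = "178 (178 - 178)")
  )
)

# Access by name
inline$female$hazel$tbv_height

# Recursive indexing with a character path
inline[[c("female", "hazel", "tbv_height")]]

# Pull several values programmatically instead of copy/pasting code
heights <- vapply(c("hazel", "blue", "brown"),
                  function(eye) inline[[c("female", eye, "tbv_height")]],
                  character(1))
heights
```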
`table.glue` provides a function, `as_inline()`, to make your inline object. Assuming your data are in a tidy table format, this is all you need:
```r
starwars_inline <- starwars_tbl %>%
  as_inline(tbl_variables = c("sex", "eye_color"),
            tbl_values = c("tbv_height", "tbv_mass"))

print(starwars_inline)
#> $female
#> $female$blue
#> $female$blue$tbv_height
#> [1] "166 (166 - 168)"
#> 
#> $female$blue$tbv_mass
#> [1] "56.2 (53.1 - 65.6)"
#> 
#> 
#> $female$brown
#> $female$brown$tbv_height
#> [1] "158 (154 - 161)"
#> 
#> $female$brown$tbv_mass
#> [1] "47.0 (46.0 - 48.0)"
#> 
#> 
#> $female$hazel
#> $female$hazel$tbv_height
#> [1] "178 (178 - 178)"
#> 
#> $female$hazel$tbv_mass
#> [1] "55.0 (55.0 - 55.0)"
#> 
#> 
#> 
#> $male
#> $male$blue
#> $male$blue$tbv_height
#> [1] "178 (175 - 188)"
#> 
#> $male$blue$tbv_mass
#> [1] "84.0 (79.0 - 112)"
#> 
#> 
#> $male$brown
#> $male$brown$tbv_height
#> [1] "183 (179 - 184)"
#> 
#> $male$brown$tbv_mass
#> [1] "79.5 (78.8 - 81.0)"
#> 
#> 
#> $male$hazel
#> $male$hazel$tbv_height
#> [1] "170 (170 - 170)"
#> 
#> $male$hazel$tbv_mass
#> [1] "77.0 (77.0 - 77.0)"
```
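To see roughly how a tidy table becomes a nested list like this, here is a base-R sketch. It is not `as_inline()`'s actual implementation. The function `nest_by_vars()` and the data frame `tbl` are hypothetical, and `tbl` simply copies the table values shown above. The sketch splits on the first table variable, recurses on the rest, and stores the table values as a named list once no variables remain. It assumes each combination of variables appears in exactly one row.

```r
# Hypothetical re-creation of the tidy table values shown above
tbl <- data.frame(
  sex = rep(c("female", "male"), each = 3),
  eye_color = rep(c("blue", "brown", "hazel"), times = 2),
  tbv_height = c("166 (166 - 168)", "158 (154 - 161)", "178 (178 - 178)",
                 "178 (175 - 188)", "183 (179 - 184)", "170 (170 - 170)"),
  tbv_mass = c("56.2 (53.1 - 65.6)", "47.0 (46.0 - 48.0)", "55.0 (55.0 - 55.0)",
               "84.0 (79.0 - 112)", "79.5 (78.8 - 81.0)", "77.0 (77.0 - 77.0)"),
  stringsAsFactors = FALSE
)

# Hypothetical sketch: split by each variable in turn; list names come
# from the category labels, so the nested list "names itself"
nest_by_vars <- function(data, vars, values) {
  if (length(vars) == 0) {
    # Leaf: assumes one row per combination; keep only the table values
    return(as.list(data[1, values, drop = FALSE]))
  }
  groups <- split(data, data[[vars[1]]], drop = TRUE)
  lapply(groups, nest_by_vars, vars = vars[-1], values = values)
}

nested <- nest_by_vars(tbl, vars = c("sex", "eye_color"),
                       values = c("tbv_height", "tbv_mass"))
nested$male$blue$tbv_mass
```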
Some nice things about this:

- All the table data live in one place, so revising your interpretation only requires editing the text, not the code.
- The names of the list are generated from the categories of the table variables, so the inline object is clear and easy to navigate (especially if you use tab-completion of names in RStudio!).
- Lists can easily be concatenated, so you can combine two or more tables into one inline object (see below).
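As a rough illustration of that last point, base R's `c()` joins lists. Wrapping each table's inline object under its own name first keeps the tables from colliding. The names `tbl_a`, `tbl_b`, `table_1`, `table_2` and `combined` below are hypothetical, and the small lists stand in for real inline objects.

```r
# Two hypothetical inline objects from different tables
tbl_a <- list(female = list(hazel = list(tbv_height = "178 (178 - 178)")))
tbl_b <- list(male = list(hazel = list(tbv_height = "170 (170 - 170)")))

# Name each table, then concatenate into a single inline object
combined <- c(list(table_1 = tbl_a), list(table_2 = tbl_b))

combined$table_2$male$hazel$tbv_height
```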
If you have twenty tables, you may prefer to keep their data in one inline object instead of twenty. To keep this brief, let’s just duplicate the first inline object and then show how concatenation would work:
Now the concatenation: