Tutorial: tbl_regression

Last Updated: September 13, 2020

Introduction

The tbl_regression() function takes a regression model object in R and returns a formatted table of regression model results that is publication-ready. It is a simple way to summarize and present your analysis results using R! Like tbl_summary(), tbl_regression() creates highly customizable analytic tables with sensible defaults.

This vignette will walk a reader through the tbl_regression() function, and the various functions available to modify and make additions to an existing formatted regression table.


Behind the scenes: tbl_regression() uses broom::tidy() to perform the initial model formatting and can accommodate many different model types. For example, lm(), glm(), survival::coxph(), survival::survreg(), and others are vetted models known to work with {gtsummary} (see Supported Models below). It is also possible to specify your own function to tidy the model results if needed.

Setup

Before going through the tutorial, install and load {gtsummary}.

# install.packages("gtsummary")
library(gtsummary)

Example data set

In this vignette we’ll be using the trial data set, which is included in the {gtsummary} package.

Variable   Class       Label
trt        character   Chemotherapy Treatment
age        numeric     Age
marker     numeric     Marker Level (ng/mL)
stage      factor      T Stage
grade      factor      Grade
response   integer     Tumor Response
death      integer     Patient Died
ttdeath    numeric     Months to Death/Censor

The data set includes a mix of continuous, dichotomous, and categorical variables.
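To check these classes and labels yourself, the short sketch below prints the class of each column and the label attached to it. It assumes, as the Label column above suggests, that each label is stored in the column's "label" attribute, which is where {gtsummary} picks up variable labels.

```r
library(gtsummary)

# Class of each column in the bundled trial data set
vapply(trial, function(x) class(x)[1], character(1))

# Variable labels live in each column's "label" attribute (NA if unlabelled)
vapply(trial, function(x) {
  lbl <- attr(x, "label")
  if (is.null(lbl)) NA_character_ else lbl
}, character(1))
```
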

Basic Usage

The default output from tbl_regression() is meant to be publication ready.

# build logistic regression model
m1 <- glm(response ~ age + stage, data = trial, family = binomial)

# view raw model results
summary(m1)$coefficients
#>                Estimate Std. Error    z value   Pr(>|z|)
#> (Intercept) -1.48622424 0.62022844 -2.3962530 0.01656365
#> age          0.01939109 0.01146813  1.6908683 0.09086195
#> stageT2     -0.54142643 0.44000267 -1.2305071 0.21850725
#> stageT3     -0.05953479 0.45042027 -0.1321761 0.89484501
#> stageT4     -0.23108633 0.44822835 -0.5155549 0.60616530
tbl_regression(m1, exponentiate = TRUE)
Characteristic   OR¹    95% CI¹      p-value
Age              1.02   1.00, 1.04   0.091
T Stage
    T1
    T2           0.58   0.24, 1.37   0.2
    T3           0.94   0.39, 2.28   0.9
    T4           0.79   0.33, 1.90   0.6
¹ OR = Odds Ratio, CI = Confidence Interval

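Note the sensible defaults in this basic usage (each can be customized later): the variable labels stored in the data set are used in place of variable names (Age, T Stage), a reference row is added for the categorical variable (T1), estimates, confidence intervals, and p-values are rounded, and a footnote defines the abbreviations OR and CI.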

Customize Output

There are four primary ways to customize the output of the regression model table.

  1. Modify tbl_regression() function input arguments
  2. Add additional data/information to a summary table with add_*() functions
  3. Modify summary table appearance with the {gtsummary} functions
  4. Modify table appearance with {gt} package functions

Modifying function arguments

The tbl_regression() function includes many arguments for modifying the appearance.

Argument           Description
label=             Modify variable labels in the table
exponentiate=      Exponentiate model coefficients
include=           Names of variables to include in the output; default is all variables
show_single_row=   By default, categorical variables are printed on multiple rows. If a variable is dichotomous and you wish to print its regression coefficient on a single row, include the variable name(s) here.
conf.level=        Confidence level of the confidence interval
intercept=         Whether to include the intercept
estimate_fun=      Function to round and format coefficient estimates
pvalue_fun=        Function to round and format p-values
tidy_fun=          Function used to tidy the model results, replacing the default tidier
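Several of these arguments can be combined in one call. The sketch below fits a second, illustrative model (m2, which adds the dichotomous trt variable and grade) so that show_single_row= has something to act on; the relabelled text and the 90% confidence level are arbitrary choices for demonstration.

```r
library(gtsummary)

# Illustrative model that includes the dichotomous treatment variable
m2 <- glm(response ~ age + trt + grade, data = trial, family = binomial)

tbl_args <- tbl_regression(
  m2,
  exponentiate = TRUE,                                   # odds ratios
  label = list(age ~ "Patient Age", trt ~ "Treatment"),  # relabel variables
  include = c(age, trt),                                 # omit grade from the table
  show_single_row = trt,                                 # one row for a 2-level variable
  conf.level = 0.90,                                     # 90% confidence intervals
  estimate_fun = ~style_ratio(.x, digits = 3)            # custom rounding of the ORs
)
tbl_args
```

Because include= only controls what is displayed, grade is still adjusted for in the model; it is simply not shown.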

{gtsummary} functions to add information

The {gtsummary} package has built-in functions for adding to results from tbl_regression(). The following functions add columns and/or information to the regression table.

Function                   Description
add_global_p()             Adds the global p-value for categorical variables
add_glance_source_note()   Adds statistics from `broom::glance()` as a source note
add_vif()                  Adds a column of variance inflation factors (VIF)
add_q()                    Adds a column of q-values to control for multiple comparisons
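As a minimal sketch, two of these functions can be piped onto the model from Basic Usage. This assumes the {car} package is installed, since add_global_p() relies on it to compute the global test for T Stage.

```r
library(gtsummary)

m1 <- glm(response ~ age + stage, data = trial, family = binomial)

tbl_added <- tbl_regression(m1, exponentiate = TRUE) %>%
  add_global_p() %>%            # single p-value for T Stage as a whole
  add_glance_source_note()      # model fit statistics from broom::glance()
tbl_added
```
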

{gtsummary} functions to format table

The {gtsummary} package comes with functions specifically made to modify and format summary tables.

Function                   Description
modify_header()            Update column headers
modify_footnote()          Update column footnote
modify_spanning_header()   Update spanning headers
modify_caption()           Update table caption/title
bold_labels()              Bold variable labels
bold_levels()              Bold variable levels
italicize_labels()         Italicize variable labels
italicize_levels()         Italicize variable levels
bold_p()                   Bold significant p-values
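These functions can be chained on any regression table. The sketch below renames the first column, adds a caption, and applies bold and italic styling; the header and caption text are illustrative only, and the double asterisks are interpreted as markdown bold.

```r
library(gtsummary)

m1 <- glm(response ~ age + stage, data = trial, family = binomial)

tbl_styled <- tbl_regression(m1, exponentiate = TRUE) %>%
  modify_header(label = "**Variable**") %>%          # rename the first column
  modify_caption("**Odds of tumor response**") %>%   # illustrative caption text
  bold_labels() %>%                                  # bold "Age" and "T Stage"
  italicize_levels() %>%                             # italicize T1 through T4
  bold_p(t = 0.10)                                   # bold p-values below 0.10
tbl_styled
```
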

{gt} functions to format table

The {gt} package is packed with many great functions for modifying table output—too many to list here. Review the package’s website for a full listing.

To use the {gt} package functions with {gtsummary} tables, the regression table must first be converted into a {gt} object. To this end, use the as_gt() function after modifications have been completed with {gtsummary} functions.

m1 %>%
  tbl_regression(exponentiate = TRUE) %>%
  as_gt() %>%
  gt::tab_source_note(gt::md("*This data is simulated*"))
Characteristic   OR¹    95% CI¹      p-value
Age              1.02   1.00, 1.04   0.091
T Stage
    T1
    T2           0.58   0.24, 1.37   0.2
    T3           0.94   0.39, 2.28   0.9
    T4           0.79   0.33, 1.90   0.6
This data is simulated
¹ OR = Odds Ratio, CI = Confidence Interval

Example

There are formatting options available, such as adding bold and italics to text. In the example below:
- Coefficients are exponentiated to give odds ratios
- Global p-values for Stage are reported
- Large p-values are rounded to two decimal places
- P-values less than 0.10 are bold
- Variable labels are bold
- Variable levels are italicized

# format results into a table with global p-values
m1 %>%
  tbl_regression(
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 2)
  ) %>%
  add_global_p() %>%
  bold_p(t = 0.10) %>%
  bold_labels() %>%
  italicize_levels()
Characteristic   OR¹    95% CI¹      p-value
Age              1.02   1.00, 1.04   0.087
T Stage                              0.62
    T1
    T2           0.58   0.24, 1.37
    T3           0.94   0.39, 2.28
    T4           0.79   0.33, 1.90
¹ OR = Odds Ratio, CI = Confidence Interval

Univariate Regression

The tbl_uvregression() function produces a table of univariate regression models. The function is a wrapper for tbl_regression(), and as a result, accepts nearly identical function arguments. The function’s results can be modified in similar ways to tbl_regression().

trial %>%
  dplyr::select(response, age, grade) %>%
  tbl_uvregression(
    method = glm,
    y = response,
    method.args = list(family = binomial),
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 2)
  ) %>%
  add_global_p() %>%  # add global p-value 
  add_nevent() %>%    # add number of events of the outcome
  add_q() %>%         # adjusts global p-values for multiple testing
  bold_p() %>%        # bold p-values under a given threshold (default 0.05)
  bold_p(t = 0.10, q = TRUE) %>% # now bold q-values under the threshold of 0.10
  bold_labels()
#> add_q: Adjusting p-values with
#> `stats::p.adjust(x$table_body$p.value, method = "fdr")`
Characteristic   N     Event N   OR¹    95% CI¹      p-value   q-value²
Age              183   58        1.02   1.00, 1.04   0.091     0.18
Grade            193   61                            0.93      0.93
    I
    II                           0.95   0.45, 2.00
    III                          1.10   0.52, 2.29
¹ OR = Odds Ratio, CI = Confidence Interval
² False discovery rate correction for multiple testing
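The same wrapper works with other model types listed under Supported Models. As a sketch, assuming the {survival} package is installed, the call below fits a separate Cox model of time to death for each covariate, using the ttdeath and death columns as the outcome and the include= argument (instead of selecting columns first) to choose the covariates.

```r
library(gtsummary)
library(survival)  # provides coxph() and Surv()

tbl_surv <- tbl_uvregression(
  trial,
  method = coxph,
  y = Surv(ttdeath, death),   # time-to-event outcome
  include = c(age, grade),    # each covariate gets its own model
  exponentiate = TRUE         # report hazard ratios
)
tbl_surv
```
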

Setting Default Options

The {gtsummary} regression functions and their related functions have sensible defaults for rounding and formatting results. If you would like to change these defaults, use the {gtsummary} theme function set_gtsummary_theme(). The package includes pre-specified themes, and you can also create your own. Themes control baseline behavior such as how p-values and coefficients are rounded, default headers, and confidence levels. For details on creating a theme and setting personal defaults, visit the themes vignette.
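A minimal sketch of applying one of the pre-specified themes: the JAMA journal theme is set before the table is built and the package defaults are restored afterwards, so later tables are not affected.

```r
library(gtsummary)

m1 <- glm(response ~ age + stage, data = trial, family = binomial)

# Apply a pre-specified journal theme to all tables built afterwards
set_gtsummary_theme(theme_gtsummary_journal(journal = "jama"))
tbl_jama <- tbl_regression(m1, exponentiate = TRUE)
tbl_jama

# Restore the package defaults
reset_gtsummary_theme()
```
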

Supported Models

Below is a listing of known and tested models supported by tbl_regression(). If a model follows a standard format and has a tidier, it’s likely to be supported as well, even if not listed below.

Model                  Details
stats::lm()
stats::glm()
stats::aov()           Reference rows are not relevant for such models.
ordinal::clm()         Limited support for models with nominal predictors.
ordinal::clmm()        Limited support for models with nominal predictors.
survival::coxph()
survival::survreg()
survival::clogit()
lme4::lmer()           broom.mixed package required
lme4::glmer()          broom.mixed package required
lme4::glmer.nb()       broom.mixed package required
brms::brm()            broom.mixed package required
geepack::geeglm()
gam::gam()
glmmTMB::glmmTMB()     broom.mixed package required
mgcv::gam()            Use the default tidier broom::tidy() for smooth terms only, or gtsummary::tidy_gam() to include parametric terms.
nnet::multinom()
survey::svyglm()
survey::svycoxph()
survey::svyolr()
MASS::polr()
MASS::glm.nb()
mice::mira             Limited support. If mod is a mira object, use tidy_plus_plus(mod, tidy_fun = function(x, ...) mice::pool(x) %>% mice::tidy(...))
lavaan::lavaan()       Limited support for categorical variables
stats::nls()           Limited support
lfe::felm()
rstanarm::stan_glm()   broom.mixed package required
VGAM::vglm()           Limited support. It is recommended to use tidy_parameters() as tidy_fun.
cmprsk::crr()          Limited support. It is recommended to use tidycmprsk::crr() instead.
tidycmprsk::crr()
plm::plm()
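When a model is not covered, or you want a listed model tidied differently, tidy_fun= accepts your own function. The sketch below is a hypothetical tidier (wald_tidy is not part of {gtsummary}) that swaps the profile-likelihood intervals broom::tidy() reports for glm() models by default for Wald intervals from stats::confint.default(). It assumes tbl_regression() passes exponentiate= and conf.level= to the tidier, with any other arguments absorbed by `...`.

```r
library(gtsummary)

m1 <- glm(response ~ age + stage, data = trial, family = binomial)

# Hypothetical custom tidier: Wald confidence intervals on the model scale,
# exponentiated together with the estimates when requested
wald_tidy <- function(x, exponentiate = FALSE, conf.level = 0.95, ...) {
  res <- broom::tidy(x)                                # term, estimate, std.error, p.value
  ci <- stats::confint.default(x, level = conf.level)  # estimate +/- z * std.error
  res$conf.low <- unname(ci[res$term, 1])
  res$conf.high <- unname(ci[res$term, 2])
  if (exponentiate) {
    res$estimate <- exp(res$estimate)
    res$conf.low <- exp(res$conf.low)
    res$conf.high <- exp(res$conf.high)
  }
  res
}

tbl_wald <- tbl_regression(m1, exponentiate = TRUE, tidy_fun = wald_tidy)
tbl_wald
```

The tidier must return one row per model term with at least term and estimate columns; the p-values here are the Wald p-values already reported by broom::tidy().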