prettyglm-vignette

Introduction to prettyglm

When working with generalized linear models (GLMs), it is often useful to create informative and beautiful summaries of the fitted model coefficients. This document introduces prettyglm's main sets of functions and shows you how to apply them.

Data: titanic

To explore the functionality of prettyglm, we will use the Titanic data set to perform logistic regression. This data set was sourced from https://www.kaggle.com/c/titanic/data and contains information about passengers aboard the Titanic, along with a target variable that indicates whether they survived.

library(dplyr)      # provides the %>% pipe
library(prettyglm)  # provides the titanic data set

data('titanic')
head(titanic) %>%
  knitr::kable(table.attr = "style='width:30%;'") %>%
  kableExtra::kable_classic(full_width = TRUE, position = "center")
| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Cabintype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 | Missing | S | Missing |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 | Missing | S | Missing |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S | C |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 | Missing | S | Missing |
| 6 | 0 | 3 | Moran, Mr. James | male | NA | 0 | 0 | 330877 | 8.4583 | Missing | Q | Missing |

Pre-processing

A critical step for this package to work well is to set all categorical predictors as factors.

# Convert all categorical columns to factors in one step.
columns_to_factor <- c('Pclass',
                       'Sex',
                       'Cabin',
                       'Embarked',
                       'Cabintype')
titanic <- titanic %>%
  dplyr::mutate(dplyr::across(dplyr::all_of(columns_to_factor), factor))
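To see why this step matters, the sketch below fits the same predictor twice, once as a number and once as a factor. It uses the built-in mtcars data rather than titanic so that it runs on its own; the choice of am and cyl is purely illustrative and is not part of the original analysis.

```r
# mtcars ships with base R; cyl (4, 6 or 8 cylinders) is stored as a number.
fit_numeric <- stats::glm(am ~ cyl,
                          data = mtcars,
                          family = binomial(link = 'logit'))

# Same model after converting cyl to a factor.
cars <- mtcars
cars$cyl <- factor(cars$cyl)
fit_factor <- stats::glm(am ~ cyl,
                         data = cars,
                         family = binomial(link = 'logit'))

# As a number, cyl gets a single slope.
numeric_terms <- names(stats::coef(fit_numeric))

# As a factor, cyl gets one coefficient per non-base level,
# and the first level becomes the base level.
factor_terms <- names(stats::coef(fit_factor))
base_level <- levels(cars$cyl)[1]

print(numeric_terms)
print(factor_terms)
print(base_level)
```

Only the factor version has levels and a base level to report, which is the structure a per-level coefficient table relies on.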

Building a glm

For this vignette we will use stats::glm(), but prettyglm also supports parsnip and workflows model objects that use the glm model engine.

survival_model <- stats::glm(Survived ~ Pclass + Sex + Age + Fare + Embarked + SibSp + Parch + Cabintype, 
                             data = titanic, 
                             family = binomial(link = 'logit'))

Create pretty table of model coefficients with pretty_coefficients()

pretty_coefficients() allows you to create a pretty table of model coefficients, including categorical base levels.

pretty_coefficients(survival_model)
| Variable | Level | Importance | Estimate | Std.error | P.Value |
|---|---|---|---|---|---|
| (Intercept) | (Intercept) | | 4.9902304 | 0.8148416 | 0 |
| Pclass | 1 | | 0.0000000 | 0.0000000 | NA |
| | 2 | | -0.7001117 | 0.5071900 | 0.16747 |
| | 3 | | -1.8257623 | 0.5207283 | 0.00045 |
| Sex | female | | 0.0000000 | 0.0000000 | NA |
| | male | | -2.6858619 | 0.2280753 | 0 |
| Age | Age | | -0.0437710 | 0.0085211 | 0 |
| Fare | Fare | | 0.0029054 | 0.0029893 | 0.33108 |
| Embarked | C | | 0.0000000 | 0.0000000 | NA |
| | Q | | -0.7941971 | 0.6048757 | 0.18919 |
| | S | | -0.4336321 | 0.2836360 | 0.12631 |
| SibSp | SibSp | | -0.3546154 | 0.1305480 | 0.0066 |
| Parch | Parch | | -0.0698581 | 0.1250851 | 0.57651 |
| Cabintype | A | | 0.0000000 | 0.0000000 | NA |
| | B | | -0.5672106 | 0.8047623 | 0.48092 |
| | C | | -1.1929795 | 0.7656414 | 0.1192 |
| | D | | -0.1838547 | 0.8120140 | 0.82088 |
| | E | | 0.4033056 | 0.8153736 | 0.62086 |
| | F | | 0.1140862 | 1.1176106 | 0.91869 |
| | G | | -1.8615347 | 1.3320706 | 0.16227 |
| | Missing | | -1.1033884 | 0.7893559 | 0.16216 |
| | T | | -13.5702506 | 535.4115815 | 0.97978 |
Goodness-of-Fit: AIC: 653.6 , Devience : 617.6 , Null Devience: 960.9
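
The base-level rows (estimate 0, P-value NA) are what distinguish this table from the output of summary(). The following is a rough base-R sketch of that idea, not prettyglm's actual implementation: it reads the first level of each factor from the fitted model's xlevels element and adds it as a zero-valued row. The mtcars data and the column names term, estimate, std_error and p_value are assumptions made for illustration.

```r
# Small stand-in model with one factor predictor (illustrative only).
cars <- mtcars
cars$cyl <- factor(cars$cyl)
fit <- stats::glm(am ~ cyl,
                  data = cars,
                  family = binomial(link = 'logit'))

# Rows for the coefficients that were actually estimated.
est <- summary(fit)$coefficients
fitted_rows <- data.frame(
  term      = rownames(est),
  estimate  = est[, "Estimate"],
  std_error = est[, "Std. Error"],
  p_value   = est[, "Pr(>|z|)"],
  row.names = NULL,
  stringsAsFactors = FALSE
)

# fit$xlevels holds the levels of every factor predictor;
# the first level of each is the base level absorbed into the intercept.
base_levels <- vapply(fit$xlevels, function(lv) lv[1], character(1))
base_rows <- data.frame(
  term      = paste0(names(fit$xlevels), base_levels),
  estimate  = 0,
  std_error = 0,
  p_value   = NA_real_,
  row.names = NULL,
  stringsAsFactors = FALSE
)

# With a single factor, the base row slots in right after the intercept.
full_table <- rbind(fitted_rows[1, ], base_rows, fitted_rows[-1, ])
rownames(full_table) <- NULL
print(full_table)
```

With several factors the base rows would need to be interleaved per variable; the sketch keeps to one factor to stay short.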

You can also run a type III test on the coefficients by specifying the type_iii argument.

pretty_coefficients(survival_model, type_iii = 'Wald')
| Variable | Level | Importance | Estimate | Std.error | P.Value | Type.III.P.Value |
|---|---|---|---|---|---|---|
| (Intercept) | (Intercept) | | 4.9902304 | 0.8148416 | 0 | 0 |
| Pclass | 1 | | 0.0000000 | 0.0000000 | NA | 0 |
| | 2 | | -0.7001117 | 0.5071900 | 0.16747 | |
| | 3 | | -1.8257623 | 0.5207283 | 0.00045 | |
| Sex | female | | 0.0000000 | 0.0000000 | NA | 0 |
| | male | | -2.6858619 | 0.2280753 | 0 | |
| Age | Age | | -0.0437710 | 0.0085211 | 0 | 0 |
| Fare | Fare | | 0.0029054 | 0.0029893 | 0.33108 | 0.33108 |
| Embarked | C | | 0.0000000 | 0.0000000 | NA | 0.22964 |
| | Q | | -0.7941971 | 0.6048757 | 0.18919 | |
| | S | | -0.4336321 | 0.2836360 | 0.12631 | |
| SibSp | SibSp | | -0.3546154 | 0.1305480 | 0.0066 | 0.0066 |
| Parch | Parch | | -0.0698581 | 0.1250851 | 0.57651 | 0.57651 |
| Cabintype | A | | 0.0000000 | 0.0000000 | NA | 0.10512 |
| | B | | -0.5672106 | 0.8047623 | 0.48092 | |
| | C | | -1.1929795 | 0.7656414 | 0.1192 | |
| | D | | -0.1838547 | 0.8120140 | 0.82088 | |
| | E | | 0.4033056 | 0.8153736 | 0.62086 | |
| | F | | 0.1140862 | 1.1176106 | 0.91869 | |
| | G | | -1.8615347 | 1.3320706 | 0.16227 | |
| | Missing | | -1.1033884 | 0.7893559 | 0.16216 | |
| | T | | -13.5702506 | 535.4115815 | 0.97978 | |
Goodness-of-Fit: AIC: 653.6 , Devience : 617.6 , Null Devience: 960.9
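
In the table above, single-coefficient terms such as Fare show the same value in both P-value columns, while factors get one joint value per variable. A Wald test behaves this way: it tests all coefficients of a term together. The sketch below computes such a joint Wald P-value by hand in base R. It is an illustration under assumptions, not a description of how prettyglm computes its Type.III.P.Value column; the helper wald_p_value and the mtcars model are hypothetical.

```r
# Stand-in model with one factor (cyl) and one numeric predictor (wt).
cars <- mtcars
cars$cyl <- factor(cars$cyl)
fit <- stats::glm(am ~ cyl + wt,
                  data = cars,
                  family = binomial(link = 'logit'))

# Hypothetical helper: joint Wald test that all coefficients of `term` are zero.
wald_p_value <- function(model, term) {
  # Coefficients are matched by name prefix, which is adequate here
  # but would be fragile with similarly named predictors.
  idx <- grep(paste0("^", term), names(stats::coef(model)))
  b <- stats::coef(model)[idx]
  V <- stats::vcov(model)[idx, idx, drop = FALSE]
  # Wald statistic b' V^{-1} b, chi-squared with length(idx) degrees of freedom.
  statistic <- as.numeric(t(b) %*% solve(V) %*% b)
  stats::pchisq(statistic, df = length(idx), lower.tail = FALSE)
}

p_cyl <- wald_p_value(fit, "cyl")  # joint test over cyl6 and cyl8
p_wt  <- wald_p_value(fit, "wt")   # single coefficient

# For a single coefficient the Wald statistic is the squared z value,
# so the result matches the usual per-coefficient P-value.
p_wt_summary <- summary(fit)$coefficients["wt", "Pr(>|z|)"]
print(c(p_cyl = p_cyl, p_wt = p_wt, p_wt_summary = p_wt_summary))
```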

Alternatively, you can return a data frame by setting return_data to TRUE.

Create beautiful plots of variable relativities with pretty_relativities()

pretty_relativities() creates an interactive dual-axis plot (via plotly) that shows the fitted relativity and the number of records in each category.

pretty_relativities(feature_to_plot = 'Embarked',
                    model_object = survival_model)
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
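
The two quantities on such a plot can be approximated by hand. The sketch below assumes that a relativity is the exponentiated coefficient of a level relative to the base level (for a logit link this is an odds ratio); that definition is an assumption, since it is not spelled out here. It uses the built-in mtcars data, and the column names level, relativity and records are illustrative.

```r
# Stand-in model with one factor predictor (illustrative only).
cars <- mtcars
cars$cyl <- factor(cars$cyl)
fit <- stats::glm(am ~ cyl,
                  data = cars,
                  family = binomial(link = 'logit'))

levels_cyl <- fit$xlevels$cyl

# The base level has an implicit coefficient of 0; the remaining levels
# take their fitted coefficients.
estimates <- c(0, stats::coef(fit)[paste0("cyl", levels_cyl[-1])])

relativity_table <- data.frame(
  level      = levels_cyl,
  relativity = unname(exp(estimates)),                  # base level is exp(0) = 1
  records    = as.vector(table(cars$cyl)[levels_cyl]),  # number of rows per level
  stringsAsFactors = FALSE
)
print(relativity_table)
```

Showing the record count next to each relativity makes it easier to spot levels whose estimate rests on very few observations.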