SHAP (SHapley Additive exPlanations, see Lundberg and Lee (2017)) is an ingenious way to study black box models. SHAP values decompose predictions - as fairly as possible - into additive feature contributions. Crunching SHAP values requires clever algorithms by clever people. Analyzing them, however, is super easy with the right visualizations. The “shapviz” package offers the latter:

- sv_dependence(): Dependence plots to study feature effects (optionally colored by the heuristically strongest interacting feature).
- sv_importance(): Importance plots (bar plots and/or beeswarm plots) to study variable importance.
- sv_waterfall(): Waterfall plots to study single predictions.
- sv_force(): Force plots as an alternative to waterfall plots.

These plots require a “shapviz” object, which is built from two things only:

- S: Matrix of SHAP values
- X: Dataset with corresponding feature values

Furthermore, a baseline can be passed to represent an average prediction on the scale of the SHAP values.
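By additivity, the SHAP values of a row plus the baseline reproduce the prediction of that row (on the scale of the SHAP values). A minimal sketch of this relation, assuming a fitted model fit, a prediction dataset X_pred accepted by predict(), a SHAP matrix S, and a baseline value:

# Row 1: prediction equals baseline plus the sum of its SHAP values
# (holds on the scale on which the SHAP values were computed)
all.equal(unname(predict(fit, X_pred)[1L]), baseline + sum(S[1L, ]))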
A key feature of “shapviz” is that X is used for visualization only. Thus, it is perfectly fine to use factor variables, even if the underlying model would not accept them. Additionally, to improve the visualizations, it can sometimes make sense to clip gross outliers, take logarithms of certain columns, or replace missing values by an explicit value.
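A hedged sketch of this idea (the column names and values are purely illustrative and not part of any example in this post): only the displayed dataset X is modified, while the SHAP matrix S stays exactly as computed.

X_display <- X
X_display$income <- pmin(X_display$income, 2e5)          # clip gross outliers
X_display$income <- log1p(X_display$income)              # take logarithms
X_display$region[is.na(X_display$region)] <- "missing"   # explicit missing value (character column)
shp <- shapviz(S, X = X_display)                          # S is unchanged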
To further simplify the use of “shapviz”, we added direct connectors to R packages such as XGBoost, LightGBM, fastshap, shapr, h2o, treeshap, and kernelshap. CatBoost is not included, but see the section “Any other package” for how to use its SHAP calculation backend with “shapviz”.
# From CRAN
install.packages("shapviz")

# Or the newest version from GitHub:
# install.packages("devtools")
devtools::install_github("mayer79/shapviz")
We start by fitting an XGBoost model to predict diamond prices based on the four “C” features.
library(shapviz)
library(ggplot2)
library(xgboost)

set.seed(3653)

ord <- c("clarity", "cut", "color")
diamonds[, ord] <- lapply(diamonds[, ord], factor, ordered = FALSE)

X <- diamonds[c("carat", "cut", "color", "clarity")]
dtrain <- xgb.DMatrix(data.matrix(X), label = diamonds$price)

fit <- xgb.train(
  params = list(learning_rate = 0.1, objective = "reg:squarederror"),
  data = dtrain,
  nrounds = 65L
)
One line of code creates a “shapviz” object. It contains SHAP values and feature values for the set of observations we are interested in. Note again that X is used solely as the explanation dataset, not for calculating SHAP values.

In this example, we construct the “shapviz” object directly from the fitted XGBoost model. Thus, we also need to pass a corresponding prediction dataset X_pred used by XGBoost to calculate the SHAP values.
X_small <- X[sample(nrow(X), 2000L), ]

# X is the "explanation" dataset using the original factors
shp <- shapviz(fit, X_pred = data.matrix(X_small), X = X_small)
Note: If X_pred contained one-hot-encoded dummy variables, their SHAP values could be collapsed via the collapse argument of shapviz().
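As a hedged sketch (fit_ohe and its one-hot-encoded training data are hypothetical and not part of this example), collapse is specified as a named list: each element collects the dummy columns whose SHAP values are summed into a single feature of that name.

# Suppose the model fit_ohe was trained on one-hot-encoded predictors
X_ohe <- model.matrix(~ . - 1, data = X_small)
shp_ohe <- shapviz(
  fit_ohe,
  X_pred = X_ohe,
  X = X_small,  # original factors for plotting
  collapse = list(color = grep("^color", colnames(X_ohe), value = TRUE))
)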
The main idea behind SHAP values is to decompose, in a fair way, a prediction into additive contributions of each feature. Typical visualizations include waterfall plots and force plots:
sv_waterfall(shp, row_id = 1L) +
theme(axis.text = element_text(size = 11))
Works pretty sweet, and factor input is respected!
Alternatively, we can study a force plot:
sv_force(shp, row_id = 1L)
Studying SHAP decompositions of many observations allows us to gain an impression of variable importance. As a simple descriptive measure, the mean absolute SHAP value of each feature is considered. These values can be plotted as a simple bar plot, or, to add information on the sign of the feature effects, as a beeswarm plot sorted by mean absolute SHAP values. Such beeswarm plots are often called “summary plots”.
# A bar plot of mean absolute SHAP values
sv_importance(shp)
# A beeswarm plot
sv_importance(shp, kind = "beeswarm")
# Or both!
sv_importance(shp, kind = "both", show_numbers = TRUE, width = 0.2)
A SHAP beeswarm importance plot gives first hints on whether high feature values tend to lead to high or low predictions. This impression can be substantiated by studying simple scatterplots of the SHAP values of a feature against its feature values. A second feature can be added as color to see whether the feature effect depends on that second feature. The stronger the vertical scatter for similar values on the x axis, the stronger the interaction.
sv_dependence(shp, v = "color", color_var = "auto")
sv_dependence(shp, v = "carat", color_var = "auto", alpha = 0.2, size = 1) +
guides(colour = guide_legend(override.aes = list(alpha = 1, size = 2)))
The above example uses XGBoost to calculate SHAP values. In the following sections, we show (without running the code) how other packages work together with “shapviz”.
library(lightgbm)

dtrain <- lgb.Dataset(data.matrix(X), label = diamonds$price)
fit <- lgb.train(
  params = list(learning_rate = 0.1, objective = "regression"),
  data = dtrain,
  nrounds = 65L
)
shp <- shapviz(fit, X_pred = data.matrix(X_small), X = X_small)
sv_importance(shp)
library(fastshap)

fit <- lm(price ~ carat + clarity + cut + color, data = diamonds)
explainer <- explain(fit, newdata = X_small, exact = TRUE)
shp <- shapviz(explainer, X = X_small)
sv_dependence(shp, "carat")
library(shapr)

fit <- lm(price ~ carat + clarity + cut + color, data = diamonds)
X_smaller <- X[sample(nrow(X), 200L), ]
explainer <- shapr(X_smaller, fit)

# Takes about 15 seconds
explanation <- explain(
  X_smaller,
  approach = "ctree",
  explainer = explainer,
  prediction_zero = mean(diamonds$price)
)
shp <- shapviz(explanation)
sv_dependence(shp, "carat", "auto")
If you work with a tree-based H2O model:
library(h2o)

h2o.init()

dia_h2o <- as.h2o(diamonds)
fit <- h2o.gbm(c("carat", "clarity", "color", "cut"), "price", training_frame = dia_h2o)
x <- shapviz(fit, X_pred = X_small)
sv_force(x, row_id = 1)
sv_dependence(x, "carat", "auto")
library(treeshap)
library(ranger)

fit <- ranger(y = diamonds$price, x = data.matrix(X), max.depth = 6, num.trees = 100)
unified_model <- ranger.unify(fit, data.matrix(X))
shaps <- treeshap(unified_model, data.matrix(X_small))
shp <- shapviz(shaps, X = X_small)
sv_importance(shp)
sv_dependence(shp, "clarity", color_var = "auto", alpha = 0.2, size = 1)
library(kernelshap)

X_smaller <- X[sample(nrow(X), 200L), ]
fit <- lm(price ~ carat + clarity + cut + color, data = diamonds)

# Takes about 30 seconds
system.time(
  ks <- kernelshap(X_smaller, function(X) predict(fit, X), X_smaller)
)
shp <- shapviz(ks)
sv_importance(shp)
sv_dependence(shp, "carat")
The most general interface is to provide a matrix of SHAP values and corresponding feature values (and optionally, a baseline value):
S <- matrix(c(1, -1, -1, 1), ncol = 2, dimnames = list(NULL, c("x", "y")))
X <- data.frame(x = c("a", "b"), y = c(100, 10))
shp <- shapviz(S, X, baseline = 4)
An example is CatBoost: it is not on CRAN and requires
catboost.*()
functions to calculate SHAP values, so we
cannot directly add it to “shapviz” for now. Just use a wrapper like
this:
library(catboost)

shapviz.catboost.Model <- function(object, X_pred, X = X_pred, collapse = NULL, ...) {
  if (!requireNamespace("catboost", quietly = TRUE)) {
    stop("Package 'catboost' not installed")
  }
  stopifnot(
    "X must be a matrix or data.frame. It can't be an object of class catboost.Pool" =
      is.matrix(X) || is.data.frame(X),
    "X_pred must be a matrix, a data.frame, or a catboost.Pool" =
      is.matrix(X_pred) || is.data.frame(X_pred) || inherits(X_pred, "catboost.Pool"),
    "X_pred must have column names" = !is.null(colnames(X_pred))
  )

  if (!inherits(X_pred, "catboost.Pool")) {
    X_pred <- catboost.load_pool(X_pred)
  }

  S <- catboost.get_feature_importance(object, X_pred, type = "ShapValues", ...)

  # Call matrix method: the last column of S holds the baseline
  pp <- ncol(X_pred) + 1L
  baseline <- S[1L, pp]
  S <- S[, -pp, drop = FALSE]
  colnames(S) <- colnames(X_pred)
  shapviz(S, X = X, baseline = baseline, collapse = collapse)
}
# Example
X_pool <- catboost.load_pool(diamonds[colnames(X)], label = diamonds$price)

fit <- catboost.train(
  X_pool,
  params = list(
    loss_function = "RMSE",
    iterations = 165,
    logging_level = "Silent",
    allow_writing_files = FALSE
  )
)
shp <- shapviz(fit, X_small)
sv_importance(shp)
sv_dependence(shp, "clarity", color_var = "auto", alpha = 0.2, size = 1)
The plot functions work with one-dimensional model predictions only. However, the wrappers for XGBoost and LightGBM allow you to select the category of interest and work with its predicted (logit) probabilities, simply by providing which_class in the constructor.
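A hedged sketch (fit_multi is a hypothetical XGBoost model trained with a multiclass objective such as "multi:softprob"; only the which_class argument matters here):

# Build the "shapviz" object for the second class only; all plots then refer to that class
shp2 <- shapviz(fit_multi, X_pred = data.matrix(X_small), X = X_small, which_class = 2L)
sv_importance(shp2)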