This vignette assumes the reader is already familiar with using nimble to perform Markov chain Monte Carlo (MCMC).
The principal motivation of this package is that, when target posterior distributions are multimodal standard MCMC algorithms often perform badly. In these situations, standard algorithms often propose few, if any, jumps between modes, and can thus provide very poor approximations to target posterior distributions. The nimbleAPT package permits nimble users to perform adaptive parallel tempering (APT) as a potential solution for sampling multimodal posterior distributions.
Parallel tempering is an MCMC technique where the posterior likelihood (of a Bayesian model) is ‘heated’, ‘or tempered’ to various degrees. As temperature is increased, likelihoods become flatter and this can increase the probability and frequency of between-mode jumps. In practice, a “temperature ladder” (a ranked set of temperatures) is established, MCMC is performed within each rung of the temperature ladder, and a special between-rung MCMC step is introduced so that parameter sets can move up and down the temperature ladder. Bayesian inference is made on the posterior samples of the unheated rung only. Adaptive parallel tempering algorithms automatically tune the temperatures of the temperature ladder so that target acceptance rates for proposed between-rung jumps are achieved.
The nimbleAPT package provides the following…
The following toy problem illustrates how classic MCMC algorithms can struggle to approximate multimodal posteriors. The true posterior distribution for centroids should have four modes, but as we will see, classic samplers can struggle to explore such a posterior distribution.
First, we create a nimble model with a multimodal posterior. For this model there is no information in the data to remove uncertainty about the sign of the elements of centroids.
library(nimbleAPT) # Also loads nimble
#> Loading required package: nimble
#> nimble version 0.12.1 is loaded.
#> For more information on NIMBLE and a User Manual,
#> please visit https://R-nimble.org.
#>
#> Attaching package: 'nimble'
#> The following object is masked from 'package:stats':
#>
#> simulate
bugsCode <- nimbleCode({
for (ii in 1:nObs) {
y[ii,1:2] ~ dmnorm(mean=absCentroids[1:2], cholesky=cholCov[1:2,1:2], prec_param=0)
}
absCentroids[1:2] <- abs(centroids[1:2])
for (ii in 1:2) {
centroids[ii] ~ dnorm(0, sd=1E3)
}
})
nObs <- 100
centroids <- rep(-3, 2)
covChol <- chol(diag(2))
rModel <- nimbleModel(bugsCode,
constants=list(nObs=nObs, cholCov=covChol),
inits=list(centroids=centroids))
#> Defining model
#> Building model
#> Setting data and initial values
#> Running calculate on model
#> [Note] Any error reports that follow may simply reflect missing values in model variables.
#> Checking model sizes and dimensions
#> [Note] This model is not fully initialized. This is not an error.
#> To see which variables are not initialized, use model$initializeInfo().
#> For more information on model initialization, see help(modelInitialization).
Now use the model to simulate some data and initialise/specify the model’s data nodes.
simulate(rModel, "y") ## Use model to simulate data
rModel <- nimbleModel(bugsCode,
constants=list(nObs=nObs, cholCov=covChol),
data=list(y=rModel$y),
inits=list(centroids=centroids))
#> Defining model
#> Building model
#> Setting data and initial values
#> Running calculate on model
#> [Note] Any error reports that follow may simply reflect missing values in model variables.
#> Checking model sizes and dimensions
cModel <- compileNimble(rModel)
#> Compiling
#> [Note] This may take a minute.
#> [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
plot(cModel$y, typ="p", xlab="", ylab="", xlim=c(-1,1)*max(cModel$y), ylim=c(-1,1)*max(cModel$y), pch=4)
points(x=cModel$centroids[1], y=cModel$centroids[1], col="red", pch=4, cex=3)
points(x=-cModel$centroids[1], y=cModel$centroids[1], col="red", pch=4, cex=3)
points(x=cModel$centroids[1], y=-cModel$centroids[1], col="red", pch=4, cex=3)
points(x=-cModel$centroids[1], y=-cModel$centroids[1], col="red", pch=4, cex=3)
legend("topleft", c("posterior modes", "data"), col=c("red","black"), pch=4, cex=2)
Now for some standard MCMC using nimble’s default choice of samplers.
simulate(cModel, "centroids")
mcmcR <- buildMCMC(configureMCMC(cModel, nodes="centroids", monitors="centroids"), print=TRUE)
#> ===== Monitors =====
#> thin = 1: centroids
#> ===== Samplers =====
#> RW sampler (2)
#> - centroids[] (2 elements)
mcmcC <- compileNimble(mcmcR)
#> Compiling
#> [Note] This may take a minute.
#> [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
mcmcC$run(niter=15000)
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> NULL
samples <- tail(as.matrix(mcmcC$mvSamples), 10000)
summary(samples)
#> centroids[1] centroids[2]
#> Min. :-3.320 Min. :-3.245
#> 1st Qu.:-3.056 1st Qu.:-2.919
#> Median :-2.983 Median :-2.851
#> Mean :-2.985 Mean :-2.851
#> 3rd Qu.:-2.916 3rd Qu.:-2.783
#> Max. :-2.529 Max. :-2.530
plot(samples, xlab="", ylab="", typ="l", xlim=c(-1,1)*max(cModel$y), ylim=c(-1,1)*max(cModel$y))
points(x=cModel$centroids[1], y=cModel$centroids[1], col="red", pch=4, cex=3)
points(x=-cModel$centroids[1], y=cModel$centroids[1], col="red", pch=4, cex=3)
points(x=cModel$centroids[1], y=-cModel$centroids[1], col="red", pch=4, cex=3)
points(x=-cModel$centroids[1], y=-cModel$centroids[1], col="red", pch=4, cex=3)
legend("topleft", legend=c("posterior modes","jumps"), col=c("red","black"), pch=c("X","_"), bg="white")
As we can see, the default MCMC scheme makes few, if any, jumps between the four potential modes.
So let’s try with adaptive parallel tempering (APT).
conf <- configureMCMC(cModel, nodes="centroids", monitors="centroids", enableWAIC = TRUE)
#> ===== Monitors =====
#> thin = 1: centroids
#> ===== Samplers =====
#> RW sampler (2)
#> - centroids[] (2 elements)
conf$removeSamplers()
conf$addSampler("centroids[1]", type="sampler_RW_tempered", control=list(temperPriors=TRUE))
conf$addSampler("centroids[2]", type="sampler_RW_tempered", control=list(temperPriors=TRUE))
conf
#> ===== Monitors =====
#> thin = 1: centroids
#> ===== Samplers =====
#> RW_tempered sampler (2)
#> - centroids[] (2 elements)
aptR <- buildAPT(conf, Temps=1:5, ULT= 1000, print=TRUE)
#> Initial temperatures: 1 2 3 4 5
#> Monitored nodes are valid for WAIC
aptC <- compileNimble(aptR)
#> Compiling
#> [Note] This may take a minute.
#> [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
aptC$run(niter=15000)
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> NULL
samples <- tail(as.matrix(aptC$mvSamples), 10000)
summary(samples)
#> centroids[1] centroids[2]
#> Min. :-3.3436 Min. :-3.22573
#> 1st Qu.:-2.9542 1st Qu.:-2.85378
#> Median : 2.8919 Median : 2.66572
#> Mean : 0.5712 Mean : 0.09863
#> 3rd Qu.: 3.0110 3rd Qu.: 2.85551
#> Max. : 3.3882 Max. : 3.19312
plot(samples, xlab="", ylab="", typ="l", xlim=c(-1,1)*max(cModel$y), ylim=c(-1,1)*max(cModel$y))
points(samples, col="red", pch=19, cex=0.1)
legend("topleft", legend=c("jumps", "samples"), col=c("black","red"), pch=c("_","X"), bg="white")
We can see that jumps between nodes (black lines) are frequent and that each node has been sampled (red points). The precision with which the weight of each posterior mode is estimated can be increased by increasing 1. the number of rungs in the temperature ladder, and moreover 2. the number of iterations (niter).
Finally, note that WAIC can be computed in exactly the same way as in nimble. See the section ‘calculating WAIC’ in the nimble manual for further details.