In this vignette, we will discuss the use of cluster-level models using the SUMMER package. We will use the simulated surveys in the DemoData
dataset in the package. The DemoData
object is a list that contains full birth history data from simulated surveys with stratified cluster sampling design, similar to most of the DHS surveys. It has been pre-processed into the person-month format, where for each list entry, each row represents one person-month record. Each record contains columns for the cluster ID (clustid
), household ID (id
), strata membership (strata
) and survey weights (weights
). The region and time period associated with each person-month record has also been computed and saved in the dataset. Raw DHS data can be processed into this format using the getBirths
function. More details of the data processing step and person-month format are discussed in the package vignette A Case Study of Estimating Subnational U5MR using Smoothed Direct Methods.
We first load the package and data. We will use the dplyr package in some data processing steps.
We now describe the fitting of the cluster level model for U5MR. For this simulated dataset, there is no stratification within regions, so we treat the region variable as strata. To fit the cluster-level model, we calculate the number of person-months and number of deaths for each cluster, time period, and age group. We first create the data frame using the getCounts
function for each survey and combine them into a single data frame.
The model fitting function smoothCluster
expects columns with specific names. We rename the cluster ID and time period columns to be ‘cluster’ and ‘years’. The response variable is ‘Y’ and the binomial total is ‘total’.
counts.all <- NULL
for (i in 1:length(DemoData)) {
vars <- c("clustid", "region", "time", "age")
counts <- getCounts(DemoData[[i]][, c(vars, "died")], variables = "died", by = vars,
drop = TRUE)
counts <- counts %>% mutate(cluster = clustid, years = time, Y = died)
counts$survey <- names(DemoData)[i]
counts.all <- rbind(counts.all, counts)
}
With the created data frame, we fit the cluster-level model using the smoothCluster
function. Notice that here we need to specify the age groups (age.groups
), the length of each age group (age.n
) in months, and how the age groups are mapped to the temporal random effects (age.rw.group
). In the default case, age.rw.group = c(1, 2, 3, 3, 3, 3)
means the first two age groups each has its own temporal trend, the the following four age groups share the same temporal trend. We start with the default temporal model of random walk or order 2 on the 5-year periods in this dataset (with real data, we can use a finer temporal resolution). We add survey iid effects to the model as well.
periods <- c("85-89", "90-94", "95-99", "00-04", "05-09", "10-14")
fit.bb <- smoothCluster(data = counts.all, Amat = DemoMap$Amat,
family = "betabinomial",
year_label = c(periods, "15-19"),
age.groups = c("0", "1-11", "12-23", "24-35", "36-47", "48-59"),
age.n = c(1,11,12,12,12,12),
age.rw.group = c(1,2,3,3,3,3),
time.model = "rw2", survey.effect = TRUE)
## This is INLA_21.02.23 built 2021-02-22 21:16:12 UTC.
## - See www.r-inla.org/contact-us for how to get help.
## - To enable PARDISO sparse library; see inla.pardiso()
## ----------------------------------
## Cluster-level model
## Main temporal model: rw2
## Spatial effect: bym2
## Interaction temporal model: rw2
## Interaction type: 4
##
## Number of age groups: 6
## Number of age-specific trends per stratum: 3
## Stratification: no, strata variable not in the input
## Survey effect: yes
## ----------------------------------
In this example, we do not have urban/rural strata in the dataset. When further within area stratification exists, we may also choose to model stratum-age specific temporal trends using the strata.time.effect
argument. We also do not have any bias adjustment in this simple model. If the ratio adjustments to U5MR are known, they could be entered with the bias.adj
and bias.adj.by
arguments when fitting the model.
Posterior samples from the model are taken and summarized using the getSmoothed
function. For models with a large number of areas and time points, this step may take some time to compute. The save.draws
argument makes it possible to save the raw posterior draws, so that we can use them again in other functions or recompute different posterior credible intervals.
## Length Class Mode
## overall 13 SUMMERproj list
## stratified 12 SUMMERproj list
## draws 1000 -none- list
## draws.est 28 -none- list
## draws.est.overall 28 -none- list
## msg 1 -none- character
## nsim 1 -none- numeric
For example, to recompute the posterior CI directly using the existing draws:
There are many options to change the model components. For example, we can use a different temporal component in the space-time interaction model. The default interaction model assumes the temporal component to follow the same model as the main temporal trend (so in the example above, random walk or order 2) and interact with a spatial ICAR process. Here we specify a different model where the space-time interaction is assumed to follow an AR(1) interacting with an ICAR process (via the st.time.model
argument). We also add region-specific random slopes in time (via pc.st.slope.u
and pc.st.slope.alpha
; these two parameters specify the PC priors for the random slopes).
fit.bb.ar1 <- smoothCluster(data = counts.all, Amat = DemoMap$Amat,
family = "betabinomial",
year_label = c(periods, "15-19"),
age.groups = c("0", "1-11", "12-23", "24-35", "36-47", "48-59"),
age.n = c(1,11,12,12,12,12),
age.rw.group = c(1,2,3,3,3,3),
time.model = "rw2", survey.effect = TRUE,
st.time.model = "ar1", type.st = 4,
pc.st.slope.u = 1, pc.st.slope.alpha = 0.1)
## ----------------------------------
## Cluster-level model
## Main temporal model: rw2
## Spatial effect: bym2
## Interaction temporal model: ar1
## Interaction type: 4
## Interaction random slopes: yes
## Number of age groups: 6
## Number of age-specific trends per stratum: 3
## Stratification: no, strata variable not in the input
## Survey effect: yes
## ----------------------------------
## ----------------------------------
## Cluster-level model
## Main temporal model:
## Spatial effect: bym2
## Interaction temporal model: ar1
## Interaction type: 4
## Interaction random slopes: yes
## Number of age groups: 6
## Number of age-specific trends per stratum: 3
## Stratification: no, strata variable not in the input
## Survey effect: yes
## ----------------------------------
## Fixed Effects
## mean sd 0.025quant 0.5quant 0.97quant mode kld
## age0 -2.9 0.11 -3.1 -2.9 -2.7 -2.9 0
## age1-11 -4.8 0.10 -5.0 -4.8 -4.6 -4.8 0
## age12-23 -5.8 0.12 -6.0 -5.8 -5.5 -5.8 0
## age24-35 -6.7 0.17 -7.0 -6.7 -6.4 -6.7 0
## age36-47 -7.1 0.20 -7.5 -7.1 -6.7 -7.0 0
## age48-59 -7.5 0.25 -8.0 -7.5 -7.0 -7.5 0
## ----------------------------------
## Random Effects
## Name Model
## 1 time.struct RW2 model
## 2 time.unstruct IID model
## 3 region.struct BYM2 model
## 4 region.int Besags ICAR model
## 5 st.slope.id IID model
## 6 survey.id IID model
## ----------------------------------
## Model hyperparameters
## mean sd 0.025quant
## overdispersion for the betabinomial observations 0.002 0.001 0.001
## Precision for time.struct 179.445 328.369 12.059
## Precision for time.unstruct 977.334 4621.273 14.789
## Precision for region.struct 258.532 709.285 4.937
## Phi for region.struct 0.353 0.255 0.022
## Precision for region.int 631.786 2470.871 10.925
## Group PACF1 for region.int 0.911 0.172 0.382
## Precision for st.slope.id 33.179 103.493 0.923
## 0.5quant 0.97quant mode
## overdispersion for the betabinomial observations 0.002 0.005 0.002
## Precision for time.struct 88.055 829.094 29.832
## Precision for time.unstruct 226.131 5614.636 33.098
## Precision for region.struct 88.401 1398.203 10.281
## Phi for region.struct 0.298 0.880 0.057
## Precision for region.int 168.704 3598.634 24.353
## Group PACF1 for region.int 0.976 1.000 1.000
## Precision for st.slope.id 10.804 180.630 2.177
## [,1]
## Expectected number of parameters 21.9
## Stdev of the number of parameters 2.1
## Number of equivalent replicates 772.9
## [,1]
## log marginal-likelihood (integration) -3425
## log marginal-likelihood (Gaussian) -3425
## Starting posterior sampling...
## Cleaning up results...
## No stratification in the model. Overall and stratified estimates are the same
Since we fit a non-stratified model, the stratified
object and overall
object in the returned list are the same. We can plot the estimates (or a subset of the estimates) using the S3 class plot function. For example, we plot the estimates for the first model with RW(2) interactions.
library(ggplot2)
plot(est.bb$overall, per1000 = TRUE, plot.CI = TRUE) + facet_wrap(~region, ncol = 4) +
ylim(0, 600) + theme(axis.text.x = element_text(angle = 45, hjust = 1))
In comparison, we plot the estimate from the AR(1) interaction model on the same scale. It can be seen that it induces stronger shrinkage for the region-specific slopes towards the shared trend.
We show the estimates on the map using the mapPlot
function. We use the DemoMap
from the package containing geographic data from the 1995 Uganda Admin 1 regions (but be aware the data are simulated and not correspond to Uganda child mortality).
data(DemoMap)
mapPlot(est.bb$overall,
geo = DemoMap$geo, by.data = "region", by.geo = "REGNAME",
is.long = TRUE, variables = "years", values = "median",
ncol = 4, direction = -1, per1000 = TRUE, legend.label = "U5MR")
We can add hatching to indicate uncertainty of the estimates using hatchPlot
with similar syntax.
hatchPlot(est.bb$overall,
geo = DemoMap$geo, by.data = "region", by.geo = "REGNAME",
is.long = TRUE, variables = "years", values = "median",
lower = "lower", upper = "upper", hatch = "red",
ncol = 4, direction = -1, per1000 = TRUE, legend.label = "U5MR")
Next we show the posterior densities of the estimates, where each each panel correspond to one time period, and the regions are ordered by the posterior median of estimates in the last year (specified by the order = -1
argument).
ridgePlot(draws = est.bb, year_plot = c(periods, "15-19"),
ncol = 4, per1000 = TRUE, order = -1, direction = -1)
It can also be plotted for each region over time as well.