One of the key features of ubms
is the ability to include random effects in model formulas. This is not possible in unmarked
, but it is possible with custom models using JAGS. To explore how estimates of parameters and uncertainty intervals compared between ubms
and JAGS
, I simulated a basic occupancy dataset with known parameter values and fit models with both. In this simulation, individual sites were nested within groups, and I estimated group-level random intercepts. This is a common scenario for so-called stacked data from multiple years, where the “groups” are unique sites and “sites” represent unique site-year combinations.
First I set the true parameter values for the vector of fixed effects \(\boldsymbol{\beta}\) and the random effects standard deviation \(\sigma\).
Then, I simulated covariate data for the occupancy and detection submodels. This included a random group assignment (26 possible groups) for each site.
dat_occ <- data.frame(x1=rnorm(500), group=factor(sample(letters[1:26], 500, replace=T)))
dat_p <- data.frame(x2=rnorm(500*5))
Next I simulated the random effect value for each group (with an effects parameterization centered on 0).
Finally, I simulated an occupancy dataset y
using the data and parameter values obtained above.
y <- matrix(NA, 500, 5)
z <- rep(NA, 500)
group_idx <- as.numeric(dat_occ$group) #convert group assignment to numeric index
idx <- 1
for (i in 1:500){
#True latent state of site i in group group_idx[i]
z[i] <- rbinom(1,1, plogis(beta[1] + beta[2]*dat_occ$x1[i] + re[group_idx[i]]))
#Observation process
for (j in 1:5){
#Observed data at site i for observation j
y[i,j] <- z[i]*rbinom(1,1, plogis(beta[3] + beta[4]*dat_p$x2[idx]))
idx <- idx + 1
}
}
I combined the covariates and simulated data into an unmarkedFrame
.
Using stan_occu
, I fit a model with a random effect of group on occupancy ((1|group)
). I probably did not use enough iterations but it’s fine for this example.
A bit more work is required to fit this model in JAGS. First I reorganized the data into a list so it could be sent to JAGS.
nsites <- nrow(umf@y)
jags_data <- list(y=umf@y, x1=dat_occ$x1, group=as.numeric(dat_occ$group),
ngroups=nlevels(dat_occ$group),
x2=matrix(dat_p$x2, nrow=nsites, byrow=TRUE),
nsites=nsites, nobs=ncol(umf@y))
Next I wrote a simple occupancy model with group-level intercepts gint
in the BUGS language, and saved it to a temporary file.
modfile <- tempfile()
writeLines("
model{
#Likelihood
for (i in 1:ngroups){
gint[i] ~ dnorm(beta[1], tau.group)
}
for (i in 1:nsites){
z[i] ~ dbern(psi[i])
logit(psi[i]) <- gint[group[i]] + beta[2]*x1[i]
for (j in 1:nobs){
y[i,j] ~ dbern(p[i,j]*z[i])
logit(p[i,j]) <- beta[3] + beta[4]*x2[i,j]
}
}
#Priors
for (i in 1:4){
beta[i] ~ dnorm(0,0.01)
}
sd.group ~ dunif(0,10)
tau.group <- pow(sd.group, -2)
}
", con=modfile)
JAGS also requires a list of parameters you want to save, and a function to generate initial values. Generally you need to provide reasonable initial values for the latent state \(z\). Note that I’m saving both the parameter values ("beta"
and "sd.group"
), along with the actual random effects estimates ("gint"
).
Finally, I fit the model using the R package jagsUI.
library(jagsUI)
jags_fit <- jags(jags_data, inits, params, modfile, n.chains=3, n.adapt=100, n.iter=2000,
n.burnin=1000, n.thin=2, parallel=TRUE, verbose=FALSE)
Running the model in JAGS is significantly slower, although you could speed up by marginalizing the likelihood.
With the model fit in both ubms
and JAGS
, I compared the fixed effect parameter point estimates and uncertainty intervals visually.
#Data frame of JAGS parameter estimates and UIs
jags_sum <- jags_fit$summary
beta_jags <- as.data.frame(jags_sum[1:5,c(1,3,7)])
names(beta_jags) <- c("mean","lower","upper")
beta_jags$source <- "JAGS"
#Data frame of ubms parameter estimates and UIs
ubms_sum <- summary(ubms_fit, "state")
beta_ubms <- rbind(summary(ubms_fit, "state")[,c(1,4,8)],
summary(ubms_fit, "det")[,c(1,4,8)])
beta_ubms <- beta_ubms[c(1:2,4:5,3),]
names(beta_ubms) <- c("mean","lower","upper")
beta_ubms$source <- "ubms"
#Data frame of true parameter values
beta_truth <- data.frame(mean=c(beta, sigma), lower=NA, upper=NA, source="Truth")
#Bind together
beta_plot <- rbind(beta_jags, beta_ubms, beta_truth)
beta_plot$Parameter <- rep(rownames(beta_jags), 3)
#Plot
library(ggplot2)
dodge <- position_dodge(0.4)
ggplot(data=beta_plot, aes(x=Parameter, y=mean, col=source)) +
geom_errorbar(aes(ymin=lower, ymax=upper), position=dodge) +
geom_point(position=dodge, size=3) +
labs(x='Parameter',y='Value', col='Source') +
theme_bw() +
theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank(),
axis.text=element_text(size=12), axis.title=element_text(size=14),
legend.text=element_text(size=12), legend.title=element_text(size=14))
Fixed-effects parameter estimates are very similar between ubms
and JAGS
, as are the uncertainty intervals. In both cases the intervals overlap the true parameter values.
Finally, I did the same for the random effect estimates. The ranef
method extracts random effects terms from a fitted ubms
model. Although we specified the random effect in ubms
as an “effects” parameterization, ranef
automatically adds the random effect to the intercept term to get the complete random intercept at the group level. The group-level random intercepts in JAGS are parameter "gint"
.
#Get random effect estimates from ubms
re_ubms <- ranef(ubms_fit, "state", summary=TRUE)[[1]][[1]][,c(1,3,4)]
re_ubms$source <- "ubms"
#Get random effects estimates from JAGS
re_jags <- as.data.frame(jags_sum[grepl("gint", rownames(jags_sum)),c("mean","2.5%","97.5%")])
re_jags$source <- "JAGS"
names(re_jags) <- names(re_ubms)
#Get truth
truth <- data.frame(Estimate=re, `2.5%`=NA, `97.5%`=NA, source="Truth", check.names=FALSE)
#Combine
plot_dat <- rbind(re_ubms, re_jags, truth)
plot_dat$group <- rep(letters[1:26], 3)
levels(plot_dat$source) <- c("Truth","JAGS","ubms")
library(ggplot2)
dodge <- position_dodge(0.4)
#Plot a subset of random effects
plot_dat_sub <- plot_dat[plot_dat$group %in% letters[1:6],]
ggplot(data=plot_dat_sub, aes(x=group, y=Estimate, col=source)) +
geom_errorbar(aes(ymin=`2.5%`, ymax=`97.5%`), position=dodge) +
geom_point(position=dodge, size=3) +
labs(x='Group',y='Random effect value', col='Source') +
theme_bw() +
theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank(),
axis.text=element_text(size=12), axis.title=element_text(size=14),
legend.text=element_text(size=12), legend.title=element_text(size=14))
As with the fixed effects, the estimated random intercepts are very similar between the two methods. Success!