Compared to the univariate gridded Gaussian case, we now place the data irregularly and assume we observe counts rather than a Gaussian response.
library(magrittr)
library(dplyr)
library(ggplot2)
library(meshed)
set.seed(2021)
<- 30 # coord values for jth dimension
SS <- 2 # spatial dimension
dd <- SS^2 # number of locations
n <- 3 # number of covariates
p
# irregularly spaced
<- cbind(runif(n), runif(n))
coords colnames(coords) <- c("Var1", "Var2")
<- 1.5
sigmasq <- 10
phi
# covariance at coordinates
<- sigmasq * exp(-phi * as.matrix(dist(coords)))
CC # cholesky of covariance matrix
<- t(chol(CC))
LC # spatial latent effects are a GP
<- LC %*% rnorm(n)
w
# covariates and coefficients
<- matrix(rnorm(n*p), ncol=p)
X <- matrix(rnorm(p), ncol=1)
Beta
# univariate outcome, fully observed
<- rpois(n, exp(-3 + X %*% Beta + w))
y_full
# now introduce some NA values in the outcomes
<- y_full %>% matrix(ncol=1)
y sample(1:n, n/5, replace=FALSE), ] <- NA
y[
<- coords %>%
simdata cbind(data.frame(Outcome_full= y_full,
Outcome_obs = y,
w = w))
%>% ggplot(aes(Var1, Var2, color=w)) +
simdata geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("w: Latent GP")
%>% ggplot(aes(Var1, Var2, color=y)) +
simdata geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("y: Observed outcomes")
We now fit the following spatial regression model \[ y ~ Poisson(\eta), \] where \(log(\eta) = X %*% Beta + w\), and \(w \sim MGP\) are spatial random effects. For spatial data, an exponential covariance function is used by default: \(C(h) = \sigma^2 \exp( -\phi h )\) where \(h\) is the spatial distance.
The prior for the spatial decay \(\phi\) is \(U[l,u]\) and the values of \(l\) and \(u\) must be specified. We choose \(l=1\), \(u=30\) for this dataset.1
Setting verbose
tells spmeshed
how many intermediate messages to output while running MCMC. For brevity, we opt to run a very short chain of MCMC with only 2000 iterations, of which we discard the first third. Since the data are irregularly spaced, we build a grid of knots of size 1600 using argument grid_size
, which will facilitate computations. Then, just like in the gridded case we use block_size=20
to specify the coarseness of domain partitioning.
Finally, we note that NA
values for the outcome are OK since the full dataset is on a grid. spmeshed
will figure this out and use settings optimized for partly observed lattices, and automatically predict the outcomes at missing locations. On the other hand, X
values are assumed to not be missing.
<- 200 # too small! this is just a vignette.
mcmc_keep <- 400
mcmc_burn <- 2
mcmc_thin
<- system.time({
mesh_total_time <- spmeshed(y, X, coords,
meshout family="poisson",
grid_size=c(20, 20),
block_size = 20,
n_samples = mcmc_keep, n_burn = mcmc_burn, n_thin = mcmc_thin,
n_threads = 2,
verbose = 0,
prior=list(phi=c(1,30))
)})
We can now do some postprocessing of the results. We extract posterior marginal summaries for \(\sigma^2\), \(\phi\), \(\tau^2\), and \(\beta_2\). The model that spmeshed
targets is a slight reparametrization of the above:2 \[ log(\eta) = X \beta + \lambda w, \] where \(w\sim MGP\) has unitary variance. This model is equivalent to the previous one and in fact we find \(\sigma^2=\lambda^2\). Naturally, it is much more difficult to estimate parameters when data are counts.
summary(meshout$lambda_mcmc[1,1,]^2)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 3.095 6.124 7.722 8.161 10.010 17.385
summary(meshout$theta_mcmc[1,1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.002 1.378 1.739 1.916 2.232 4.130
summary(meshout$beta_mcmc[1,1,])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -0.57237 -0.40876 -0.34682 -0.33775 -0.26752 -0.05425
We proceed to plot predictions across the domain along with the recovered latent effects. We plot the latent effects at the grid we used for fitting spmeshed
. Instead, we plot our predictions at the original data locations. We may see some pattern by plotting the data on the log scale.
# process means
<- data.frame(w_mgp = meshout$w_mcmc %>% summary_list_mean())
wmesh # predictions
<- data.frame(y_mgp = meshout$yhat_mcmc %>% summary_list_mean())
ymesh
<- meshout$coordsdata %>%
outdf cbind(wmesh, ymesh) %>%
left_join(simdata, by = c("Var1", "Var2"))
%>% filter(forced_grid==1) %>%
outdf ggplot(aes(Var1, Var2, fill=w_mgp)) +
geom_raster() +
scale_fill_viridis_c() +
theme_minimal() + ggtitle("w: recovered")
%>% filter(forced_grid==0) %>%
outdf ggplot(aes(Var1, Var2, color=y_mgp)) +
geom_point() +
scale_color_viridis_c() +
theme_minimal() + ggtitle("y: predictions")
spmeshed
implements a model which can be interpreted as assigning \(\sigma^2\) a folded-t via parameter expansion if settings$ps=TRUE
, or an inverse Gamma with parameters \(a=2\) and \(b=1\) if settings$ps=FALSE
, which cannot be changed at this time. \(\tau^2\) is assigned an exponential prior.↩︎
At its core, spmeshed
implements the spatial factor model \(Y(s) ~ Poisson( exp(X(s)\beta + \Lambda v(s)) )\) where \(w(s) = \Lambda v(s)\) is modeled via linear coregionalization.↩︎