Using pspatreg with spatial panel data

Román Mínguez (UCLM), Roberto Basile (UNIVAQ), María Durbán (UC3M)

2022-06-07

library(pspatreg)
library(spatialreg)
library(spdep)
library(sf)
library(plm)
library(ggplot2)
library(dplyr)
library(splm)

1 Models for spatial panel data

This section focuses on the semiparametric P-Spline model for spatial panel data. The model may include a smooth spatio-temporal trend, a spatial lag of dependent and independent variables, a time lag of the dependent variable and of its spatial lag, and a time series autoregressive noise. Specifically, we consider a spatio-temporal ANOVA model, disaggregating the trend into spatial and temporal main effects, as well as second- and third-order interactions between them.

The empirical illustration is based on data on regional unemployment in Italy. This example shows that this model represents a valid alternative to parametric methods aimed at disentangling strong and weak cross-sectional dependence when both spatial and temporal heterogeneity are smoothly distributed (see Mı́nguez, Basile, and Durbán 2020). The section is organized as follows:

1.1 Dataset, spatial weights matrix and model specifications

The package provides the panel data unemp_it (an object of class data.frame) and the spatial weights matrix Wsp_it (a 103 by 103 square matrix). The raw data - a balanced panel with 103 Italian provinces observed for each year between 1996 and 2019 - can be transformed in a spatial polygonal dataset of class sf after having joined the data.frame object with the shapefile of Italian provinces:

data(unemp_it, package = "pspatreg")
unemp_it_sf <- st_as_sf(dplyr::left_join(unemp_it, map_it, by = c("prov" = "COD_PRO")))

The matrix Wsp_it is a standardized inverse distance W matrix. Using spdep we transform it in a list of neighbors object:

lwsp_it <- spdep::mat2listw(Wsp_it)
summary(lwsp_it)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 103 
## Number of nonzero links: 434 
## Percentage nonzero weights: 4.090866 
## Average number of links: 4.213592 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 
##  7 20 15 16 17 11 10  6  1 
## 7 least connected regions:
## 32 75 78 80 81 90 92 with 1 link
## 1 most connected region:
## 15 with 9 links
## 
## Weights style: M 
## Weights constants summary:
##     n    nn  S0       S1       S2
## M 103 10609 103 74.35526 431.5459

1.2 Linear model (comparison with splm)

Using these data, we first estimate fully parametric spatial linear autoregressive panel models using the function pspatfit() included in the package pspatreg (in the default based on the REML estimator) and compare them with the results obtained using the functions provided by the package splm (based on the ML estimator).

1.2.1 Spatial Lag model (SAR). REML estimates using pspatfit()

We consider here a fixed effects specification, including both fixed spatial and time effects:

\[y_{it}=\rho \sum_{j=1}^N w_{ij,N} y_{jt} + \sum_{k=1}^K \beta_k x_{k,it}+ \alpha_i+\theta_t+\epsilon_{it}\]

\[\epsilon_{it} \sim i.i.d.(0,\sigma^2_\epsilon)\]

formlin <- unrate ~ empgrowth + partrate + agri + cons + serv

Linear_WITHIN_sar_REML <- pspatfit(formula = formlin,
                   data = unemp_it, 
                   listw = lwsp_it, 
                   demean = TRUE,
                   eff_demean = "twoways",
                   type = "sar",
                   index = c("prov", "year"))

summary(Linear_WITHIN_sar_REML)
## 
##  Call 
## pspatfit(formula = formlin, data = unemp_it, listw = lwsp_it, 
##     type = "sar", demean = TRUE, eff_demean = "twoways", index = c("prov", 
##         "year"))
## 
##  Parametric Terms 
##             Estimate Std. Error t value  Pr(>|t|)    
## empgrowth -0.129538   0.014091 -9.1932 < 2.2e-16 ***
## partrate   0.391656   0.023315 16.7981 < 2.2e-16 ***
## agri      -0.036743   0.027267 -1.3475 0.1779308    
## cons      -0.166868   0.044510 -3.7490 0.0001816 ***
## serv       0.012198   0.020597  0.5922 0.5537490    
## rho        0.265671   0.018858 14.0880 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Goodness-of-Fit 
##  
##  EDF Total:      6 
##  Sigma: 1.86934 
##  AIC:  5396.53 
##  BIC:  5431.41
Linear_WITHIN_sar_ML <- spml(formlin,
               data = unemp_it, 
               index=c("prov","year"),
               listw = lwsp_it,
               model="within",
               effect = "twoways",
               spatial.error="none", 
               lag=TRUE, 
               Hess = FALSE)

round(data.frame(Linear_WITHIN_sar_REML = c(Linear_WITHIN_sar_REML$rho, 
                                            Linear_WITHIN_sar_REML$bfixed), 
                Linear_WITHIN_sar_ML = c(Linear_WITHIN_sar_ML$coefficients[1], 
                                         Linear_WITHIN_sar_ML$coefficients[-1])),3)
##                 Linear_WITHIN_sar_REML Linear_WITHIN_sar_ML
## rho                              0.266                0.266
## fixed_empgrowth                 -0.130               -0.130
## fixed_partrate                   0.392                0.392
## fixed_agri                      -0.037               -0.037
## fixed_cons                      -0.167               -0.167
## fixed_serv                       0.012                0.012

Clearly, both methods give exactly the same results, at least at the third digit level.

Extract coefficients:

coef(Linear_WITHIN_sar_REML)
##        rho  empgrowth   partrate       agri       cons       serv 
##  0.2656708 -0.1295384  0.3916562 -0.0367427 -0.1668684  0.0121982

Extract fitted values and residuals:

fits <- fitted(Linear_WITHIN_sar_REML)
resids <- residuals(Linear_WITHIN_sar_REML)

Extract log-likelihood and restricted log-likelihhod:

logLik(Linear_WITHIN_sar_REML)
## 'log Lik.' -2692.266 (df=6)
logLik(Linear_WITHIN_sar_REML, REML = TRUE)
## 'log Lik.' -2711.112 (df=6)

Extract the covariance matrix of estimated coefficients. Argument bayesian allows to get bayesian (default) or frequentist covariances:

vcov(Linear_WITHIN_sar_REML)
##               empgrowth      partrate          agri          cons          serv
## empgrowth  1.985464e-04 -8.106848e-05 -3.050359e-06  3.032827e-05 -6.861998e-06
## partrate  -8.106848e-05  5.436097e-04 -7.233434e-05 -1.649890e-04 -7.891185e-05
## agri      -3.050359e-06 -7.233434e-05  7.434641e-04  1.891487e-04  2.741340e-04
## cons       3.032827e-05 -1.649890e-04  1.891487e-04  1.981149e-03  2.432964e-04
## serv      -6.861998e-06 -7.891185e-05  2.741340e-04  2.432964e-04  4.242353e-04
vcov(Linear_WITHIN_sar_REML, bayesian = FALSE)
##               empgrowth      partrate          agri          cons          serv
## empgrowth  1.985464e-04 -8.106848e-05 -3.050359e-06  3.032827e-05 -6.861998e-06
## partrate  -8.106848e-05  5.436097e-04 -7.233434e-05 -1.649890e-04 -7.891185e-05
## agri      -3.050359e-06 -7.233434e-05  7.434641e-04  1.891487e-04  2.741340e-04
## cons       3.032827e-05 -1.649890e-04  1.891487e-04  1.981149e-03  2.432964e-04
## serv      -6.861998e-06 -7.891185e-05  2.741340e-04  2.432964e-04  4.242353e-04

A print method to get printed coefficients, standard errors and p-values of parametric terms:

print(Linear_WITHIN_sar_REML)
##           Estimate Std. Error t value Pr(>|t|)
## empgrowth  -0.1295     0.0141 -9.1932   0.0000
## partrate    0.3917     0.0233 16.7981   0.0000
## agri       -0.0367     0.0273 -1.3475   0.1779
## cons       -0.1669     0.0445 -3.7490   0.0002
## serv        0.0122     0.0206  0.5922   0.5537
## rho         0.2657     0.0189 14.0880   0.0000
summary(Linear_WITHIN_sar_REML)
## 
##  Call 
## pspatfit(formula = formlin, data = unemp_it, listw = lwsp_it, 
##     type = "sar", demean = TRUE, eff_demean = "twoways", index = c("prov", 
##         "year"))
## 
##  Parametric Terms 
##             Estimate Std. Error t value  Pr(>|t|)    
## empgrowth -0.129538   0.014091 -9.1932 < 2.2e-16 ***
## partrate   0.391656   0.023315 16.7981 < 2.2e-16 ***
## agri      -0.036743   0.027267 -1.3475 0.1779308    
## cons      -0.166868   0.044510 -3.7490 0.0001816 ***
## serv       0.012198   0.020597  0.5922 0.5537490    
## rho        0.265671   0.018858 14.0880 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Goodness-of-Fit 
##  
##  EDF Total:      6 
##  Sigma: 1.86934 
##  AIC:  5396.53 
##  BIC:  5431.41
summary(Linear_WITHIN_sar_ML)
## Spatial panel fixed effects lag model
##  
## 
## Call:
## spml(formula = formlin, data = unemp_it, index = c("prov", "year"), 
##     listw = lwsp_it, model = "within", effect = "twoways", lag = TRUE, 
##     spatial.error = "none", Hess = FALSE)
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -8.045700 -1.068404 -0.035768  1.014227  7.816307 
## 
## Spatial autoregressive coefficient:
##        Estimate Std. Error t-value  Pr(>|t|)    
## lambda 0.266004   0.020636   12.89 < 2.2e-16 ***
## 
## Coefficients:
##            Estimate Std. Error t-value  Pr(>|t|)    
## empgrowth -0.129530   0.014078 -9.2009 < 2.2e-16 ***
## partrate   0.391597   0.023464 16.6894 < 2.2e-16 ***
## agri      -0.036771   0.027219 -1.3510 0.1767102    
## cons      -0.166896   0.044420 -3.7572 0.0001718 ***
## serv       0.012191   0.020559  0.5930 0.5532105    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Computing average direct, indirect and total marginal impacts:

imp_parvar_sar <- impactspar(Linear_WITHIN_sar_REML, listw = lwsp_it)
summary(imp_parvar_sar)
## 
##  Total Parametric Impacts (sar) 
##            Estimate Std. Error   t value Pr(>|t|)
## empgrowth -0.176573   0.019656 -8.983384   0.0000
## partrate   0.534058   0.034073 15.674062   0.0000
## agri      -0.050685   0.038156 -1.328366   0.1841
## cons      -0.230225   0.061790 -3.725926   0.0002
## serv       0.017306   0.028278  0.611988   0.5405
## 
##  Direct Parametric Impacts (sar) 
##            Estimate Std. Error   t value Pr(>|t|)
## empgrowth -0.132763   0.014486 -9.164996   0.0000
## partrate   0.401542   0.023991 16.737464   0.0000
## agri      -0.038115   0.028685 -1.328747   0.1839
## cons      -0.173082   0.046139 -3.751332   0.0002
## serv       0.012996   0.021247  0.611659   0.5408
## 
##  Indirect Parametric Impacts (sar) 
##             Estimate Std. Error    t value Pr(>|t|)
## empgrowth -0.0438106  0.0061748 -7.0950827   0.0000
## partrate   0.1325152  0.0142666  9.2884688   0.0000
## agri      -0.0125704  0.0095534 -1.3158057   0.1882
## cons      -0.0571431  0.0163081 -3.5039727   0.0005
## serv       0.0043098  0.0070594  0.6105034   0.5415

1.2.2 Spatial error within model (SEM). REML estimates using pspatfit():

\[y_{it}= \sum_{k=1}^K \beta_k x_{k,it}+\alpha_i+\theta_t+ \epsilon_{it}\]

\[\epsilon_{it}=\theta \sum_{j=1}^N w_{ij,N}\epsilon_{it}+u_{it}\]

\[u_{it} \sim i.i.d.(0,\sigma^2_u)\]

Linear_WITHIN_sem_REML <- pspatfit(formlin,
                               data = unemp_it, 
                               demean = TRUE,
                               eff_demean = "twoways",
                               listw = lwsp_it, 
                               index = c("prov", "year"),
                               type = "sem")

Linear_WITHIN_sem_ML <- spml(formlin,
                         data = unemp_it, 
                         index=c("prov","year"),
                         listw = lwsp_it,
                         model="within",
                         effect = "twoways",
                         spatial.error="b", 
                         lag=FALSE, 
                         Hess = FALSE)

round(data.frame(Linear_WITHIN_sem_REML = c(Linear_WITHIN_sem_REML$delta, 
                                            Linear_WITHIN_sem_REML$bfixed), 
                 Linear_WITHIN_sem_ML = c(Linear_WITHIN_sem_ML$spat.coef, 
                                          Linear_WITHIN_sem_ML$coefficients[-1])),3)
##                 Linear_WITHIN_sem_REML Linear_WITHIN_sem_ML
## delta                            0.283                0.283
## fixed_empgrowth                 -0.134               -0.134
## fixed_partrate                   0.399                0.399
## fixed_agri                      -0.033               -0.033
## fixed_cons                      -0.188               -0.188
## fixed_serv                       0.031                0.031

References

Mı́nguez, Román, Roberto Basile, and Marı́a Durbán. 2020. “An Alternative Semiparametric Model for Spatial Panel Data.” Statistical Methods & Applications 29 (4): 669–708. https://doi.org/10.1007/s10260-019-00492-8.