psar: Estimate spatial or spatio-temporal semiparametric PS-SAR...
In rominsal/sptpsar: Spatio-Temporal Semiparametric Models with Spatial Lags

View source: R/psar.R

psar	R Documentation

Estimate spatial or spatio-temporal semiparametric PS-SAR regression models.

Description

Estimate geoadditive spatial or spatio-temporal semiparametric PS-SAR regression models including parametric and non-parametric covariates, spatial or spatio-temporal non-parametric trends and spatial lags. The non-parametric terms (either trends or covariates) are modelled using P-Splines. The non-parametric trend can be decomposed in an ANOVA way including main and interactions effects of 2nd and 3rd order. For the spatial lag a SAR model is specified. The estimation method is Restricted Maximum Likelihood (REML).

Usage

psar(formula, data, Wsp = NULL, sar = FALSE, ar1 = FALSE,
  control = list(vary_init = NULL, thr = 0.001, maxit = 200, trace =
  FALSE, rho_init = 0, phi_init = 0, rho_fixed = FALSE, phi_fixed = FALSE),
  ...)

Arguments

`formula`	A formula similar to GAM specification including parametric and non-parametric terms. Parametric covariates are included in the usual way and non-parametric p-spline smooth terms are specified using `pspl(.)` and `pspt(.)` for the non-parametric covariates and spatial or spatio-temporal trend, respectively. More details in Details and Examples.
`data`	A data frame containing the parametric and non-parametric covariates included in the model. Also, if a `pspt(.)` term is included in formula, the data frame must include the spatial and temporal coordinates specified in `pspt(.)`. In this case coordinates must be ordered choosing time as fast index and spatial coordinates as low indexes. See `head(unemp_it)` as an example.
`Wsp`	A neighbour spatial matrix. The dimension of the matrix is always NxN, where N is the dimension of each spatial coordinate. Default NULL.
`sar`	A logical value used to include a spatial lag of the dependent variable in the model, that is, to estimate PS-SAR models. Default TRUE.
`ar1`	A logical value indicating if the noise of spatio-temporal model follows a first order autoregressive process. Default FALSE.
`control`	List of extra control arguments - see section below

Details

Function to estimate the model:

y = (ρ*W_N \otimes I_T) y + X β + f(s_1,s_2,τ_t) + ∑_{i=1}^k g(z_i) + ε

where:

f(s_1,s_2,τ_t) is a smooth spatio-temporal trend of the spatial coordinates s1,s_2 and of the temporal coordinate τ_t.
X is a matrix including values of parametric covariates.
g(z_i) are non-parametric smooth functions of the covariates z_i.
W_N is the spatial weights matrix.
ρ is the spatial spillover parameter.
I_T is an identity matrix of order T (T=1 for pure spatial data).
ε ~ N(0,R) where R = σ^2 I_T if errors are uncorrelated or it follows an AR(1) temporal autoregressive structure for serially correlated errors.

The non-parametric terms are included in formula using pspt(.) for spatial or spatio-temporal trends and pspl(.) for other non-parametric smooth additive terms.

For example, if a model includes a spatio-temporal trend with long, and lat as spatial coordinates and year as temporal coordinate; two non-parametric covariates (empgrowth and serv) and three parametric covariates (partrate, agri and emphcons), the formula (choosing default values for each term) should be written as:

unrate ~ partrate + agri + cons + pspl(serv) + pspl(empgrowth) + pspt(long,lat,year)

For spatial trend case the term pspt(.) does not include a temporal coordinate, that is, in the previous example would be specified as pspt(long,lat).

In many situations the spatio-temporal trend given by f(s_1,s_2,τ_t) can be very complex and the use of a multidimensional smooth function may not be flexible enough to capture the structure in the data. Furthermore, the estimation of this trend can become computationally intensive especially for large databases. To solve this problem, Lee and Durbán (2011) proposed an ANOVA-type decomposition of this spatio-temporal trend where spatial and temporal main effects, and second- and third-order interaction effects can be identified as:

f(s_1,s_2,τ_t) = f_1(s_1) + f_2(s_2) + f_t(τ_t) + f_{1,2}(s_1,s_2) + f_{1,t}(s_1,τ_t) + f_{2,t}(s_2,τ_t) + f_{1,2,t}(s_1,s_2,τ_t)

In this equation f_1(s_1), f_2(s_2) and f_t(τ_t) are the main effects, f_{1,2}(s_1,s_2), f_{1,t}(s_1,τ_t) and f_{2,t}(s_2,τ_t) are the second-order interaction effects and f_{1,2,t}(s_1,s_2,τ_t) is the third-order interaction effect. In this case, each effect can have its own degree of smoothing allowing a greater flexibility for the spatio-temporal trend. The ANOVA decomposition of the trend can be set as an argument in pspl(.) terms choosing psanova=TRUE. For example to choose an ANOVA decomposition in previous case we can set pspt(long,lat,year,nknots=c(18,18,8),psanova=TRUE,ntime=19). Note that for spatio-temporal data it is needed to specify the number of temporal periods in ntime but the number of spatial coordinates are not needed. See Examples for more details.

In most empirical cases main effects are more flexible than interaction effects and therefore, the number of knots in B-Spline basis for interaction effects do not need to be as big as the number of knots for main effects. Lee et al., (2013) proposed a nested basis procedure in which the number of knots for the interaction effects are reduced using divisors such that the space spanned by B-spline bases used for interaction effects are a subset of the space spanned by B-spline bases used for main effects. The divisors can be specified as an argument in pspt(.) terms. See Examples for details.

All the terms are jointly fitted using Separation of Anisotropic Penalties (SAP) algorithm (see Rodríguez-Álvarez et al., (2015)) which allows to the mixed model reparameterization of the model. When sar or ar1 arguments are TRUE, ρ and φ parameters are numerically estimated using function nloptr implemented in pakage nloptr . In these cases, an iterative process between SAP and numerical optimization of ρ and φ is applied until convergence. See details in Mínguez et al., (2018).

Value

A list object of class psar

`call`	Call of the function.
`terms`	The terms object used.
`contrasts`	(only where relevant) the contrasts used for parametric covariates.
`xlevels`	(only where relevant) a record of the levels of the parametric factors used in fitting.
`edftot`	Equivalent degrees of freedom for the whole model.
`edfspt`	Equivalent degrees of freedom for smooth spatio-temporal or spatial trend.
`edfnopar`	Equivalent degrees of freedom for non-parametric covariates.
`psanova`	TRUE if spatio-temporal or spatial trend is PS-ANOVA.
`bfixed`	Estimated betas corresponding to fixed effects in mixed model. These betas comes from either parametric covariates or fixed coefficients of smooth terms reparameterized as mixed models.
`se_bfixed`	Standard errors of fixed betas.
`brandom`	Estimated betas corresponding to random effects in mixed model. These betas comes from random coefficients of smooth terms reparameterized as mixed models.
`se_brandom`	Standard errors of random betas.
`vcov_b`	Covariance matrix of fixed and random effects.
`sar`	TRUE if model is PS-SAR.
`rho`	Estimated rho. If `sar=FALSE` always rho=0.

`se_rho`	Standard error of rho.
`ar1`	TRUE if model has a temporal autoregressive of first order in residuals.
`phi`	Estimated phi. If `ar1=FALSE` always phi=0.
`se_rho`	Standard error of phi.
`fitted.values`	Vector of fitted values of the dependent variable.
`se_fitted.values`	Vector of standard errors of `fitted.values`.
`fitted.values_Ay`	Vector of fitted values of the spatial lag of dependent variable: (ρW_N \otimes I_T) y*.
`se_fitted.values_Ay`	Vector of standard errors of `fitted.values_Ay`.
`residuals`	Vector of residuals.
`residuals_norm`	Uncorrelated residuals. For non-temporal data they are the same than `residuals`.
`df.residual`	Equivalent degrees of freedom for `residuals`.
`sig2`	Residual variance computed as SSR/df.residual.
`llik`	Log-likelihood value.
`llik_reml`	Restricted log-likelihood value.
`aic`	Akaike information criterion.
`bic`	Bayesian information criterion.
`sp1`	First spatial coordinate.
`sp2`	Second spatial coordinate.
`time`	Time coordinate.
`y`	Dependent variable.
`X`	Model matrix for fixed effects.
`Z`	Model matrix for random effects.

Control arguments

`vary_init`	Initial value of the noise variance in the model. Default NULL.
`trace`	A logical value set to TRUE to provide information about the convergence of the estimation process. Default FALSE.
`thr`	Numerical value for the threshold of convergence in the estimation process. Default 1e-3.
`maxit`	An integer value for the limit of the number of iterations during estimation process before reach convergence. Default 200.
`rho_init`	An initial value for rho parameter if `sar=TRUE`. Default 0.
`phi_init`	An initial value for phi parameter if `ar1=TRUE`. Default 0.
`rho_fixed`	A logical value to set fixed rho parameter during estimation process. Default FALSE.
`phi_fixed`	A logical value to set fixed phi parameter during estimation process. Default FALSE.

Author(s)

Roman Minguez roman.minguez@uclm.es

References

Basile, R., Durbán, M., Mínguez, R., Montero, J. M., and Mur, J. (2014). Modeling regional economic dynamics: Spatial dependence, spatial heterogeneity and nonlinearities. Journal of Economic Dynamics and Control, (48), 229-245.
Eilers, P. and Marx, B. (1996). Flexible Smoothing with B-Splines and Penalties. Statistical Science, (11), 89-121.
Lee, D. and Durbán, M. (2011). P-Spline ANOVA Type Interaction Models for Spatio-Temporal Smoothing. Statistical Modelling, (11), 49-69.
Lee, D. J., Durban, M., and Eilers, P. (2013). Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. Computational Statistics & Data Analysis, (61), 22-37.
LeSage, J. and Pace, K. (2009). Introduction to Spatial Econometrics. CRC Press, Boca Raton.
Mínguez, R.; Basile, R. and Durbán, M. (2018). An Alternative Semiparametric Model for Spatial Panel Data. Under evaluation in Statistical Methods and Applications.
Montero, J., Mínguez, R., and Durbán, M. (2012). SAR models with nonparametric spatial trends: A P-Spline approach. Estadística Española, (54:177), 89-111.
Rodríguez-Alvarez, M. X.; Kneib, T.; Durban, M.; Lee, D.J. and Eilers, P. (2015). Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. Statistics and Computing 25 (5), 941-957.

Examples

################################################
 ###################### Examples using a panel data of rate of
 ###################### unemployment for 103 Italian provinces in period 1996-2014.
library(sptpsar)
data(unemp_it); Wsp <- Wsp_it
 ######################  GAM pure
form1 <- unrate ~ partrate + agri + cons +
                 pspl(serv,nknots=15) +
                 pspl(empgrowth,nknots=20)
gampure <- psar(form1,data=unemp_it)
summary(gampure)
library(mgcv)
form1_mgcv <- unrate ~ partrate + agri + cons +
                       s(serv,bs="ps",m=2,k=15) +
                       s(empgrowth,bs="ps",m=2,k=20)
gampure_mgcv <- gam(form1_mgcv,data=unemp_it,method="REML")
summary(gampure_mgcv)
 plot.gam(gampure_mgcv)
 ######################  Fit non-parametric terms (spatial trend must be name "spttrend")
 list_varnopar <- c("serv","empgrowth")
 terms_nopar <- fit_terms(gampure,list_varnopar)
 ######################  Plot non-parametric terms
 plot_terms(terms_nopar,unemp_it)
######################  GAM SAR
 gamsar <- psar(form1,data=unemp_it,sar=TRUE,Wsp=Wsp_it)
 summary(gamsar)
 ###### Non-Parametric Total, Direct and Indirect Effects
 list_varnopar <- c("serv","empgrowth")
 eff_nparvar <- eff_nopar(gamsar,list_varnopar)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=TRUE)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=FALSE)
 ###################### Parametric Total, Direct and Indirect Effects
 list_varpar <- c("partrate","agri","cons")
 eff_parvar <- eff_par(gamsar,list_varpar)
 summary(eff_parvar)

 ###############################################
 ### Spatial semiparametric model without spatial lag
form2 <- unrate ~ partrate + agri + cons +
                pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) +
                pspt(long,lat,nknots=c(20,20),psanova=FALSE)
 ### Spatial trend fixed for 2014
 unemp_it_2d <- unemp_it[unemp_it$year==2014,]
 geosp1 <- psar(form2,data=unemp_it_2d)
 summary(geosp1)
 ### Spatial trend fixed for period 1996- 2014
geosp2 <- psar(form2,data=unemp_it)
summary(geosp2)
 ###############################################
 ### Spatial semiparametric model with spatial lag
 ### Spatial trend fixed for 2014
geospar1 <- psar(form2,data=unemp_it_2d,Wsp=Wsp_it,sar=TRUE)
summary(geospar1)
 ### Spatial trend fixed for period 1996-2014
geospar2 <- psar(form2,data=unemp_it,Wsp=Wsp_it,sar=TRUE)
summary(geospar2)
 #### Fitted Values and Residuals
 plot(geospar2$fit,unemp_it$unrate,xlab='fitted values',ylab="unrate",
      type="p",cex.lab=1.3,cex.main=1.3,
      main="Spatial semiparametric model with spatial lag",
      sub="Spatial trend fixed for period 1996-2014")
      plot(geospar2$fit,geospar2$residuals,xlab='fitted values',
           ylab="residuals",type="p",cex.lab=1.3,cex.main=1.3,
           main="Spatial semiparametric model with spatial lag",
           sub="Spatial trend fixed for period 1996-2014")

 ### Fit non-parametric terms (spatial trend must be name "spttrend")
list_varnopar <- c("serv","empgrowth")
terms_nopar <- fit_terms(geospar2,list_varnopar)
### Plot non-parametric terms
 plot_terms(terms_nopar,unemp_it)
### Plot spatial trend (no ANOVA)
sptrend <- fit_terms(geospar2,"spttrend")
n <- dim(unemp_it)[1]
nT <- 19
sp_seq <- seq(from=1,to=n,by=nT)
lon <- scale(unemp_it[sp_seq,c("long")])
lat <- scale(unemp_it[sp_seq,c("lat")])
spt <- sptrend$fitted_terms[sp_seq]
require(rgl)
open3d(); plot3d(x=lon,y=lat,z=spt)
require(akima)
akima_spt <- akima::interp(x=lon, y=lat, z=spt,
             xo=seq(min(lon), max(lon), length = 1000),
             yo=seq(min(lat), max(lat), length = 1000))
persp3d(akima_spt$x,akima_spt$y,akima_spt$z,
        xlim=range(lon),ylim=range(lat),
        zlim=range(spt),col="green3",aspect="iso",add=TRUE)
 ###### Non-Parametric Total, Direct and Indirect Effects
 eff_nparvar <- eff_nopar(geospar2,list_varnopar)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=TRUE)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=FALSE)
 ###### Parametric Total, Direct and Indirect Effects
 list_varpar <- c("partrate","agri","cons")
 eff_parvar <- eff_par(geospar2,list_varpar)
 summary(eff_parvar)

 ###############################################
 # Spatial semiparametric ANOVA model without spatial lag
 # Interaction term f12 with nested basis
form3 <- unrate ~ partrate + agri + cons +
                  pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) +
                  pspt(long,lat,nknots=c(20,20),psanova=TRUE,
                  nest_sp1=c(1,2),nest_sp2=c(1,2))
# Spatial trend fixed for period 1996-2014
geospanova <- psar(form3,data=unemp_it)
summary(geospanova)
### Plot spatial trend (ANOVA)
spttrend <- fit_terms(geospanova,"spttrend")
lon <- scale(unemp_it$long); lat <- scale(unemp_it$lat)
### Plot main effects
plot_main_spt(spttrend,sp1=lon,sp2=lat,nT=19)
### Plot Interaction
nT <- 19
sp_seq <- seq(from=1,to=n,by=nT)
f12_int <- spttrend$fitted_terms[sp_seq,c("f12_int")]
lon <- scale(unemp_it[sp_seq,c("long")])
lat <- scale(unemp_it[sp_seq,c("lat")])
require(rgl)
open3d(); plot3d(x=lon,y=lat,z=f12_int)
require(akima)
akima_f12 <- akima::interp(x=lon, y=lat, z=f12_int,
                      xo=seq(min(lon), max(lon), length = 1000),
                      yo=seq(min(lat), max(lat), length = 1000))
persp3d(akima_f12$x,akima_f12$y,akima_f12$z,
        xlim=range(lon),ylim=range(lat),
        zlim=range(f12_int),col="green3",aspect="iso",add=TRUE)
# Spatial semiparametric ANOVA model with spatial lag
# Interaction term f12 with nested basis
geospanova_sar <- psar(form3,data=unemp_it,Wsp=Wsp_it,sar=TRUE)
summary(geospanova_sar)
 ###############################################
 # Spatio-temporal semiparametric ANOVA model without spatial lag
 # Interaction terms f12,f1t,f2t and f12t with nested basis
 # Remark: It is necessary to include ntime as argument
 # Remark: nest_sp1, nest_sp2 and nest_time must be divisors of nknots
 form4 <- unrate ~ partrate + agri + cons +
                   pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) +
                   pspt(long,lat,year,nknots=c(18,18,8),psanova=TRUE,
                   nest_sp1=c(1,2,3),nest_sp2=c(1,2,3),
                   nest_time=c(1,2,2),ntime=19)
 sptanova <- psar(form4,data=unemp_it,
                  control=list(thr=1e-2,maxit=200,trace=FALSE))
 summary(sptanova)
 ### Plot spatial trend (ANOVA)
 spttrend <- fit_terms(sptanova,"spttrend")
 lon <- scale(unemp_it$long); lat <- scale(unemp_it$lat)
 time <- unemp_it$year
 ### Plot main effects
 plot_main_spt(spttrend,sp1=lon,sp2=lat,time=time,nT=19)

 ### Spatio-temporal semiparametric ANOVA model with spatial lag
 sptanova_sar <- psar(form4,data=unemp_it,Wsp=Wsp_it,sar=TRUE,
                      control=list(thr=1e-1,maxit=100,trace=FALSE))
 summary(sptanova_sar)
 ### Spatio-temporal semiparametric ANOVA model with spatial lag
 ### and temporal autorregresive noise
 sptanova_sar_ar1 <- psar(form4,data=unemp_it,Wsp=Wsp_it,sar=TRUE,ar1=TRUE,
                    control=list(thr=1e-1,maxit=200,trace=FALSE))
 summary(sptanova_sar_ar1)
 ###### Non-Parametric Total, Direct and Indirect Effects
 list_varnopar <- c("serv","empgrowth")
 eff_nparvar <- eff_nopar(sptanova_sar_ar1,list_varnopar)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=TRUE)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=FALSE)
 ###### Parametric Total, Direct and Indirect Effects
 list_varpar <- c("partrate","agri","cons")
 eff_parvar <- eff_par(sptanova_sar_ar1,list_varpar)
 summary(eff_parvar)
 ###############################################
 # Spatio-temporal semiparametric ANOVA model without spatial lag
 # Interaction terms f12,f1t and f12t with nested basis
 # Interaction term f2t restricted to 0
  form5 <- unrate ~ partrate + agri + cons +
                  pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) +
                  pspt(long,lat,year,nknots=c(18,18,6),psanova=TRUE,
                  nest_sp1=c(1,2,3),nest_sp2=c(1,2,3),
                  nest_time=c(1,2,2),f2t_int=FALSE,ntime=19)
 sptanova2 <- psar(form5,data=unemp_it,
                  control=list(thr=1e-2,maxit=200,trace=TRUE))
 summary(sptanova2)
 # Spatio-temporal semiparametric ANOVA model with spatial lag
 sptanova_sar2 <- psar(form5,data=unemp_it,Wsp=Wsp_it,sar=TRUE,
                      control=list(thr=1e-2,maxit=100,trace=TRUE))
 summary(sptanova_sar2)
 # Spatio-temporal semiparametric ANOVA model with spatial lag
 # and temporal autorregresive noise
 sptanova_sar_ar1_2 <- psar(form5,data=unemp_it,Wsp=Wsp_it,sar=TRUE,ar1=TRUE,
                   control=list(thr=1e-2,maxit=200,trace=TRUE))
 summary(sptanova_sar_ar1_2)
 # Spatio-temporal semiparametric ANOVA model without spatial lag
 # and temporal autorregresive noise
 sptanova_ar1_2 <- psar(form5,data=unemp_it,Wsp=Wsp_it,ar1=TRUE,
                    control=list(thr=1e-2,maxit=200,trace=TRUE))
 summary(sptanova_ar1_2)
 ###############################################
 # Spatio-temporal semiparametric model (no ANOVA) without spatial lag
 # Remark: Be careful with the number of knots
 #    (spend a lot of time fitting...)
 form6 <- unrate ~ partrate + agri + cons +
                  pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) +
                  pspt(long,lat,year,nknots=c(10,10,6),ntime=19)
 ## Not run: 
   spt <- psar(form6,data=unemp_it,Wsp=Wsp_it,
                    control=list(thr=1e-1,maxit=200,trace=TRUE))
   summary(spt) 
## End(Not run)

rominsal/sptpsar documentation built on June 1, 2022, 2:03 a.m.