psar: Estimate spatial or spatio-temporal semiparametric PS-SAR...

View source: R/psar.R

psarR Documentation

Estimate spatial or spatio-temporal semiparametric PS-SAR regression models.

Description

Estimate geoadditive spatial or spatio-temporal semiparametric PS-SAR regression models including parametric and non-parametric covariates, spatial or spatio-temporal non-parametric trends and spatial lags. The non-parametric terms (either trends or covariates) are modelled using P-Splines. The non-parametric trend can be decomposed in an ANOVA way including main and interactions effects of 2nd and 3rd order. For the spatial lag a SAR model is specified. The estimation method is Restricted Maximum Likelihood (REML).

Usage

psar(formula, data, Wsp = NULL, sar = FALSE, ar1 = FALSE,
  control = list(vary_init = NULL, thr = 0.001, maxit = 200, trace =
  FALSE, rho_init = 0, phi_init = 0, rho_fixed = FALSE, phi_fixed = FALSE),
  ...)

Arguments

formula

A formula similar to GAM specification including parametric and non-parametric terms. Parametric covariates are included in the usual way and non-parametric p-spline smooth terms are specified using pspl(.) and pspt(.) for the non-parametric covariates and spatial or spatio-temporal trend, respectively. More details in Details and Examples.

data

A data frame containing the parametric and non-parametric covariates included in the model. Also, if a pspt(.) term is included in formula, the data frame must include the spatial and temporal coordinates specified in pspt(.). In this case coordinates must be ordered choosing time as fast index and spatial coordinates as low indexes. See head(unemp_it) as an example.

Wsp

A neighbour spatial matrix. The dimension of the matrix is always NxN, where N is the dimension of each spatial coordinate. Default NULL.

sar

A logical value used to include a spatial lag of the dependent variable in the model, that is, to estimate PS-SAR models. Default TRUE.

ar1

A logical value indicating if the noise of spatio-temporal model follows a first order autoregressive process. Default FALSE.

control

List of extra control arguments - see section below

Details

Function to estimate the model:

y = (ρ*W_N \otimes I_T) y + X β + f(s_1,s_2,τ_t) + ∑_{i=1}^k g(z_i) + ε

where:

  • f(s_1,s_2,τ_t) is a smooth spatio-temporal trend of the spatial coordinates s1,s_2 and of the temporal coordinate τ_t.

  • X is a matrix including values of parametric covariates.

  • g(z_i) are non-parametric smooth functions of the covariates z_i.

  • W_N is the spatial weights matrix.

  • ρ is the spatial spillover parameter.

  • I_T is an identity matrix of order T (T=1 for pure spatial data).

  • ε ~ N(0,R) where R = σ^2 I_T if errors are uncorrelated or it follows an AR(1) temporal autoregressive structure for serially correlated errors.

The non-parametric terms are included in formula using pspt(.) for spatial or spatio-temporal trends and pspl(.) for other non-parametric smooth additive terms.

For example, if a model includes a spatio-temporal trend with long, and lat as spatial coordinates and year as temporal coordinate; two non-parametric covariates (empgrowth and serv) and three parametric covariates (partrate, agri and emphcons), the formula (choosing default values for each term) should be written as:

unrate ~ partrate + agri + cons + pspl(serv) + pspl(empgrowth) + pspt(long,lat,year)

For spatial trend case the term pspt(.) does not include a temporal coordinate, that is, in the previous example would be specified as pspt(long,lat).

In many situations the spatio-temporal trend given by f(s_1,s_2,τ_t) can be very complex and the use of a multidimensional smooth function may not be flexible enough to capture the structure in the data. Furthermore, the estimation of this trend can become computationally intensive especially for large databases. To solve this problem, Lee and Durbán (2011) proposed an ANOVA-type decomposition of this spatio-temporal trend where spatial and temporal main effects, and second- and third-order interaction effects can be identified as:

f(s_1,s_2,τ_t) = f_1(s_1) + f_2(s_2) + f_t(τ_t) + f_{1,2}(s_1,s_2) + f_{1,t}(s_1,τ_t) + f_{2,t}(s_2,τ_t) + f_{1,2,t}(s_1,s_2,τ_t)

In this equation f_1(s_1), f_2(s_2) and f_t(τ_t) are the main effects, f_{1,2}(s_1,s_2), f_{1,t}(s_1,τ_t) and f_{2,t}(s_2,τ_t) are the second-order interaction effects and f_{1,2,t}(s_1,s_2,τ_t) is the third-order interaction effect. In this case, each effect can have its own degree of smoothing allowing a greater flexibility for the spatio-temporal trend. The ANOVA decomposition of the trend can be set as an argument in pspl(.) terms choosing psanova=TRUE. For example to choose an ANOVA decomposition in previous case we can set pspt(long,lat,year,nknots=c(18,18,8),psanova=TRUE,ntime=19). Note that for spatio-temporal data it is needed to specify the number of temporal periods in ntime but the number of spatial coordinates are not needed. See Examples for more details.

In most empirical cases main effects are more flexible than interaction effects and therefore, the number of knots in B-Spline basis for interaction effects do not need to be as big as the number of knots for main effects. Lee et al., (2013) proposed a nested basis procedure in which the number of knots for the interaction effects are reduced using divisors such that the space spanned by B-spline bases used for interaction effects are a subset of the space spanned by B-spline bases used for main effects. The divisors can be specified as an argument in pspt(.) terms. See Examples for details.

All the terms are jointly fitted using Separation of Anisotropic Penalties (SAP) algorithm (see Rodríguez-Álvarez et al., (2015)) which allows to the mixed model reparameterization of the model. When sar or ar1 arguments are TRUE, ρ and φ parameters are numerically estimated using function nloptr implemented in pakage nloptr . In these cases, an iterative process between SAP and numerical optimization of ρ and φ is applied until convergence. See details in Mínguez et al., (2018).

Value

A list object of class psar

call Call of the function.
terms The terms object used.
contrasts (only where relevant) the contrasts used for parametric covariates.
xlevels (only where relevant) a record of the levels of the parametric factors used in fitting.
edftot Equivalent degrees of freedom for the whole model.
edfspt Equivalent degrees of freedom for smooth spatio-temporal or spatial trend.
edfnopar Equivalent degrees of freedom for non-parametric covariates.
psanova TRUE if spatio-temporal or spatial trend is PS-ANOVA.
bfixed Estimated betas corresponding to fixed effects in mixed model. These betas comes from either parametric covariates or fixed coefficients of smooth terms reparameterized as mixed models.
se_bfixed Standard errors of fixed betas.
brandom Estimated betas corresponding to random effects in mixed model. These betas comes from random coefficients of smooth terms reparameterized as mixed models.
se_brandom Standard errors of random betas.
vcov_b Covariance matrix of fixed and random effects.
sar TRUE if model is PS-SAR.
rho Estimated rho. If sar=FALSE always rho=0.
se_rho Standard error of rho.
ar1 TRUE if model has a temporal autoregressive of first order in residuals.
phi Estimated phi. If ar1=FALSE always phi=0.
se_rho Standard error of phi.
fitted.values Vector of fitted values of the dependent variable.
se_fitted.values Vector of standard errors of fitted.values.
fitted.values_Ay Vector of fitted values of the spatial lag of dependent variable: (ρ*W_N \otimes I_T) y.
se_fitted.values_Ay Vector of standard errors of fitted.values_Ay.
residuals Vector of residuals.
residuals_norm Uncorrelated residuals. For non-temporal data they are the same than residuals.
df.residual Equivalent degrees of freedom for residuals.
sig2 Residual variance computed as SSR/df.residual.
llik Log-likelihood value.
llik_reml Restricted log-likelihood value.
aic Akaike information criterion.
bic Bayesian information criterion.
sp1 First spatial coordinate.
sp2 Second spatial coordinate.
time Time coordinate.
y Dependent variable.
X Model matrix for fixed effects.
Z Model matrix for random effects.

Control arguments

vary_init Initial value of the noise variance in the model. Default NULL.
trace A logical value set to TRUE to provide information about the convergence of the estimation process. Default FALSE.
thr Numerical value for the threshold of convergence in the estimation process. Default 1e-3.
maxit An integer value for the limit of the number of iterations during estimation process before reach convergence. Default 200.
rho_init An initial value for rho parameter if sar=TRUE. Default 0.
phi_init An initial value for phi parameter if ar1=TRUE. Default 0.
rho_fixed A logical value to set fixed rho parameter during estimation process. Default FALSE.
phi_fixed A logical value to set fixed phi parameter during estimation process. Default FALSE.

Author(s)

Roman Minguez roman.minguez@uclm.es

References

  • Basile, R., Durbán, M., Mínguez, R., Montero, J. M., and Mur, J. (2014). Modeling regional economic dynamics: Spatial dependence, spatial heterogeneity and nonlinearities. Journal of Economic Dynamics and Control, (48), 229-245.

  • Eilers, P. and Marx, B. (1996). Flexible Smoothing with B-Splines and Penalties. Statistical Science, (11), 89-121.

  • Lee, D. and Durbán, M. (2011). P-Spline ANOVA Type Interaction Models for Spatio-Temporal Smoothing. Statistical Modelling, (11), 49-69.

  • Lee, D. J., Durban, M., and Eilers, P. (2013). Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. Computational Statistics & Data Analysis, (61), 22-37.

  • LeSage, J. and Pace, K. (2009). Introduction to Spatial Econometrics. CRC Press, Boca Raton.

  • Mínguez, R.; Basile, R. and Durbán, M. (2018). An Alternative Semiparametric Model for Spatial Panel Data. Under evaluation in Statistical Methods and Applications.

  • Montero, J., Mínguez, R., and Durbán, M. (2012). SAR models with nonparametric spatial trends: A P-Spline approach. Estadística Española, (54:177), 89-111.

  • Rodríguez-Alvarez, M. X.; Kneib, T.; Durban, M.; Lee, D.J. and Eilers, P. (2015). Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. Statistics and Computing 25 (5), 941-957.

See Also

  • eff_par compute total, direct and indirect effect functions for parametric continuous covariates.

  • eff_nopar compute total, direct and indirect effect functions for non-parametric continuous covariates.

  • fit_terms compute smooth functions for non-parametric continuous covariates.

  • gam well-known alternative of estimation of semiparametric models in mgcv package.

Examples

################################################
 ###################### Examples using a panel data of rate of
 ###################### unemployment for 103 Italian provinces in period 1996-2014.
library(sptpsar)
data(unemp_it); Wsp <- Wsp_it
 ######################  GAM pure
form1 <- unrate ~ partrate + agri + cons +
                 pspl(serv,nknots=15) +
                 pspl(empgrowth,nknots=20)
gampure <- psar(form1,data=unemp_it)
summary(gampure)
library(mgcv)
form1_mgcv <- unrate ~ partrate + agri + cons +
                       s(serv,bs="ps",m=2,k=15) +
                       s(empgrowth,bs="ps",m=2,k=20)
gampure_mgcv <- gam(form1_mgcv,data=unemp_it,method="REML")
summary(gampure_mgcv)
 plot.gam(gampure_mgcv)
 ######################  Fit non-parametric terms (spatial trend must be name "spttrend")
 list_varnopar <- c("serv","empgrowth")
 terms_nopar <- fit_terms(gampure,list_varnopar)
 ######################  Plot non-parametric terms
 plot_terms(terms_nopar,unemp_it)
######################  GAM SAR
 gamsar <- psar(form1,data=unemp_it,sar=TRUE,Wsp=Wsp_it)
 summary(gamsar)
 ###### Non-Parametric Total, Direct and Indirect Effects
 list_varnopar <- c("serv","empgrowth")
 eff_nparvar <- eff_nopar(gamsar,list_varnopar)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=TRUE)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=FALSE)
 ###################### Parametric Total, Direct and Indirect Effects
 list_varpar <- c("partrate","agri","cons")
 eff_parvar <- eff_par(gamsar,list_varpar)
 summary(eff_parvar)

 ###############################################
 ### Spatial semiparametric model without spatial lag
form2 <- unrate ~ partrate + agri + cons +
                pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) +
                pspt(long,lat,nknots=c(20,20),psanova=FALSE)
 ### Spatial trend fixed for 2014
 unemp_it_2d <- unemp_it[unemp_it$year==2014,]
 geosp1 <- psar(form2,data=unemp_it_2d)
 summary(geosp1)
 ### Spatial trend fixed for period 1996- 2014
geosp2 <- psar(form2,data=unemp_it)
summary(geosp2)
 ###############################################
 ### Spatial semiparametric model with spatial lag
 ### Spatial trend fixed for 2014
geospar1 <- psar(form2,data=unemp_it_2d,Wsp=Wsp_it,sar=TRUE)
summary(geospar1)
 ### Spatial trend fixed for period 1996-2014
geospar2 <- psar(form2,data=unemp_it,Wsp=Wsp_it,sar=TRUE)
summary(geospar2)
 #### Fitted Values and Residuals
 plot(geospar2$fit,unemp_it$unrate,xlab='fitted values',ylab="unrate",
      type="p",cex.lab=1.3,cex.main=1.3,
      main="Spatial semiparametric model with spatial lag",
      sub="Spatial trend fixed for period 1996-2014")
      plot(geospar2$fit,geospar2$residuals,xlab='fitted values',
           ylab="residuals",type="p",cex.lab=1.3,cex.main=1.3,
           main="Spatial semiparametric model with spatial lag",
           sub="Spatial trend fixed for period 1996-2014")

 ### Fit non-parametric terms (spatial trend must be name "spttrend")
list_varnopar <- c("serv","empgrowth")
terms_nopar <- fit_terms(geospar2,list_varnopar)
### Plot non-parametric terms
 plot_terms(terms_nopar,unemp_it)
### Plot spatial trend (no ANOVA)
sptrend <- fit_terms(geospar2,"spttrend")
n <- dim(unemp_it)[1]
nT <- 19
sp_seq <- seq(from=1,to=n,by=nT)
lon <- scale(unemp_it[sp_seq,c("long")])
lat <- scale(unemp_it[sp_seq,c("lat")])
spt <- sptrend$fitted_terms[sp_seq]
require(rgl)
open3d(); plot3d(x=lon,y=lat,z=spt)
require(akima)
akima_spt <- akima::interp(x=lon, y=lat, z=spt,
             xo=seq(min(lon), max(lon), length = 1000),
             yo=seq(min(lat), max(lat), length = 1000))
persp3d(akima_spt$x,akima_spt$y,akima_spt$z,
        xlim=range(lon),ylim=range(lat),
        zlim=range(spt),col="green3",aspect="iso",add=TRUE)
 ###### Non-Parametric Total, Direct and Indirect Effects
 eff_nparvar <- eff_nopar(geospar2,list_varnopar)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=TRUE)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=FALSE)
 ###### Parametric Total, Direct and Indirect Effects
 list_varpar <- c("partrate","agri","cons")
 eff_parvar <- eff_par(geospar2,list_varpar)
 summary(eff_parvar)

 ###############################################
 # Spatial semiparametric ANOVA model without spatial lag
 # Interaction term f12 with nested basis
form3 <- unrate ~ partrate + agri + cons +
                  pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) +
                  pspt(long,lat,nknots=c(20,20),psanova=TRUE,
                  nest_sp1=c(1,2),nest_sp2=c(1,2))
# Spatial trend fixed for period 1996-2014
geospanova <- psar(form3,data=unemp_it)
summary(geospanova)
### Plot spatial trend (ANOVA)
spttrend <- fit_terms(geospanova,"spttrend")
lon <- scale(unemp_it$long); lat <- scale(unemp_it$lat)
### Plot main effects
plot_main_spt(spttrend,sp1=lon,sp2=lat,nT=19)
### Plot Interaction
nT <- 19
sp_seq <- seq(from=1,to=n,by=nT)
f12_int <- spttrend$fitted_terms[sp_seq,c("f12_int")]
lon <- scale(unemp_it[sp_seq,c("long")])
lat <- scale(unemp_it[sp_seq,c("lat")])
require(rgl)
open3d(); plot3d(x=lon,y=lat,z=f12_int)
require(akima)
akima_f12 <- akima::interp(x=lon, y=lat, z=f12_int,
                      xo=seq(min(lon), max(lon), length = 1000),
                      yo=seq(min(lat), max(lat), length = 1000))
persp3d(akima_f12$x,akima_f12$y,akima_f12$z,
        xlim=range(lon),ylim=range(lat),
        zlim=range(f12_int),col="green3",aspect="iso",add=TRUE)
# Spatial semiparametric ANOVA model with spatial lag
# Interaction term f12 with nested basis
geospanova_sar <- psar(form3,data=unemp_it,Wsp=Wsp_it,sar=TRUE)
summary(geospanova_sar)
 ###############################################
 # Spatio-temporal semiparametric ANOVA model without spatial lag
 # Interaction terms f12,f1t,f2t and f12t with nested basis
 # Remark: It is necessary to include ntime as argument
 # Remark: nest_sp1, nest_sp2 and nest_time must be divisors of nknots
 form4 <- unrate ~ partrate + agri + cons +
                   pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) +
                   pspt(long,lat,year,nknots=c(18,18,8),psanova=TRUE,
                   nest_sp1=c(1,2,3),nest_sp2=c(1,2,3),
                   nest_time=c(1,2,2),ntime=19)
 sptanova <- psar(form4,data=unemp_it,
                  control=list(thr=1e-2,maxit=200,trace=FALSE))
 summary(sptanova)
 ### Plot spatial trend (ANOVA)
 spttrend <- fit_terms(sptanova,"spttrend")
 lon <- scale(unemp_it$long); lat <- scale(unemp_it$lat)
 time <- unemp_it$year
 ### Plot main effects
 plot_main_spt(spttrend,sp1=lon,sp2=lat,time=time,nT=19)

 ### Spatio-temporal semiparametric ANOVA model with spatial lag
 sptanova_sar <- psar(form4,data=unemp_it,Wsp=Wsp_it,sar=TRUE,
                      control=list(thr=1e-1,maxit=100,trace=FALSE))
 summary(sptanova_sar)
 ### Spatio-temporal semiparametric ANOVA model with spatial lag
 ### and temporal autorregresive noise
 sptanova_sar_ar1 <- psar(form4,data=unemp_it,Wsp=Wsp_it,sar=TRUE,ar1=TRUE,
                    control=list(thr=1e-1,maxit=200,trace=FALSE))
 summary(sptanova_sar_ar1)
 ###### Non-Parametric Total, Direct and Indirect Effects
 list_varnopar <- c("serv","empgrowth")
 eff_nparvar <- eff_nopar(sptanova_sar_ar1,list_varnopar)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=TRUE)
 plot_effects_nopar(eff_nparvar,unemp_it,smooth=FALSE)
 ###### Parametric Total, Direct and Indirect Effects
 list_varpar <- c("partrate","agri","cons")
 eff_parvar <- eff_par(sptanova_sar_ar1,list_varpar)
 summary(eff_parvar)
 ###############################################
 # Spatio-temporal semiparametric ANOVA model without spatial lag
 # Interaction terms f12,f1t and f12t with nested basis
 # Interaction term f2t restricted to 0
  form5 <- unrate ~ partrate + agri + cons +
                  pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) +
                  pspt(long,lat,year,nknots=c(18,18,6),psanova=TRUE,
                  nest_sp1=c(1,2,3),nest_sp2=c(1,2,3),
                  nest_time=c(1,2,2),f2t_int=FALSE,ntime=19)
 sptanova2 <- psar(form5,data=unemp_it,
                  control=list(thr=1e-2,maxit=200,trace=TRUE))
 summary(sptanova2)
 # Spatio-temporal semiparametric ANOVA model with spatial lag
 sptanova_sar2 <- psar(form5,data=unemp_it,Wsp=Wsp_it,sar=TRUE,
                      control=list(thr=1e-2,maxit=100,trace=TRUE))
 summary(sptanova_sar2)
 # Spatio-temporal semiparametric ANOVA model with spatial lag
 # and temporal autorregresive noise
 sptanova_sar_ar1_2 <- psar(form5,data=unemp_it,Wsp=Wsp_it,sar=TRUE,ar1=TRUE,
                   control=list(thr=1e-2,maxit=200,trace=TRUE))
 summary(sptanova_sar_ar1_2)
 # Spatio-temporal semiparametric ANOVA model without spatial lag
 # and temporal autorregresive noise
 sptanova_ar1_2 <- psar(form5,data=unemp_it,Wsp=Wsp_it,ar1=TRUE,
                    control=list(thr=1e-2,maxit=200,trace=TRUE))
 summary(sptanova_ar1_2)
 ###############################################
 # Spatio-temporal semiparametric model (no ANOVA) without spatial lag
 # Remark: Be careful with the number of knots
 #    (spend a lot of time fitting...)
 form6 <- unrate ~ partrate + agri + cons +
                  pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) +
                  pspt(long,lat,year,nknots=c(10,10,6),ntime=19)
 ## Not run: 
   spt <- psar(form6,data=unemp_it,Wsp=Wsp_it,
                    control=list(thr=1e-1,maxit=200,trace=TRUE))
   summary(spt) 
## End(Not run)



rominsal/sptpsar documentation built on June 1, 2022, 2:03 a.m.