psar | R Documentation |
Estimate geoadditive spatial or spatio-temporal semiparametric PS-SAR regression models including parametric and non-parametric covariates, spatial or spatio-temporal non-parametric trends and spatial lags. The non-parametric terms (either trends or covariates) are modelled using P-Splines. The non-parametric trend can be decomposed in an ANOVA way including main and interactions effects of 2nd and 3rd order. For the spatial lag a SAR model is specified. The estimation method is Restricted Maximum Likelihood (REML).
psar(formula, data, Wsp = NULL, sar = FALSE, ar1 = FALSE, control = list(vary_init = NULL, thr = 0.001, maxit = 200, trace = FALSE, rho_init = 0, phi_init = 0, rho_fixed = FALSE, phi_fixed = FALSE), ...)
formula |
A formula similar to GAM specification including
parametric and non-parametric terms. Parametric covariates
are included in the usual way and non-parametric p-spline smooth terms are
specified using |
data |
A data frame containing the parametric and non-parametric
covariates included in the model. Also, if a |
Wsp |
A neighbour spatial matrix. The dimension of the matrix is always NxN, where N is the dimension of each spatial coordinate. Default NULL. |
sar |
A logical value used to include a spatial lag of the dependent variable in the model, that is, to estimate PS-SAR models. Default TRUE. |
ar1 |
A logical value indicating if the noise of spatio-temporal model follows a first order autoregressive process. Default FALSE. |
control |
List of extra control arguments - see section below |
Function to estimate the model:
y = (ρ*W_N \otimes I_T) y + X β + f(s_1,s_2,τ_t) + ∑_{i=1}^k g(z_i) + ε
where:
f(s_1,s_2,τ_t) is a smooth spatio-temporal trend of the spatial coordinates s1,s_2 and of the temporal coordinate τ_t.
X is a matrix including values of parametric covariates.
g(z_i) are non-parametric smooth functions of the covariates z_i.
W_N is the spatial weights matrix.
ρ is the spatial spillover parameter.
I_T is an identity matrix of order T (T=1 for pure spatial data).
ε ~ N(0,R) where R = σ^2 I_T if errors are uncorrelated or it follows an AR(1) temporal autoregressive structure for serially correlated errors.
The non-parametric terms are included in formula
using
pspt(.)
for spatial or spatio-temporal trends and pspl(.)
for other non-parametric smooth additive terms.
For example, if a model includes a spatio-temporal trend with long, and lat as spatial coordinates and year as temporal coordinate; two non-parametric covariates (empgrowth and serv) and three parametric covariates (partrate, agri and emphcons), the formula (choosing default values for each term) should be written as:
unrate ~ partrate + agri + cons +
pspl(serv) + pspl(empgrowth) +
pspt(long,lat,year)
For spatial trend case the term pspt(.)
does not include a temporal coordinate,
that is, in the previous example would be specified as pspt(long,lat)
.
In many situations the spatio-temporal trend given by f(s_1,s_2,τ_t) can be very complex and the use of a multidimensional smooth function may not be flexible enough to capture the structure in the data. Furthermore, the estimation of this trend can become computationally intensive especially for large databases. To solve this problem, Lee and Durbán (2011) proposed an ANOVA-type decomposition of this spatio-temporal trend where spatial and temporal main effects, and second- and third-order interaction effects can be identified as:
f(s_1,s_2,τ_t) = f_1(s_1) + f_2(s_2) + f_t(τ_t) + f_{1,2}(s_1,s_2) + f_{1,t}(s_1,τ_t) + f_{2,t}(s_2,τ_t) + f_{1,2,t}(s_1,s_2,τ_t)
In this equation f_1(s_1), f_2(s_2) and
f_t(τ_t) are the main effects,
f_{1,2}(s_1,s_2), f_{1,t}(s_1,τ_t) and
f_{2,t}(s_2,τ_t) are the
second-order interaction effects and
f_{1,2,t}(s_1,s_2,τ_t)
is the third-order interaction effect. In this case, each effect
can have its own degree of smoothing allowing a greater flexibility
for the spatio-temporal trend. The ANOVA decomposition of the trend
can be set as an argument in pspl(.)
terms choosing
psanova=TRUE
. For example to choose an ANOVA decomposition in
previous case we can set
pspt(long,lat,year,nknots=c(18,18,8),psanova=TRUE,ntime=19)
.
Note that for spatio-temporal data it is needed to specify the number
of temporal periods in ntime
but the number of spatial
coordinates are not needed. See Examples for more details.
In most empirical cases main effects are more flexible than interaction
effects and therefore, the number of knots in B-Spline basis for
interaction effects do not need to be as big as the number of knots for
main effects. Lee et al., (2013) proposed a nested basis procedure
in which the number of knots for the interaction effects are reduced using
divisors such that the space spanned by B-spline bases used for
interaction effects are a subset of the space spanned by B-spline bases
used for main effects. The divisors can be specified as an argument
in pspt(.)
terms. See Examples for details.
All the terms are jointly fitted using Separation of Anisotropic Penalties
(SAP) algorithm (see Rodríguez-Álvarez et al., (2015)) which allows
to the mixed model reparameterization of the model. When sar or
ar1 arguments are TRUE, ρ and φ parameters
are numerically estimated using function nloptr
implemented in pakage nloptr . In these cases, an iterative process between
SAP and numerical optimization of ρ and φ is applied until
convergence. See details in Mínguez et al., (2018).
A list object of class psar
call | Call of the function. |
terms | The terms object used. |
contrasts | (only where relevant) the contrasts used for parametric covariates. |
xlevels | (only where relevant) a record of the levels of the parametric factors used in fitting. |
edftot | Equivalent degrees of freedom for the whole model. |
edfspt | Equivalent degrees of freedom for smooth spatio-temporal or spatial trend. |
edfnopar | Equivalent degrees of freedom for non-parametric covariates. |
psanova | TRUE if spatio-temporal or spatial trend is PS-ANOVA. |
bfixed | Estimated betas corresponding to fixed effects in mixed model. These betas comes from either parametric covariates or fixed coefficients of smooth terms reparameterized as mixed models. |
se_bfixed | Standard errors of fixed betas. |
brandom | Estimated betas corresponding to random effects in mixed model. These betas comes from random coefficients of smooth terms reparameterized as mixed models. |
se_brandom | Standard errors of random betas. |
vcov_b | Covariance matrix of fixed and random effects. |
sar | TRUE if model is PS-SAR. |
rho | Estimated rho. If sar=FALSE always rho=0. |
se_rho | Standard error of rho. |
ar1 | TRUE if model has a temporal autoregressive of first order in residuals. |
phi | Estimated phi. If ar1=FALSE always phi=0. |
se_rho | Standard error of phi. |
fitted.values | Vector of fitted values of the dependent variable. |
se_fitted.values | Vector of standard errors of
fitted.values . |
fitted.values_Ay | Vector of fitted values of the spatial lag of dependent variable: (ρ*W_N \otimes I_T) y. |
se_fitted.values_Ay | Vector of standard errors of
fitted.values_Ay . |
residuals | Vector of residuals. |
residuals_norm | Uncorrelated residuals. For non-temporal
data they are the same than residuals . |
df.residual | Equivalent degrees of freedom for residuals . |
sig2 | Residual variance computed as SSR/df.residual. |
llik | Log-likelihood value. |
llik_reml | Restricted log-likelihood value. |
aic | Akaike information criterion. |
bic | Bayesian information criterion. |
sp1 | First spatial coordinate. |
sp2 | Second spatial coordinate. |
time | Time coordinate. |
y | Dependent variable. |
X | Model matrix for fixed effects. |
Z | Model matrix for random effects. |
vary_init | Initial value of the noise variance in the model. Default NULL. |
trace | A logical value set to TRUE to provide information about the convergence of the estimation process. Default FALSE. |
thr | Numerical value for the threshold of convergence in the estimation process. Default 1e-3. |
maxit | An integer value for the limit of the number of iterations during estimation process before reach convergence. Default 200. |
rho_init | An initial value for rho parameter if
sar=TRUE . Default 0. |
phi_init | An initial value for phi parameter if
ar1=TRUE . Default 0. |
rho_fixed | A logical value to set fixed rho parameter during estimation process. Default FALSE. |
phi_fixed | A logical value to set fixed phi parameter during estimation process. Default FALSE. |
Roman Minguez roman.minguez@uclm.es
Basile, R., Durbán, M., Mínguez, R., Montero, J. M., and Mur, J. (2014). Modeling regional economic dynamics: Spatial dependence, spatial heterogeneity and nonlinearities. Journal of Economic Dynamics and Control, (48), 229-245.
Eilers, P. and Marx, B. (1996). Flexible Smoothing with B-Splines and Penalties. Statistical Science, (11), 89-121.
Lee, D. and Durbán, M. (2011). P-Spline ANOVA Type Interaction Models for Spatio-Temporal Smoothing. Statistical Modelling, (11), 49-69.
Lee, D. J., Durban, M., and Eilers, P. (2013). Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. Computational Statistics & Data Analysis, (61), 22-37.
LeSage, J. and Pace, K. (2009). Introduction to Spatial Econometrics. CRC Press, Boca Raton.
Mínguez, R.; Basile, R. and Durbán, M. (2018). An Alternative Semiparametric Model for Spatial Panel Data. Under evaluation in Statistical Methods and Applications.
Montero, J., Mínguez, R., and Durbán, M. (2012). SAR models with nonparametric spatial trends: A P-Spline approach. Estadística Española, (54:177), 89-111.
Rodríguez-Alvarez, M. X.; Kneib, T.; Durban, M.; Lee, D.J. and Eilers, P. (2015). Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. Statistics and Computing 25 (5), 941-957.
eff_par
compute total, direct and indirect effect
functions for parametric continuous covariates.
eff_nopar
compute total, direct and indirect effect
functions for non-parametric continuous covariates.
fit_terms
compute smooth functions for non-parametric
continuous covariates.
gam
well-known alternative of estimation
of semiparametric models in mgcv package.
################################################ ###################### Examples using a panel data of rate of ###################### unemployment for 103 Italian provinces in period 1996-2014. library(sptpsar) data(unemp_it); Wsp <- Wsp_it ###################### GAM pure form1 <- unrate ~ partrate + agri + cons + pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) gampure <- psar(form1,data=unemp_it) summary(gampure) library(mgcv) form1_mgcv <- unrate ~ partrate + agri + cons + s(serv,bs="ps",m=2,k=15) + s(empgrowth,bs="ps",m=2,k=20) gampure_mgcv <- gam(form1_mgcv,data=unemp_it,method="REML") summary(gampure_mgcv) plot.gam(gampure_mgcv) ###################### Fit non-parametric terms (spatial trend must be name "spttrend") list_varnopar <- c("serv","empgrowth") terms_nopar <- fit_terms(gampure,list_varnopar) ###################### Plot non-parametric terms plot_terms(terms_nopar,unemp_it) ###################### GAM SAR gamsar <- psar(form1,data=unemp_it,sar=TRUE,Wsp=Wsp_it) summary(gamsar) ###### Non-Parametric Total, Direct and Indirect Effects list_varnopar <- c("serv","empgrowth") eff_nparvar <- eff_nopar(gamsar,list_varnopar) plot_effects_nopar(eff_nparvar,unemp_it,smooth=TRUE) plot_effects_nopar(eff_nparvar,unemp_it,smooth=FALSE) ###################### Parametric Total, Direct and Indirect Effects list_varpar <- c("partrate","agri","cons") eff_parvar <- eff_par(gamsar,list_varpar) summary(eff_parvar) ############################################### ### Spatial semiparametric model without spatial lag form2 <- unrate ~ partrate + agri + cons + pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) + pspt(long,lat,nknots=c(20,20),psanova=FALSE) ### Spatial trend fixed for 2014 unemp_it_2d <- unemp_it[unemp_it$year==2014,] geosp1 <- psar(form2,data=unemp_it_2d) summary(geosp1) ### Spatial trend fixed for period 1996- 2014 geosp2 <- psar(form2,data=unemp_it) summary(geosp2) ############################################### ### Spatial semiparametric model with spatial lag ### Spatial trend fixed for 2014 geospar1 <- psar(form2,data=unemp_it_2d,Wsp=Wsp_it,sar=TRUE) summary(geospar1) ### Spatial trend fixed for period 1996-2014 geospar2 <- psar(form2,data=unemp_it,Wsp=Wsp_it,sar=TRUE) summary(geospar2) #### Fitted Values and Residuals plot(geospar2$fit,unemp_it$unrate,xlab='fitted values',ylab="unrate", type="p",cex.lab=1.3,cex.main=1.3, main="Spatial semiparametric model with spatial lag", sub="Spatial trend fixed for period 1996-2014") plot(geospar2$fit,geospar2$residuals,xlab='fitted values', ylab="residuals",type="p",cex.lab=1.3,cex.main=1.3, main="Spatial semiparametric model with spatial lag", sub="Spatial trend fixed for period 1996-2014") ### Fit non-parametric terms (spatial trend must be name "spttrend") list_varnopar <- c("serv","empgrowth") terms_nopar <- fit_terms(geospar2,list_varnopar) ### Plot non-parametric terms plot_terms(terms_nopar,unemp_it) ### Plot spatial trend (no ANOVA) sptrend <- fit_terms(geospar2,"spttrend") n <- dim(unemp_it)[1] nT <- 19 sp_seq <- seq(from=1,to=n,by=nT) lon <- scale(unemp_it[sp_seq,c("long")]) lat <- scale(unemp_it[sp_seq,c("lat")]) spt <- sptrend$fitted_terms[sp_seq] require(rgl) open3d(); plot3d(x=lon,y=lat,z=spt) require(akima) akima_spt <- akima::interp(x=lon, y=lat, z=spt, xo=seq(min(lon), max(lon), length = 1000), yo=seq(min(lat), max(lat), length = 1000)) persp3d(akima_spt$x,akima_spt$y,akima_spt$z, xlim=range(lon),ylim=range(lat), zlim=range(spt),col="green3",aspect="iso",add=TRUE) ###### Non-Parametric Total, Direct and Indirect Effects eff_nparvar <- eff_nopar(geospar2,list_varnopar) plot_effects_nopar(eff_nparvar,unemp_it,smooth=TRUE) plot_effects_nopar(eff_nparvar,unemp_it,smooth=FALSE) ###### Parametric Total, Direct and Indirect Effects list_varpar <- c("partrate","agri","cons") eff_parvar <- eff_par(geospar2,list_varpar) summary(eff_parvar) ############################################### # Spatial semiparametric ANOVA model without spatial lag # Interaction term f12 with nested basis form3 <- unrate ~ partrate + agri + cons + pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) + pspt(long,lat,nknots=c(20,20),psanova=TRUE, nest_sp1=c(1,2),nest_sp2=c(1,2)) # Spatial trend fixed for period 1996-2014 geospanova <- psar(form3,data=unemp_it) summary(geospanova) ### Plot spatial trend (ANOVA) spttrend <- fit_terms(geospanova,"spttrend") lon <- scale(unemp_it$long); lat <- scale(unemp_it$lat) ### Plot main effects plot_main_spt(spttrend,sp1=lon,sp2=lat,nT=19) ### Plot Interaction nT <- 19 sp_seq <- seq(from=1,to=n,by=nT) f12_int <- spttrend$fitted_terms[sp_seq,c("f12_int")] lon <- scale(unemp_it[sp_seq,c("long")]) lat <- scale(unemp_it[sp_seq,c("lat")]) require(rgl) open3d(); plot3d(x=lon,y=lat,z=f12_int) require(akima) akima_f12 <- akima::interp(x=lon, y=lat, z=f12_int, xo=seq(min(lon), max(lon), length = 1000), yo=seq(min(lat), max(lat), length = 1000)) persp3d(akima_f12$x,akima_f12$y,akima_f12$z, xlim=range(lon),ylim=range(lat), zlim=range(f12_int),col="green3",aspect="iso",add=TRUE) # Spatial semiparametric ANOVA model with spatial lag # Interaction term f12 with nested basis geospanova_sar <- psar(form3,data=unemp_it,Wsp=Wsp_it,sar=TRUE) summary(geospanova_sar) ############################################### # Spatio-temporal semiparametric ANOVA model without spatial lag # Interaction terms f12,f1t,f2t and f12t with nested basis # Remark: It is necessary to include ntime as argument # Remark: nest_sp1, nest_sp2 and nest_time must be divisors of nknots form4 <- unrate ~ partrate + agri + cons + pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) + pspt(long,lat,year,nknots=c(18,18,8),psanova=TRUE, nest_sp1=c(1,2,3),nest_sp2=c(1,2,3), nest_time=c(1,2,2),ntime=19) sptanova <- psar(form4,data=unemp_it, control=list(thr=1e-2,maxit=200,trace=FALSE)) summary(sptanova) ### Plot spatial trend (ANOVA) spttrend <- fit_terms(sptanova,"spttrend") lon <- scale(unemp_it$long); lat <- scale(unemp_it$lat) time <- unemp_it$year ### Plot main effects plot_main_spt(spttrend,sp1=lon,sp2=lat,time=time,nT=19) ### Spatio-temporal semiparametric ANOVA model with spatial lag sptanova_sar <- psar(form4,data=unemp_it,Wsp=Wsp_it,sar=TRUE, control=list(thr=1e-1,maxit=100,trace=FALSE)) summary(sptanova_sar) ### Spatio-temporal semiparametric ANOVA model with spatial lag ### and temporal autorregresive noise sptanova_sar_ar1 <- psar(form4,data=unemp_it,Wsp=Wsp_it,sar=TRUE,ar1=TRUE, control=list(thr=1e-1,maxit=200,trace=FALSE)) summary(sptanova_sar_ar1) ###### Non-Parametric Total, Direct and Indirect Effects list_varnopar <- c("serv","empgrowth") eff_nparvar <- eff_nopar(sptanova_sar_ar1,list_varnopar) plot_effects_nopar(eff_nparvar,unemp_it,smooth=TRUE) plot_effects_nopar(eff_nparvar,unemp_it,smooth=FALSE) ###### Parametric Total, Direct and Indirect Effects list_varpar <- c("partrate","agri","cons") eff_parvar <- eff_par(sptanova_sar_ar1,list_varpar) summary(eff_parvar) ############################################### # Spatio-temporal semiparametric ANOVA model without spatial lag # Interaction terms f12,f1t and f12t with nested basis # Interaction term f2t restricted to 0 form5 <- unrate ~ partrate + agri + cons + pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) + pspt(long,lat,year,nknots=c(18,18,6),psanova=TRUE, nest_sp1=c(1,2,3),nest_sp2=c(1,2,3), nest_time=c(1,2,2),f2t_int=FALSE,ntime=19) sptanova2 <- psar(form5,data=unemp_it, control=list(thr=1e-2,maxit=200,trace=TRUE)) summary(sptanova2) # Spatio-temporal semiparametric ANOVA model with spatial lag sptanova_sar2 <- psar(form5,data=unemp_it,Wsp=Wsp_it,sar=TRUE, control=list(thr=1e-2,maxit=100,trace=TRUE)) summary(sptanova_sar2) # Spatio-temporal semiparametric ANOVA model with spatial lag # and temporal autorregresive noise sptanova_sar_ar1_2 <- psar(form5,data=unemp_it,Wsp=Wsp_it,sar=TRUE,ar1=TRUE, control=list(thr=1e-2,maxit=200,trace=TRUE)) summary(sptanova_sar_ar1_2) # Spatio-temporal semiparametric ANOVA model without spatial lag # and temporal autorregresive noise sptanova_ar1_2 <- psar(form5,data=unemp_it,Wsp=Wsp_it,ar1=TRUE, control=list(thr=1e-2,maxit=200,trace=TRUE)) summary(sptanova_ar1_2) ############################################### # Spatio-temporal semiparametric model (no ANOVA) without spatial lag # Remark: Be careful with the number of knots # (spend a lot of time fitting...) form6 <- unrate ~ partrate + agri + cons + pspl(serv,nknots=15) + pspl(empgrowth,nknots=20) + pspt(long,lat,year,nknots=c(10,10,6),ntime=19) ## Not run: spt <- psar(form6,data=unemp_it,Wsp=Wsp_it, control=list(thr=1e-1,maxit=200,trace=TRUE)) summary(spt) ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.