knitr::opts_chunk$set(echo = TRUE, fig.height = 5, fig.width = 8, message = TRUE, warning = FALSE)
#load libraries #load strm library(strm) library(rmarkdown) #load package dependencies for the vignette library(tidyr) library(dplyr) library(rgdal) library(spdep) library(rgeos) library(sf) library(knitr) library(Ecdat)
strm
is an R
package that fits spatio-temporal regression model based on Chi & Zhu Spatial Regression Models for the Social Sciences (2019). The approach here fits a spatial error model while incorporating a temporally lagged response variable and temporally lagged explanatory variables.
This package builds on the errorsarlm()
function from the spatialreg
package.
This package is still under development. Please report bugs or constructive tips to issues here.
Fit a spatial error model but include a temporally lagged response variable, temporally lagged explanatory variable, and temporally and spatially lagged explanatory variables.
Spatial error model at time $t$:
$$Y_t = X_t \beta + u_t, u_t=\rho Wu_t + \varepsilon$$
Adding in a temporally lagged response variable and temporally lagged explanatory variable:
$$Y_t = X_t\beta_t + \beta_2Y_{t-1} + X_{t-1}\beta_3 + u_t, u_t=\rho Wu_t + \varepsilon$$
This becomes:
$$Y_t = Y_{t-1} \beta_2 + \rho W Y_t - \rho\beta_2 WY_{t-1} + X_t \beta_1 + X_{t-1}\beta_3 - W X_t \rho \beta_1 -WX_{t-1} \rho \beta_3 + \varepsilon$$
(This example has been adapted from the splm
package vignette.)
The first data set we will use is the Produc data set from the Ecdat
package. This data set is a panel of United States production data from 1970-1986. There are 816 observations by county in the United States. The variables we will use are:
year
: yearstate
: state in the United Statesgsp
:pcap
: private capital stock.pc
: public capital.emp
: labor input measured by employment in non-agricultural payrolls.unemp
: state unemployment rate.We also load usaww (originally from the splm
package), a spatial weights matrix of the 48 continental United States based on second-order neighbors.
#load data example data("Produc") data("usaww") #explore the data structures str(Produc) str(usaww)
We next convert the spatial weights matrix to a list of spatial weights using the mat2listw()
function from the spdep
package and check the class()
of the object:
#create list of spatial weights usalw <- mat2listw(usaww) class(usalw)
Next, we perform the spatio-temporal regression model. We first create a formula using as.formula()
. For this model, our response is log(gsp)
and our explanatory variables are log(pcap), log(pc), log(emp), unemp
.
For this analysis, we will limit the data to only 1970 and 1971 - two time periods.
We use the strm()
function from the strm
package. The first argument is the model formula. Given that the data is in long format, strm
will create the lagged values for you as well as the lagged response in the right-hand side of the model. The next argument is id
, which we set to "state" as each observation is taken at the state level. We then specify the name of the data set (data=Produc
), the list of spatial weights (listw = usalw
), time=2
, we tell strm
that the data is in long format by setting wide=FALSE
, and lastly we pass an argument to filter observations where year
is equal to either 1970 or 1971 (2 time periods).
formula1 <-as.formula(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp) out <- strm(formula1, id="state", data=Produc, listw = usalw, time=2,wide=FALSE, filter_options="year==1970 | year==1971") summary(out)
From the model summary output, we see that strm
has included lagged variables (*.Tlag1) for each of the explanatory variables initially specified (log(pcap), log(pc), log(emp), unemp
), as well as a lagged variable for the response (log(gsp)
).
As the number of time periods in the data increase, so do the number of lags. If we now use the same model, but instead include an additional year (1972) and set time=3
.
formula2 <-as.formula(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp) out <- strm(formula2, id="state", data=Produc, listw= usalw, time=3, wide=FALSE,filter_options="year==1970 | year==1971 | year ==1972")
After executing the strm()
model, we apply the summary()
function to the results object, out
.
summary(out)
From the model summary output, we see that strm
has now included two lagged variables (*.Tlag1 and *.Tlag2) for each of the explanatory variables initially specified, as well as two lags for the response variable.
We use the example from Spatial Regression Models for the Social Sciences Chi & Zhu (2019). The example uses population growth data from 2000 to 2010. Data are at the minor civil division (MCD) level in Wisconsin. There are two years of data: 2000 and 2010. The variables we will use are:
LNP1000
: population growth from 2000 to 2010.LNP0090
: population growth from 1990 to 2000.POLD00
: percentage of the old population (age sixty-five and older) in 2000.POLD90
: percentage of the old population (age sixty-five and older) in 1990.We load the sptdmg3
.RData file that comes with the strm
package using data(sptdmg3)
and explore its class()
and names()
:
data(sptdmg3) class(sptdmg3) names(sptdmg3)
First, fit a standard linear regression model with population growth from 2000 to 2010 as the response variable and population growth from 1990 to 2000, the old population in 2000, and the old population in 1990 as the explanatory variables.
formula2 <- as.formula(LNP1000 ~ LNP0090 + POLD00 + POLD90) m2 <- lm(formula2, data = sptdmg3) summary(m2)
We first convert the SpatialPolygonsDataFrame
to a simple features or sf
object. We use the st_as_sf()
function from the sf
package.
#convert to simple features sptdmg3_sf <- sf::st_as_sf(sptdmg3) class(sptdmg3_sf)
Next, we create a spatial weights matrix based on 4-nearest neighbors. We extract the coordinates from the MCD centroids using st_centroid()
to extract the centroid of each MCD and st_coordinates()
to extract the coordinates. We then use knearneigh()
from the spdep
package and specify k=4
to create a matrix of the 4 nearest neighbors; knn2nb()
creates a neighborhood structure object and nb2listw()
creates a spatial weights list, where we set the style to be row-standardized weights and the zero policy to be TRUE
. Finally, we plot the MCD and their neighbors.
#knn4 coords <-st_coordinates(st_centroid(sptdmg3_sf)) #extract centroid of each minor civil division col.knn <- knearneigh(coords, k=4) nbknn <- knn2nb(col.knn, row.names = sptdmg3$STFID) listwk <- nb2listw(nbknn, style="W") plot(sptdmg3, main="Wisconsin: 4 Nearest-Neighbors"); plot(nbknn, coords,col="forestgreen", add=TRUE)
We are now ready to perform the spatio-temporal regression model. In this example, the data is already in WIDE format, meaning that every variable-year is a column in the dataset. Because of this, we directly include the lagged variables for all of the covariates as well as the lagged variable for the response, just as we had done in the simple linear model above.
#since this data is in wide format, include temporal lag for both explanatory and response variable manually res <- strm(formula2, id="STFID", data=sptdmg3_sf, listw = listwk, time=2,wide=TRUE) summary(res)
moran.test(sptdmg3$LNP1000, listwk, alternative = "two.sided") moran.test(sptdmg3$LNP0090, listwk, alternative = "two.sided") moran.test(sptdmg3$POLD00, listwk, alternative = "two.sided") moran.test(sptdmg3$PUNMP00, listwk, alternative = "two.sided") moran.test(sptdmg3$ACSAIR, listwk, alternative = "two.sided") moran.test(sptdmg3$PFRST, listwk, alternative = "two.sided") moran.test(sptdmg3$DEVPB, listwk, alternative = "two.sided") formulatest <- as.formula(LNP1000 ~ LNP0090 + POLD00 + PUNMP00 + ACSAIR + PFRST + DEVPB ) m2 <- lm(formulatest, data = sptdmg3) summary(m2) moran.test(m2$residuals, listwk, alternative = "two.sided")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.