old_opt <- options(max.print = 200) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library("spflow") library("sf") library("spdep")
This article illustrates the use of the spflow package for modeling spatial interactions using the example of home-to-work commuting flows.
For our example we use information on the 71 municipalities that are located closest to the center of Paris.
This data is contained in the package and was originally diffused by the French National Institutes of Statistics and Economic Studies (INSEE), and of Geographic and Forest Information (IGN).
(For more information see help(paris_data)
.)
data("paris10km_municipalities") data("paris10km_commuteflows")
Each municipality is identified by a unique id. Additionally, we have information on the population, the median income and the number of companies.
drop_area <- names(paris10km_municipalities) != "AREA" plot(paris10km_municipalities[drop_area])
There are three different neighborhood matrices that can be used to describe the connectivity between the municipalities.
old_par <- par(mfrow = c(1, 3), mar = c(0,0,1,0)) mid_points <- suppressWarnings({ st_point_on_surface(st_geometry(paris10km_municipalities))}) paris10km_nb <- list( "by_contiguity" = spdep::poly2nb(paris10km_municipalities), "by_distance" = spdep::dnearneigh(mid_points,d1 = 0, d2 = 5), "by_knn" = spdep::knn2nb(knearneigh(mid_points,3)) ) plot(st_geometry(paris10km_municipalities)) plot(paris10km_nb$by_contiguity, mid_points, add = T, col = rgb(0,0,0,alpha=0.5)) title("Contiguity") plot(st_geometry(paris10km_municipalities)) plot(paris10km_nb$by_distance,mid_points, add = T, col = rgb(0,0,0,alpha=0.5)) title("Distance") plot(st_geometry(paris10km_municipalities)) plot(paris10km_nb$by_knn, mid_points, add = T, col = rgb(0,0,0,alpha=0.5)) title("3 Nearest Neighbors") par(old_par)
Finally, there is data on the size of the commuting flows and the distance between all pairs of municipalities.
paris10km_commuteflows
The spflow package builds on the idea that flows correspond to pairwise interactions between the nodes of an origin network with the nodes of a destination network.
In our example, the origin and destination networks are the same because every municipality is both an origin and destination of a flow.
To estimate the model efficiently, the spflow package uses moment-based estimation methods, that exploit the relational structure of flow data. This avoids duplication arising from the fact that each municipality is at the origin and destination of many flows.
To describe the nodes of a network the package provides sp_network_nodes-class
that combines attributes of the nodes with the chosen network structure.
For our model we choose the contiguity based neighborhood structure.
paris10km_net <- sp_network_nodes( network_id = "paris", node_neighborhood = nb2mat(paris10km_nb$by_contiguity), node_data = st_drop_geometry(paris10km_municipalities), node_key_column = "ID_MUN") paris10km_net
The sp_network_pair-class
contains all information on the pairs of nodes belonging to the origin and destination networks.
paris10km_net_pairs <- sp_network_pair( orig_net_id = "paris", dest_net_id = "paris", pair_data = paris10km_commuteflows, orig_key_column = "ID_ORIG", dest_key_column = "ID_DEST") paris10km_net_pairs
The sp_multi_network-class
combines information on the nodes and the node-pairs and also ensures that both data sources are consistent.
For example, if some of the origins in the sp_network_pair-class
are not identified with the nodes in the sp_network_nodes-class
an error will be raised.
paris10km_multi_net <- sp_multi_network(paris10km_net,paris10km_net_pairs) paris10km_multi_net
The core function of the package is spflow()
, which provides an interface to four different estimators of the spatial econometric interaction model.
Estimation with default settings requires two arguments: an sp_multi_network-class
and a flow_formula
.
The flow_formula
specifies the model we want to estimate.
In this example, the dependent variable is a transformation of commuting flows and we use the dot shortcut to indicate that all available variables should be included in the model.
Using the defaults leads to the most comprehensive spatial interaction model, which includes spatial lags of the dependent variable, the exogenous variables and additional attributes for intra-regional observations.
results_default <- spflow( flow_formula = log(1 + COMMUTE_FLOW) ~ . + G_(log( 1 + DISTANCE)), sp_multi_network = paris10km_multi_net) results_default
We can adjust how the exogenous variables are to be used by wrapping them into the D_()
, O_()
, I_()
and G_()
functions.
The variables in G_()
are used as OD pair features and those in D_()
, O_()
and I_()
are used as destination, origin and intra-regional features.
We can take advantage of the formula interface to specify transformations and expand factor variables to dummies.
clog <- function(x) { log_x <- log(x) log_x - mean(log_x) } flow_formula <- log(COMMUTE_FLOW + 1) ~ D_(log(NB_COMPANY) + clog(MED_INCOME)) + O_(log(POPULATION) + clog(MED_INCOME)) + I_(log(POPULATION)) + G_(log(DISTANCE + 1)) results_mle <- spflow( flow_formula, paris10km_multi_net) results_mle
spflow_control
More fine-grained adjustments are possible via the spflow_control
argument.
Here we change the estimation method and the way we want to model the spatial autoregression in the flows.
To use spatial lags only for certain variables, we need to specify them as a second formula.
sdm_formula <- ~ O_(log(POPULATION) + clog(MED_INCOME)) + D_(log(NB_COMPANY) + clog(MED_INCOME)) cntrl <- spflow_control( estimation_method = "mcmc", sdm_variables = sdm_formula, model = "model_7") # restricts \rho_w = 0 results_mcmc <- spflow( flow_formula, paris10km_multi_net, flow_control = cntrl) results_mcmc
options(old_opt)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.