Home-to-work commuting flows within the municipalities around Paris

old_opt <- options(max.print = 200)
  collapse = TRUE,
  comment = "#>"


This article illustrates the use of the spflow package for modeling spatial interactions using the example of home-to-work commuting flows. For our example we use information on the 71 municipalities that are located closest to the center of Paris. This data is contained in the package and was originally diffused by the French National Institutes of Statistics and Economic Studies (INSEE), and of Geographic and Forest Information (IGN). (For more information see help(paris_data).)


Data presentation

Each municipality is identified by a unique id. Additionally, we have information on the population, the median income and the number of companies.

drop_area <- names(paris10km_municipalities) != "AREA"

There are three different neighborhood matrices that can be used to describe the connectivity between the municipalities.

old_par <- par(mfrow = c(1, 3), mar = c(0,0,1,0))

mid_points <- suppressWarnings({

paris10km_nb <- list(
  "by_contiguity" = spdep::poly2nb(paris10km_municipalities),
  "by_distance" = spdep::dnearneigh(mid_points,d1 = 0, d2 = 5),
  "by_knn" = spdep::knn2nb(knearneigh(mid_points,3))

plot(paris10km_nb$by_contiguity, mid_points, add = T, col = rgb(0,0,0,alpha=0.5))

plot(paris10km_nb$by_distance,mid_points, add = T, col = rgb(0,0,0,alpha=0.5)) 

plot(paris10km_nb$by_knn, mid_points, add = T, col = rgb(0,0,0,alpha=0.5))
title("3 Nearest Neighbors") 


Finally, there is data on the size of the commuting flows and the distance between all pairs of municipalities.


Modeling Spatial interactions

The spflow package builds on the idea that flows correspond to pairwise interactions between the nodes of an origin network with the nodes of a destination network.

In our example, the origin and destination networks are the same because every municipality is both an origin and destination of a flow.

To estimate the model efficiently, the spflow package uses moment-based estimation methods, that exploit the relational structure of flow data. This avoids duplication arising from the fact that each municipality is at the origin and destination of many flows.

The network objects

To describe the nodes of a network the package provides sp_network_nodes-class that combines attributes of the nodes with the chosen network structure. For our model we choose the contiguity based neighborhood structure.

paris10km_net <- 
    network_id = "paris",
    node_neighborhood = nb2mat(paris10km_nb$by_contiguity),
    node_data = st_drop_geometry(paris10km_municipalities),
    node_key_column = "ID_MUN")


The sp_network_pair-class contains all information on the pairs of nodes belonging to the origin and destination networks.

paris10km_net_pairs <- sp_network_pair(
  orig_net_id = "paris",
  dest_net_id = "paris",
  pair_data = paris10km_commuteflows,
  orig_key_column = "ID_ORIG",
  dest_key_column = "ID_DEST")


The sp_multi_network-class combines information on the nodes and the node-pairs and also ensures that both data sources are consistent. For example, if some of the origins in the sp_network_pair-class are not identified with the nodes in the sp_network_nodes-class an error will be raised.

paris10km_multi_net <- sp_multi_network(paris10km_net,paris10km_net_pairs)


The core function of the package is spflow(), which provides an interface to four different estimators of the spatial econometric interaction model.

Estimating with default values

Estimation with default settings requires two arguments: an sp_multi_network-class and a flow_formula. The flow_formula specifies the model we want to estimate. In this example, the dependent variable is a transformation of commuting flows and we use the dot shortcut to indicate that all available variables should be included in the model. Using the defaults leads to the most comprehensive spatial interaction model, which includes spatial lags of the dependent variable, the exogenous variables and additional attributes for intra-regional observations.

results_default <- spflow(
  flow_formula = log(1 + COMMUTE_FLOW) ~ . + G_(log( 1 + DISTANCE)),
  sp_multi_network = paris10km_multi_net)


Adjusting the formula

We can adjust how the exogenous variables are to be used by wrapping them into the D_(), O_(), I_() and G_() functions. The variables in G_() are used as OD pair features and those in D_(), O_() and I_() are used as destination, origin and intra-regional features. We can take advantage of the formula interface to specify transformations and expand factor variables to dummies.

clog <- function(x) {
  log_x <- log(x)
  log_x - mean(log_x)

flow_formula  <- 
  log(COMMUTE_FLOW + 1) ~
  D_(log(NB_COMPANY) + clog(MED_INCOME)) +
  O_(log(POPULATION) + clog(MED_INCOME)) +
  I_(log(POPULATION)) +
  G_(log(DISTANCE + 1))

results_mle  <- spflow(

Adjustment through spflow_control

More fine-grained adjustments are possible via the spflow_control argument. Here we change the estimation method and the way we want to model the spatial autoregression in the flows. To use spatial lags only for certain variables, we need to specify them as a second formula.

sdm_formula <- ~
  O_(log(POPULATION) + clog(MED_INCOME)) +
  D_(log(NB_COMPANY) + clog(MED_INCOME))

cntrl <- spflow_control(
  estimation_method = "mcmc",
  sdm_variables = sdm_formula,
  model = "model_7") # restricts \rho_w = 0

results_mcmc  <- spflow(
  flow_control = cntrl)


Try the spflow package in your browser

Any scripts or data that you put into this service are public.

spflow documentation built on Sept. 9, 2021, 5:06 p.m.