spflow: Estimate spatial interaction models that incorporate spatial...

spflow R Documentation

Estimate spatial interaction models that incorporate spatial dependence

Description

We implement three different estimators of spatial econometric interaction models (Dargel 2021) that allow the user to estimate origin-destination flows with spatial autocorrelation.

By default the estimation will include spatial dependence in the dependent variable and the explanatory variables, which leads to the spatial Durbin model (SDM) (Anselin 1988). Moreover, the model includes an additional set of parameters for intra-regional flows that start and end in the same geographic site (as proposed by LeSage 2009). Both default options can be deactivated via the estimation_control argument, which gives fine-grained control over the estimation.

Usage

spflow(
  spflow_formula,
  spflow_networks,
  id_net_pair = id(spflow_networks)[["pairs"]][[1]],
  estimation_control = spflow_control()
)

Arguments

spflow_formula

A formula specifying the spatial interaction model (for details see section Formula interface)

spflow_networks

A spflow_network_multi() object that contains information on the origins, the destinations and their neighborhood structure

id_net_pair

A character indicating the id of a spflow_network_pair() (only relevant if the spflow_network_multi() contains multiple spflow_network_pair objects; defaults to the first of them)

estimation_control

A list generated by spflow_control() that provides fine-grained control over the estimation procedure
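The following is a minimal sketch of how the arguments fit together, using the example data set multi_net_usa_ge from the Examples section. It assumes that id() can be called directly by the user in the same way as in the default value of id_net_pair, and it spells out the default estimation_control explicitly.

```r
library("spflow")

# IDs of the network pairs stored in the example data set;
# by default spflow() uses the first of them
pair_ids <- id(multi_net_usa_ge)[["pairs"]]
pair_ids

# Pass the pair id and the (default) control list explicitly
fit <- spflow(
  spflow_formula = y9 ~ . + P_(DISTANCE),
  spflow_networks = multi_net_usa_ge,
  id_net_pair = "ge_ge",
  estimation_control = spflow_control())
fit
```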

Value

An S4 object of class spflow_model-class()

Details

Our estimation procedures make use of the matrix formulation introduced by LeSage (2008) and further developed by Dargel (2021) to reduce the computational effort and memory requirements. Further generalizations to deal with non-cartesian and rectangular flows are developed by Dargel (2023).

The estimation procedure can be adjusted through the estimation_method argument in spflow_control().

Maximum likelihood estimation (MLE)

Maximum likelihood estimation is the default estimation procedure. The matrix form estimation in the framework of this model was first developed by LeSage (2008) and then improved by Dargel (2021).

Spatial two-stage least squares (S2SLS)

The S2SLS estimator is an adaptation of the one proposed by Kelejian (1998) to the case of origin-destination flows, with up to three neighborhood matrices (Dargel 2021). A similar estimation is done by Tamesue (2016). The user can activate the S2SLS estimation via the estimation_control argument using the input spflow_control(estimation_method = "s2sls").
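As a hedged illustration, the S2SLS estimator could be requested as follows. The sketch reuses the formula and the example data multi_net_usa_ge from the Examples section; only the estimation_method setting differs from the default call.

```r
library("spflow")

# Switch from the default estimator to spatial two-stage least squares
fit_s2sls <- spflow(
  spflow_formula = y9 ~ . + P_(DISTANCE),
  spflow_networks = multi_net_usa_ge,
  id_net_pair = "ge_ge",
  estimation_control = spflow_control(estimation_method = "s2sls"))
fit_s2sls
```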

Bayesian Markov Chain Monte Carlo (MCMC)

The MCMC estimator is based on the ideas of LeSage (2009) and incorporates the improvements proposed in Dargel (2021). The estimation is based on a tuned Metropolis-Hastings sampler for the auto-regressive parameters, and for the remaining parameters it uses Gibbs sampling. The routine uses 5500 iterations of the sampling procedure and considers the first 2500 as burn-in period. The user can activate the MCMC estimation via the estimation_control argument using the input spflow_control(estimation_method = "mcmc").
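A minimal sketch of an MCMC run on the example data multi_net_usa_ge is shown below. Because the estimator relies on random sampling, the sketch fixes the seed of the random number generator first; the seed value is arbitrary and only serves to make repeated runs comparable.

```r
library("spflow")

# Fix the random seed so that repeated runs draw the same samples
set.seed(123)

# Request the Bayesian MCMC estimator instead of the default
fit_mcmc <- spflow(
  spflow_formula = y9 ~ . + P_(DISTANCE),
  spflow_networks = multi_net_usa_ge,
  id_net_pair = "ge_ge",
  estimation_control = spflow_control(estimation_method = "mcmc"))
fit_mcmc
```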

Formula interface

The function offers a formula interface adapted to spatial interaction models, which has the following structure:

  Y ~ O_(X1) + D_(X2) + I_(X3) + P_(X4)

This structure reflects the different data sources involved in such a model. On the left hand side there is the dependent variable Y, which corresponds to the vector of flows. On the right hand side we have all the explanatory variables. The functions O_(...) and D_(...) indicate which variables are used as characteristics of the origins and destinations respectively. Similarly, I_(...) indicates variables that should be used for the intra-regional parameters. Finally, P_(...) declares which variables describe origin-destination pairs, which most frequently will include a measure of distance.

All the declared variables must be available in the provided spflow_network_multi() object, which gathers information on the origins and destinations (inside spflow_network() objects), as well as the information on the origin-destination pairs (inside a spflow_network_pair() object).
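The sketch below uses only base R to inspect a formula of this shape. It is not a description of how spflow() processes the formula internally; it merely shows that such a formula is an ordinary R formula object, and that all.vars() lists the variable names that would have to be available in the data. The variable names Y, X1, X2, X3 and DIST are placeholders.

```r
# A formula with the structure used by spflow(); nothing is evaluated here,
# so O_(), D_(), I_() and P_() do not need to exist as functions
f <- Y ~ O_(X1) + D_(X2) + I_(X3) + P_(log(DIST + 1))

# Names of all variables referenced in the formula (function names excluded)
vars <- all.vars(f)
vars

# One right hand side term per data source
term_labels <- attr(terms(f), "term.labels")
term_labels
```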

Using the short notation Y ~ . is possible and will be interpreted as usual, in the sense that we use all variables that are available for each data source. Mixed formulas, such as Y ~ . + P_(log(X4) + 1), are also possible. When the dot shortcut is combined with explicit declarations, it will only be used for the non-declared data sources. The following examples illustrate this behavior.

Formula interface (examples)

Consider the case where we have the flow vector Y and the distance vector DIST available as information on origin-destination pairs. In addition we have the explanatory variables X1, X2 and X3 which describe the regions that are at the same time origins and destinations of the flows.

For this example the four formulas below are equivalent and make use of all explanatory variables X1, X2 and X3 for origins, destinations and intra-regional observations.

  • Y ~ .

  • Y ~ . + P_(DIST)

  • Y ~ X1 + X2 + X3 + P_(DIST)

  • Y ~ D_(X1 + X2 + X3) + O_(X1 + X2 + X3) + I_(X1 + X2 + X3) + P_(DIST)

Now if we only want to use X1 for the intra-regional model we can do the following (again all four options below are equivalent).

  • Y ~ . + I_(X1)

  • Y ~ . + I_(X1) + P_(DIST)

  • Y ~ X1 + X2 + X3 + I_(X1) + P_(DIST)

  • Y ~ D_(X1 + X2 + X3) + O_(X1 + X2 + X3) + I_(X1) + P_(DIST)

This behavior is easily combined with transformation of variables as the two equivalent options below illustrate.

  • log(Y + 1) ~ sqrt(X1) + X2 + P_(log(DIST + 1))

  • log(Y + 1) ~ D_(sqrt(X1) + X2) + O_(sqrt(X1) + X2) + I_(sqrt(X1) + X2) + P_(log(DIST + 1))

Author(s)

Lukas Dargel

References

\insertAllCited

See Also

spflow_control(), spflow_network_classes()

Examples


library("spflow")

# Estimate flows between the states of Germany
spflow(spflow_formula = y9 ~ . + P_(DISTANCE),
       spflow_networks = multi_net_usa_ge,
       id_net_pair = "ge_ge")

# Same as above with explicit declaration of variables...
# ... X is the only variable available
# ... it is used for origins, destination and intra-state flows
spflow(spflow_formula = y9 ~ X + P_(DISTANCE),
       spflow_networks = multi_net_usa_ge,
       id_net_pair = "ge_ge")

# Same as above
spflow(spflow_formula = y9 ~ O_(.) + D_(.) + I_(.) + P_(DISTANCE),
       spflow_networks = multi_net_usa_ge,
       id_net_pair = "ge_ge")

# Same as above
spflow(spflow_formula = y9 ~ O_(X) + D_(X) + I_(X) + P_(DISTANCE),
       spflow_networks = multi_net_usa_ge,
       id_net_pair = "ge_ge")



LukeCe/spflow documentation built on Nov. 11, 2023, 8:20 p.m.