Nothing
#' @title
#' Estimate spatial interaction models that incorporate spatial dependence
#'
#' @description
#' We implement three different estimators of spatial econometric interaction
#' models \insertCite{Dargel2021}{spflow} that allow the user to estimate
#' origin-destination flows with spatial autocorrelation.
#'
#' By default the estimation will include spatial dependence in the dependent
#' variable and the explanatory variables which leads to the spatial Durbin
#' model (SDM) \insertCite{Anselin1988}{spflow}.
#' Moreover, the model includes an additional set of parameters for intra
#' regional flows that start and end in the same geographic site (as proposed
#' by \insertCite{LeSage2009;textual}{spflow}).
#' Both default options can be deactivated via the `flow_control` argument,
#' which gives fine grained control over the estimation.
#'
#'
#' @param flow_formula
#' A formula specifying the spatial interaction model (for details see
#' section Formula interface)
#' @param sp_multi_network
#' A [sp_multi_network()] object that contains information on the origins,
#' and the destinations and their neighborhood structure
#' @param network_pair_id
#' A character indicating the id of a [sp_network_pair()] (only relevant if
#' the [sp_multi_network()] contains multiple `sp_network_pair`-objects:
#' defaults to the of them)
#' @param flow_control
#' A list generated by [spflow_control()] that provides fine grained control
#' over the estimation procedure
#' @return An S4 class of type [spflow_model-class()]
#'
#'
#' @section Details:
#' Our estimation procedures makes use of the matrix formulation introduced by
#' \insertCite{LeSage2008;textual}{spflow} and further developed by
#' \insertCite{Dargel2021;textual}{spflow} to reduce the computational
#' effort and memory requirements.
#' The estimation procedure can be adjusted through the `estimation_method`
#' argument in [spflow_control()].
#'
#' ## Maximum likelihood estimation (MLE)
#' Maximum likelihood estimation is the default estimation procedure.
#' The matrix form estimation in the framework of this model was first
#' developed by \insertCite{LeSage2008;textual}{spflow} and then improved by
#' \insertCite{Dargel2021;textual}{spflow}.
#'
#' ## Spatial two-stage least squares (S2SLS)
#' The S2SLS estimator is an adaptation of the one proposed by
#' \insertCite{Kelejian1998;textual}{spflow}, to the case of origin-destination
#' flows, with up to three neighborhood matrices
#' \insertCite{Dargel2021;textual}{spflow}.
#' A similar estimation is done by \insertCite{Tamesue2016;textual}{spflow}.
#' The user can activate the S2SLS estimation via the `flow_control` argument
#' using the input `spflow_control(estimation_method = "s2sls")`.
#'
#' ## Bayesian Markov Chain Monte Carlo (MCMC)
#' The MCMC estimator is based on the ideas of
#' \insertCite{LeSage2009;textual}{spflow} and incorporates the improvements
#' proposed in \insertCite{Dargel2021;textual}{spflow}.
#' The estimation is based on a tuned Metropolis-Hastings sampler for the
#' auto-regressive parameters, and for the remaining parameters it uses Gibbs
#' sampling.
#' The routine uses 5500 iterations of the sampling procedure and considers the
#' first 2500 as burn-in period.
#' The user can activate the S2SLS estimation via the `flow_control` argument
#' using the input `spflow_control(estimation_method = "mcmc")`.
#'
#' ## Formula interface
#' The function offers a formula interface adapted to spatial interaction
#' models, which has the following structure:
#' `Y ~ O_(X1) + D_(X2) + I_(X3) + G_(X4)`
#' This structure reflects the different data sources involved in such a model.
#' On the left hand side there is the independent variable `Y` which
#' corresponds to the vector of flows.
#' On the right hand side we have all the explanatory variables.
#' The functions `O_(...)` and `D_(...)` indicate which variables are used as
#' characteristics of the origins and destinations respectively.
#' Similarly, `I_(...)` indicates variables that should be used for the
#' intra-regional parameters.
#' Finally, `G_(...)` declares which variables describe origin-destination
#' pairs, which most frequently will include a measure of distance.
#'
#' All the declared variables must be available in the provided
#' [sp_multi_network()] object, which gathers information on the origins and
#' destinations (inside [sp_network_nodes()] objects), as well as the
#' information on the origin-destination pairs (inside a [sp_network_pair()]
#' object).
#'
#' Using the short notation `Y ~ .` is possible and will be interpreted as
#' usual, in the sense that we use all variables that are available for each
#' data source.
#' Also mixed formulas, such as `Y ~ . + G_(log(X4) + 1)`, are possible.
#' When the dot shortcut is combined with explicit declaration, it will only be
#' used for the non declared data sources.
#' The following examples illustrate this behaviour.
#'
#' ## Formula interface (examples)
#'
#' Consider the case where we have the flow vector `Y` and the distance vector
#' `DIST` available as information on origin-destination pairs.
#' In addition we have the explanatory variables `X1, X2` and `X3` which
#' describe the regions that are at the same time origins and destinations of
#' the flows.
#'
#' For this example the four formulas below are equivalent and make use of all
#' explanatory variables `X1, X2` and `X3` for origins, destinations and
#' intra-regional observations.
#'
#' - `Y ~ .`
#' - `Y ~ . + G_(DIST)`
#' - `Y ~ X1 + X2 + X3 + G_(DIST)`
#' - `Y ~ D_(X1 + X2 + X3) + O_(X1 + X2 + X3) + I_(X1 + X2 + X3) + G_(DIST)`
#'
#' Now if we only want to use X1 for the intra-regional model we can do the
#' following (again all four options below are equivalent).
#'
#' - `Y ~ . + I_(X1)`
#' - `Y ~ . + I_(X1) + G_(DIST)`
#' - `Y ~ X1 + X2 + X3 + I_(X1) + G_(DIST)`
#' - `Y ~ D_(X1 + X2 + X3) + O_(X1 + X2 + X3) + I_(X1) + G_(DIST)`
#'
#' This behaviour is easily combined with transformation of variables as the
#' two equivalent options below illustrate.
#'
#' - `log(Y + 1) ~ sqrt(X1) + X2 + G_(log(DIST + 1))`
#' @examples
#'
#' # Estimate flows between the states of Germany
#' spflow(flow_formula = y9 ~ . + G_(DISTANCE),
#' sp_multi_network = multi_net_usa_ge,
#' network_pair_id = "ge_ge")
#'
#' # Same as above with explicit declaration of variables...
#' # ... X is the only variable available
#' # ... it is used for origins, destination and intra-state flows
#' spflow(flow_formula = y9 ~ X + G_(DISTANCE),
#' sp_multi_network = multi_net_usa_ge,
#' network_pair_id = "ge_ge")
#'
#' # Same as above
#' spflow(flow_formula = y9 ~ O_(.) + D_(.) + I_(.) + G_(DISTANCE),
#' sp_multi_network = multi_net_usa_ge,
#' network_pair_id = "ge_ge")
#'
#' # Same as above
#' spflow(flow_formula = y9 ~ O_(X) + D_(X) + I_(X) + G_(DISTANCE),
#' sp_multi_network = multi_net_usa_ge,
#' network_pair_id = "ge_ge")
#'
#'
#' @references \insertAllCited{}
#' @seealso [spflow_control()] [spflow_network_classes()]
#' @export
spflow <- function(
flow_formula,
sp_multi_network,
network_pair_id = id(sp_multi_network)[["network_pairs"]][[1]],
flow_control = spflow_control()
) {
## check for abusive inputs and correct ids
assert_is(flow_formula,"formula")
assert_is(sp_multi_network,"sp_multi_network")
pair_ids <- id(sp_multi_network)[["network_pairs"]]
assert(valid_network_pair_id(network_pair_id),
"The network_pair_id must be a character of length 1!")
assert(network_pair_id %in% pair_ids,
'The the network pair id "%s" is not available!',
network_pair_id)
## validate (by calling again) then enrich control parameters
flow_control <- do.call("spflow_control", flow_control)
estim_control <- c(
flow_control,
list("model_type" = sp_model_type(flow_control)),
matrix_form_control(pull_member(sp_multi_network, network_pair_id)))
# below cases should become available in future versions of the package
assert(is.null(estim_control$weight_variable), warn = TRUE,
"Weighting of observations is not yet possible and will be ignored.")
assert(estim_control$mat_complet == 1,
"Estimation is (for now) only possible if the number of pairs is " %p%
"excatly the number of origins multiplied by the number of " %p%
"destinations!")
assert(estim_control$mat_within,
"Estimation of flows between two diffrent networks are " %p%
"not yet available!")
formula_parts <- interpret_flow_formula(
flow_formula,
flow_control)
model_matrices <- spflow_model_matrix(
sp_multi_network,
network_pair_id,
formula_parts,
estim_control)
model_moments <- spflow_model_moments(
model_matrices = model_matrices,
estim_control = estim_control)
estimation_results <- spflow_model_estimation(
model_moments,
estim_control)
estimation_results <- add_details(
estimation_results,
model_matrices = model_matrices,
flow_control = flow_control,
model_moments = model_moments)
return(estimation_results)
}
#' @keywords internal
parameter_names <- function(
model_matrices,
model) {
names_rho <- define_spatial_lag_params(model)
names_const <- c("(Intercept)", "(Intra)")
use_const <- c(model_matrices$constants$global == 1,
!is.null(model_matrices$constants$intra$In))
names_const <- names_const[use_const]
x_prefs <- list("D_" = "DEST_","O_" = "ORIG_","I_" = "INTRA_")
names_X <- lapply(compact(model_matrices[names(x_prefs)]), "colnames")
names_X <- Map("%p%", x_prefs[names(names_X)], names_X)
names_X <- unlist(names_X, use.names = FALSE)
names_G <- names(model_matrices$G)
export_names <- c(names_rho,names_const,names_X,names_G)
return(export_names)
}
#' @keywords internal
drop_instruments <- function(model_matrices) {
# ... from constants
filter_inst <- function(x) Filter(function(x) !attr_inst_status(x), x)
constants <- list(
"global" = model_matrices$constants$global,
"intra" = filter_inst(model_matrices$constants$intra))
# ... from site attributes
filter_inst_col <- function(mat) mat[,!attr_inst_status(mat), drop = FALSE]
vector_treatment <- c("D","O","I") %p% "_"
matrices_X <- lapply(compact(model_matrices[vector_treatment]),
"filter_inst_col")
# ... from pair attributes
matrices_G <- lapply(model_matrices["G_"], "filter_inst")
# ... combine cleaned versions
matrices_and_spatial_weights <- c(
model_matrices["Y_"],
list("constants" = constants),
matrices_X,
matrices_G,
compact(model_matrices[c("DW","OW")])
)
return(matrices_and_spatial_weights)
}
#' @keywords internal
sp_model_type <- function(cntrl) {
has_lagged_x <- cntrl$sdm_variables != "none"
if (cntrl$model == "model_1")
model_type <- ifelse(has_lagged_x,"SLX","OLM")
if (cntrl$model != "model_1")
model_type <- ifelse(has_lagged_x,"SDM","LAG")
return(model_type)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.