#' @title
#' Estimate spatial interaction models that incorporate spatial dependence
#'
#' @description
#' We implement three different estimators of spatial econometric interaction
#' models \insertCite{Dargel2021}{spflow} that allow the user to estimate
#' origin-destination flows with spatial autocorrelation.
#'
#' By default the estimation will include spatial dependence in the dependent
#' variable and the explanatory variables, which leads to the spatial Durbin
#' model (SDM) \insertCite{Anselin1988}{spflow}.
#' Moreover, the model includes an additional set of parameters for intra
#' regional flows that start and end in the same geographic site (as proposed
#' by \insertCite{LeSage2009;textual}{spflow}).
#' Both default options can be deactivated via the `estimation_control` argument,
#' which gives fine grained control over the estimation.
#'
#'
#' @param spflow_formula
#' A formula specifying the spatial interaction model (for details see
#' section Formula interface)
#' @param spflow_networks
#' A [spflow_network_multi()] object that contains information on the
#' origins, the destinations and their neighborhood structure
#' @param id_net_pair
#' A character indicating the id of a [spflow_network_pair()] (only relevant if
#' the [spflow_network_multi()] contains multiple `spflow_network_pair`-objects:
#' defaults to the of them)
#' @param estimation_control
#' A list generated by [spflow_control()] that provides fine grained control
#' over the estimation procedure
#' @return An S4 class of type [spflow_model-class()]
#'
#'
#' @section Details:
#' Our estimation procedures makes use of the matrix formulation introduced by
#' \insertCite{LeSage2008;textual}{spflow} and further developed by
#' \insertCite{Dargel2021;textual}{spflow} to reduce the computational
#' effort and memory requirements.
#' Further generalizations to deal with non-cartesian and rectangular flows
#' are developed by \insertCite{Dargel2023;textual}{spflow}.
#'
#' The estimation procedure can be adjusted through the `estimation_method`
#' argument in [spflow_control()].
#'
#' ## Maximum likelihood estimation (MLE)
#' Maximum likelihood estimation is the default estimation procedure.
#' The matrix form estimation in the framework of this model was first
#' developed by \insertCite{LeSage2008;textual}{spflow} and then improved by
#' \insertCite{Dargel2021;textual}{spflow}.
#'
#' ## Spatial two-stage least squares (S2SLS)
#' The S2SLS estimator is an adaptation of the one proposed by
#' \insertCite{Kelejian1998;textual}{spflow}, to the case of origin-destination
#' flows, with up to three neighborhood matrices
#' \insertCite{Dargel2021;textual}{spflow}.
#' A similar estimation is done by \insertCite{Tamesue2016;textual}{spflow}.
#' The user can activate the S2SLS estimation via the `estimation_control` argument
#' using the input `spflow_control(estimation_method = "s2sls")`.
#'
#' ## Bayesian Markov Chain Monte Carlo (MCMC)
#' The MCMC estimator is based on the ideas of
#' \insertCite{LeSage2009;textual}{spflow} and incorporates the improvements
#' proposed in \insertCite{Dargel2021;textual}{spflow}.
#' The estimation is based on a tuned Metropolis-Hastings sampler for the
#' auto-regressive parameters, and for the remaining parameters it uses Gibbs
#' sampling.
#' The routine uses 5500 iterations of the sampling procedure and considers the
#' first 2500 as burn-in period.
#' The user can activate the S2SLS estimation via the `estimation_control` argument
#' using the input `spflow_control(estimation_method = "mcmc")`.
#'
#' ## Formula interface
#' The function offers a formula interface adapted to spatial interaction
#' models, which has the following structure:
#' `Y ~ O_(X1) + D_(X2) + I_(X3) + P_(X4)`
#' This structure reflects the different data sources involved in such a model.
#' On the left hand side there is the independent variable `Y` which
#' corresponds to the vector of flows.
#' On the right hand side we have all the explanatory variables.
#' The functions `O_(...)` and `D_(...)` indicate which variables are used as
#' characteristics of the origins and destinations respectively.
#' Similarly, `I_(...)` indicates variables that should be used for the
#' intra-regional parameters.
#' Finally, `P_(...)` declares which variables describe origin-destination
#' pairs, which most frequently will include a measure of distance.
#'
#' All the declared variables must be available in the provided
#' [spflow_network_multi()] object, which gathers information on the origins and
#' destinations (inside [spflow_network()] objects), as well as the
#' information on the origin-destination pairs (inside a [spflow_network_pair()]
#' object).
#'
#' Using the short notation `Y ~ .` is possible and will be interpreted as
#' usual, in the sense that we use all variables that are available for each
#' data source.
#' Also mixed formulas, such as `Y ~ . + P_(log(X4) + 1)`, are possible.
#' When the dot shortcut is combined with explicit declaration, it will only be
#' used for the non declared data sources.
#' The following examples illustrate this behavior.
#'
#' ## Formula interface (examples)
#'
#' Consider the case where we have the flow vector `Y` and the distance vector
#' `DIST` available as information on origin-destination pairs.
#' In addition we have the explanatory variables `X1, X2` and `X3` which
#' describe the regions that are at the same time origins and destinations of
#' the flows.
#'
#' For this example the four formulas below are equivalent and make use of all
#' explanatory variables `X1, X2` and `X3` for origins, destinations and
#' intra-regional observations.
#'
#' - `Y ~ .`
#' - `Y ~ . + P_(DIST)`
#' - `Y ~ X1 + X2 + X3 + P_(DIST)`
#' - `Y ~ D_(X1 + X2 + X3) + O_(X1 + X2 + X3) + I_(X1 + X2 + X3) + P_(DIST)`
#'
#' Now if we only want to use X1 for the intra-regional model we can do the
#' following (again all four options below are equivalent).
#'
#' - `Y ~ . + I_(X1)`
#' - `Y ~ . + I_(X1) + P_(DIST)`
#' - `Y ~ X1 + X2 + X3 + I_(X1) + P_(DIST)`
#' - `Y ~ D_(X1 + X2 + X3) + O_(X1 + X2 + X3) + I_(X1) + P_(DIST)`
#'
#' This behavior is easily combined with transformation of variables as the
#' two equivalent options below illustrate.
#'
#' - `log(Y + 1) ~ sqrt(X1) + X2 + P_(log(DIST + 1))`
#' @examples
#'
#' # Estimate flows between the states of Germany
#' spflow(spflow_formula = y9 ~ . + P_(DISTANCE),
#' spflow_networks = multi_net_usa_ge,
#' id_net_pair = "ge_ge")
#'
#' # Same as above with explicit declaration of variables...
#' # ... X is the only variable available
#' # ... it is used for origins, destination and intra-state flows
#' spflow(spflow_formula = y9 ~ X + P_(DISTANCE),
#' spflow_networks = multi_net_usa_ge,
#' id_net_pair = "ge_ge")
#'
#' # Same as above
#' spflow(spflow_formula = y9 ~ O_(.) + D_(.) + I_(.) + P_(DISTANCE),
#' spflow_networks = multi_net_usa_ge,
#' id_net_pair = "ge_ge")
#'
#' # Same as above
#' spflow(spflow_formula = y9 ~ O_(X) + D_(X) + I_(X) + P_(DISTANCE),
#' spflow_networks = multi_net_usa_ge,
#' id_net_pair = "ge_ge")
#'
#'
#' @references \insertAllCited{}
#' @seealso [spflow_control()] [spflow_network_classes()]
#' @author Lukas Dargel
#' @export
spflow <- function(
spflow_formula,
spflow_networks,
id_net_pair = id(spflow_networks)[["pairs"]][[1]],
estimation_control = spflow_control()) {
## check and correct input arguments
assert_is(spflow_formula, "formula")
assert_is_single_x(id_net_pair, "character")
od_ids <- id(pull_member(spflow_networks, id_net_pair))
estimation_control <- enhance_spflow_control(
estimation_control = estimation_control,
is_within = od_ids["orig"] == od_ids["dest"])
spflow_networks <- validate_networks(
spflow_networks = spflow_networks,
id_net_pair = id_net_pair,
model = estimation_control[["model"]])
spflow_matrices <- derive_spflow_matrices(
id_net_pair = id_net_pair,
spflow_networks = spflow_networks,
spflow_formula = spflow_formula,
spflow_control = estimation_control,
na_rm = estimation_control[["na_rm"]])
spflow_indicators <- spflow_matrices[["spflow_indicators"]]
spflow_matrices[["spflow_indicators"]] <- NULL
spflow_nbfunctions <- derive_spflow_nbfunctions(
OW = neighborhood(spflow_networks, od_ids["orig"]),
DW = neighborhood(spflow_networks, od_ids["dest"]),
estimation_control = estimation_control,
spflow_indicators = spflow_indicators)
spflow_obs <- spflow_indicators2obs(spflow_indicators)
spflow_moments <- derive_spflow_moments(
spflow_matrices = spflow_matrices,
n_o = spflow_obs[["N_orig"]],
n_d = spflow_obs[["N_dest"]],
N = spflow_obs[["N_sample"]],
wt = spflow_indicators2wtmat(spflow_indicators))
estimation_results <- spflow_estimation(
spflow_moments = spflow_moments,
spflow_nbfunctions = spflow_nbfunctions,
estimation_control = estimation_control)
estimation_results <- spflow_post_estimation(
estimation_results,
spflow_networks = spflow_networks,
spflow_matrices = spflow_matrices,
spflow_moments = spflow_moments,
spflow_indicators = spflow_indicators,
spflow_formula = spflow_formula)
return(estimation_results)
}
#' @keywords internal
parameter_names <- function(
model_matrices,
model) {
names_rho <- define_spatial_lag_params(model)
names_const <- c("(Intercept)", "(Intra)")
use_const <- c(model_matrices$constants$global == 1,
!is.null(model_matrices$constants$intra$In))
names_const <- names_const[use_const]
x_prefs <- list("D_" = "DEST_","O_" = "ORIG_","I_" = "INTRA_")
names_X <- lapply(compact(model_matrices[names(x_prefs)]), "colnames")
names_X <- Map("paste0", x_prefs[names(names_X)], names_X)
names_X <- unlist(names_X, use.names = FALSE)
names_G <- names(model_matrices$G)
export_names <- c(names_rho,names_const,names_X,names_G)
return(export_names)
}
#' @keywords internal
drop_instruments <- function(model_matrices) {
filter_inst_list <-
function(l) { Filter(Negate(attr_inst_status), model_matrices[[mm]]) }
mm <- "CONST"
model_matrices[[mm]] <- filter_inst_list(model_matrices[[mm]])
mm <- "P_"
model_matrices[[mm]] <- filter_inst_list(model_matrices[[mm]])
filter_inst_mat <-
function(mat) {
if (is.null(mat))
return(NULL)
is_inst <- attr_inst_status(mat)
mat <- mat[,!is_inst, drop = FALSE]
attr_inst_status(mat) <- is_inst[!is_inst]
mat
}
mm <- "D_"
model_matrices[[mm]] <- filter_inst_mat(model_matrices[[mm]])
mm <- "O_"
model_matrices[[mm]] <- filter_inst_mat(model_matrices[[mm]])
mm <- "I_"
model_matrices[[mm]] <- filter_inst_mat(model_matrices[[mm]])
return(model_matrices)
}
#' @keywords internal
sp_model_type <- function(cntrl) {
has_lagged_x <- cntrl$sdm_variables != "none"
if (cntrl$model == "model_1")
model_type <- ifelse(has_lagged_x,"SLX","OLM")
if (cntrl$model != "model_1")
model_type <- ifelse(has_lagged_x,"SDM","LAG")
return(model_type)
}
#' @keywords internal
derive_spflow_nbfunctions <- function(
OW,
DW,
estimation_control,
spflow_indicators) {
if (estimation_control[["model"]] == "model_1")
return(NULL)
pspace_validator <- derive_pspace_validator(
OW_character = attr_spectral_character(OW),
DW_character = attr_spectral_character(DW),
model = estimation_control[["model"]],
estimation_method = estimation_control[["estimation_method"]])
if (estimation_control[["estimation_method"]] == "s2sls")
return(list("pspace_validator" = pspace_validator))
logdet_calculator <- derive_logdet_calculator(
OW = OW,
DW = DW,
model = estimation_control[["model"]],
n_o = nlevels(spflow_indicators[[2]]),
n_d = nlevels(spflow_indicators[[1]]),
approx_order = estimation_control[["logdet_approx_order"]],
M_indicator = spflow_indicators2mat(spflow_indicators, "HAS_ZZ"))
return(list("logdet_calculator" = logdet_calculator,
"pspace_validator" = pspace_validator))
}
#' @keywords internal
derive_pspace_validator <- function(
OW_character,
DW_character,
model,
estimation_method) {
model_num <- as.numeric(substr(model,7,7))
if (model_num == 1)
return(NULL)
req_OW <- model_num %in% c(3:9)
req_DW <- model_num %in% c(2,4:9)
check <- "
The validity of the the parameter space cannot be validated because the
neighborhood matrix has complex eigen values."
has_complex_evs <- function(x) any(Im(x[c("LR","SR")]) != 0)
if ((req_OW & has_complex_evs(OW_character)) ||
(req_DW & has_complex_evs(DW_character))) {
assert(FALSE, check, warn = TRUE)
return(function(...) FALSE)
}
DWmax <- as.numeric(Re(DW_character["LR"])) %T% req_DW
DWmin <- as.numeric(Re(DW_character["SR"])) %T% req_DW
OWmax <- as.numeric(Re(OW_character["LR"])) %T% req_OW
OWmin <- as.numeric(Re(OW_character["SR"])) %T% req_OW
mod_params <- switch(model, "model_5" = "model_7", "model_6" = "model_9", model)
rho_names <- define_spatial_lag_params(mod_params)
rho_scale <- switch(model, "model_5" = 1/2, "model_6" = 1/3, 1)
WF_eigen_part <- rbind(
"max_max" = c("rho_d" = DWmax, "rho_o" = OWmax, "rho_w" = OWmax*DWmax),
"max_min" = c("rho_d" = DWmax, "rho_o" = OWmin, "rho_w" = OWmin*DWmax),
"min_max" = c("rho_d" = DWmin, "rho_o" = OWmax, "rho_w" = OWmax*DWmin),
"min_min" = c("rho_d" = DWmin, "rho_o" = OWmin, "rho_w" = OWmin*DWmin)
)[,rho_names, drop = FALSE]
validate_fun <- function(rho){
rho <- lookup(values = rho * rho_scale, names = rho_names) # trick for model 5 and 6
WF_eigen_range <- range(rowSums(WF_eigen_part %*% diag(rho, length(rho))))
valid_upper_bound <- WF_eigen_range[2] < 1
valid_lower_bound <- WF_eigen_range[1] > -1 | estimation_method == "s2sls"
return(valid_upper_bound & valid_lower_bound)
}
return(validate_fun)
}
#' @keywords internal
validate_networks <- function(
spflow_networks,
id_net_pair,
model,
do_normalisation) {
assert_is_single_x(id_net_pair, "character")
assert_is(spflow_networks, "spflow_network_multi")
assert(id_net_pair %in% id(spflow_networks)[["pairs"]],
'The the spflow_network_pair with id "%s" are not available!',
id_net_pair)
od_ids <- id(spflow_networks@pairs[[id_net_pair]])
remove_pairs <- id(spflow_networks)[["pairs"]] != id_net_pair
remove_nodes <- !id(spflow_networks)[["nodes"]] %in% od_ids[c("orig", "dest")]
spflow_networks@pairs[remove_pairs] <- NULL
spflow_networks@nodes[remove_nodes] <- NULL
check_msg <- "For model_%s the neighborhood of %s must be available!"
model_num <- as.numeric(substr(model,7,7))
miss_OW <- model_num %in% c(3:9) & is.null(neighborhood(spflow_networks, od_ids["orig"]))
miss_DW <- model_num %in% c(2,4:9) & is.null(neighborhood(spflow_networks, od_ids["dest"]))
assert(!miss_OW, check_msg, model_num, od_ids["orig"])
assert(!miss_DW, check_msg, model_num, od_ids["dest"])
return(spflow_networks)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.