knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The simR
package allows spatial interaction models (SIMs) to be fitted easily. In particular:
dplyr
'pipeline' approach.Firstly - load the package. Because of the 'tidyverse' focus of the package, load this as well:
library(simR) library(tidyverse)
This package essentially works with geographical data. Although one could use SIMs without drawing any maps, the author feels this is essentially lacking. For that reason, it is also worth loading the sf
and tmap
packages^[Install these packages if you haven't done already!].
library(sf) library(tmap)
Sample data for the package relates the Local Authority areas in the East Midlands of the UK. Here is the simplefeatures
table showing the areas:
data(eastmid_sf) tm_shape(eastmid_sf) + tm_borders()
There is also travel-to-work data - basically these are counts of the number of people travelling from one Local Authority area to another (or to itself). This is organised so that each row corresponds to a single origin-destination pair:
data(eastmid_ttw)
eastmid_ttw
The columns are:
From
: The name of the origin Local Authority area.To
: The name of the destination Local Authority area.Count
: The number of people travelling to work for the origin/destination pair.Dist
: The 'crow flies' diustancew between the origin and destination.Now fit a model:
eastmid_ttw %>% doubly_constrained_sim(From,To,Dist,Count)
At this stage, this might make little sense unless you are familiar withs SIMs. However, here are a few words of explanation. The doubly_constrained_sim
function fits a doubly constrained SIM model - this predicts the interaction between origin and destination - here this is the number of people travelling to work from the origin to the destation, ensuring that the total number of people leaving the origin, and the total number arriving at the destination agree with the observed figures, taking into account a 'cost deterrence' effect, where more costly trips are less likely to happen, given these constraints.
The function works like dplyr
type functions, so that the arguments can be specified as column names in a data frame, without having to quote them. It returns an object of class 'sim' (which inherits from the class 'glm').
The model is of the form
$$
\hat{T_{ij}} = k A_iB_j\exp\left(-\frac{ C_{ij}}{\gamma} \right)
$$
Where $A_i$ and $B_j$ are 'balancing terms' to make the constraints apply and $\gamma$ is a cost deterrence parameter. 'cost' here is used broadly, and can generally represent any measure of effort needed to travel from zone $i$ to zone $j$. $\hat{T_{ij}}$ is the interaction between zones $i$ and $j$. The 'hat' on the $\hat{T_{ij}}$ denotes that this is the modelled interaction - the actual interaction is $T_{ij}$ (no hat). This is the Count
column in the example data set eastmid_ttw
. $\gamma$ here corresponds to the 'Cost Denominator Parameter' in the printout. Since units here are in Km this is just over 8 Km. When the deterrence function is exponential, as in the model here, a 'half life' cost can be calculated. Assuming all other characteristics of an origin and destimation are fixed, for every whole number multiple of this cost the amount of interaction is halved. Here this is a distance - and has a value of a little under 6km.
To understand the idea more it is worth reading these two articles by Adam Dennett - https://rpubs.com/adam_dennett/259068 and https://rpubs.com/adam_dennett/257231 - in many ways these are what prompted me to create the simR
package.
Note that this is a 'toy' data set - ideally one would use more geographical detail, and a more realistic cost distance - such as travel time. Also, the cost when the origin and the destination are the same is generally not zero! However, the computing procedures are the same. In the fullness of time I'll replace these distances with something more convincing...
In the first section, a doubly constrained SIM model was fitted, but nothing really done with it. To do things with it, the object should be stored:
eastmid_ttw %>% doubly_constrained_sim(From,To,Dist,Count) -> sim_model
Then it is possible to add the fitted $\hat{T_{ij}}$ values to the eastmids_ttw
data frame.
eastmid_ttw %>% add_sim_fitted(sim_model) -> eastmid_ttw_fitted eastmid_ttw_fitted
We can now check that the calibration really is constrained to make the total sums going in to the destinations in the fitted values agree with those in the observed counts:
eastmid_ttw_fitted %>% group_by(To) %>% summarise(Count=sum(Count),Fitted=sum(Fitted))
It seems to! The same could be done for the origins.
Finally, make a map. Here we'll filter out the fitted trips to Derby and map them, and compare them to the observed trips.
eastmid_ttw_fitted %>% filter(To=='Derby') -> to_derby to_derby
To map this, we need to join the table to_derby
to the eastmid_sf
simple feature table.
eastmid_sf %>% left_join(to_derby,by=c("name"="From")) -> to_derby_sf
So now mapping is possible...
legend_style <- tm_legend(legend.position=c('right','bottom'), legend.text.size=0.4) tm_shape(to_derby_sf) + tm_polygons(col='Fitted',style='fisher') + legend_style tm_shape(to_derby_sf) + tm_polygons(col='Count',style='fisher') + legend_style
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.