library(mobility)
library(ggplot2)
library(reshape2)
library(viridis)
library(ggstance)
library(foreach)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  autodep=TRUE
)

set.seed(1234)

Build data vectors

The travel probability model requires two pieces of information: the number of trips for the time interval that travel outside the origin location ($y_i$) and the total number of trips observed for that origin ($m_i$). Both vectors must be the same length but they can have missing values (NA) that represent an unobserved origin location. The get_stay_data() function can be used to build trips for each location that are aggregated to a desired spatial scale (e.g. admin level 1 or admin level 2).

stay_data <- get_stay_data(travel_data_sim, agg_adm=2)
y <- setNames(stay_data$travel, stay_data$orig_id)
m <- setNames(stay_data$total, stay_data$orig_id)

n_orig <- length(y)
miss <- sample(1:n_orig, n_orig*0.2) # missing observations
y[miss] <- m[miss] <- NA

y
m

Estimate probability of travel for each location

To estimate the probability of traveling outside the origin location $i$, the fit_prob_travel() function uses a Beta-Binomial model with hierarchical structure to infer travel probability in unobserved locations. $$ y_i \sim \text{Binom}(\tau_i, \sum_{\forall j} m_{ij}) $$ The random variable $y_i$ is the observed number of trips that leave origin $i$ within the time interval. Binomial parameters $\tau_i$ and $\sum_{\forall j} m_{ij}$ are the success probability and total number of observed trips emanating from origin $i$ respectively. $$ \begin{aligned} \tau_i &\sim \text{Beta}(1+\alpha, 1+\beta) \ \tau_\text{pop} &\sim \text{Beta}(1+\bar{\alpha}, 1+\bar{\beta}) \end{aligned} $$ Binomial probabilities for each origin $\tau_i$ are drawn from a Beta distributed prior with shape and rate parameters $\alpha$ and $\beta$. The hierarchical structure comes from estimating $\alpha$ and $\beta$ as population-level hyper-priors for the origin-level probabilities $\tau_i$ and allowing $\tau_\text{pop}$ to inherit the overall population-level distribution denoted as $\bar{\alpha}$ and $\bar{\beta}$. $$ \begin{aligned} \alpha &\sim \text{Gamma}(0.01, 0.01) \ \beta &\sim \text{Gamma}(0.01, 0.01) \end{aligned} $$

This structure allows origin locations to have probabilities $\tau_i$ which are driven by data in each location and all unobserved locations regress to the population mean $\tau_\text{pop}$.

prob_trav <- fit_prob_travel(travel=y, total=m)
prob_trav_smry <- summary(prob_trav, probs=c(0.025, 0.25, 0.75, 0.975))

ggplot(data=prob_trav_smry) +
  geom_point(aes(x=mean, y=names(y)), size=2) +
  ggstance::geom_linerangeh(aes(y=names(y), xmin=Q2.5, xmax=Q97.5)) +
  ggstance::geom_linerangeh(aes(y=names(y), xmin=Q25, xmax=Q75), size=1) +
  xlab('Probability of travel outside origin') + ylab('Origin') +
  xlim(0,1) +
  theme_bw() + theme(axis.text.x=element_text(size=10),
                     axis.text.y=element_text(size=10),
                     axis.title.x=element_text(size=12, margin = margin(t = 15)),
                     axis.title.y=element_text(size=12, margin = margin(r = 15)),
                     panel.border = element_rect(colour = "black", fill=NA, size=1),
                     legend.position='right')

Simulate values of travel probability for each location

There can be relatively large confidence intervals around the travel probability estimated for locations without data. Since these locations inherit the population-level mean, uncertainty will likely be higher than locations with data. If we wanted to then use this model in a simulation, we may want to include this uncertainty in the model.

The predict() function can be used to simulate the fitted values of travel probability for each location. The function takes posterior estimates of the mean ($\mu$) and standard deviation ($\sigma$) for each origin $i$, and then derives the shape and rate parameters for the Beta distribution. Simulated values are random draws from this Beta distribution.

# Simulate n realizations of travel probability for each location
predict(object=prob_trav, nsim=5, seed=123)


COVID-19-Mobility-Data-Network/mobility documentation built on Nov. 22, 2021, 12:17 a.m.