knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7 ) options(scipen = 999)
In this vignette we calculate both the accessibility and the proposed spatial availability accessibility measures using the 2016 Transportation Tomorrow Survey (TTS) data for the Golden Horseshoe Area (GGH) in Ontario, Canada. We then explore these measures within the City of Toronto and compare the two calculated accessibility measures.
All data used is contained with the AccessPack
data-package and the spatial availability function (sp_avail
) is described in detail in the previous vignette.
Load packages needed for the example:
library(AccessPack) library(dplyr) library(ggplot2) library(kableExtra) library(patchwork) library(sf) # library(MASS) # library(logspline)
The 'ggh_taz' is a spatial object which contains the number of workers, jobs, and area (km^2) for each traffic analysis zone (TAZ), see the workers per area:
plot_ggh <- ggplot() + geom_sf(data = ggh_taz, aes(fill= workers/AREA), color = NA) + scale_fill_distiller(palette = "Spectral", trans="sqrt") + guides(size = "none") + theme(legend.position = "right", panel.grid = element_blank(), panel.background = element_rect(size = 1, color = "black", fill = NA)) plot_ggh
The od_ft_tt
contains the trips made by workers to jobs (an Origin-Destination matrix). Each row contains the origin
ID for a TAZ, the destination
ID for a TAZ, the trips
which are the journeys from origin to destination, travel_time
is the travel time by car, as calculated using r5r
for a departure of 7:00am (EST5EDT) on October 20th 2021 (a date selected at random), f
is the impedance value. See 'data-raw' folder for data preparation .Rmd files.
junk <- od_ft_tt %>% select(trips, travel_time) %>% rename(Trips = trips, Travel_Time = travel_time) kable(junk %>% summary(), "html")
First, we create the Trip length distribution (TLD) by cutting the travel_time into 150 intervals and summing all trips within each interval. The TLD can be converted into a probability density function for likelihood of travel (as informed by travel time) by normalizing the the number of trips until a proportion of trips taken. This form of the TLD (black dots) and a theoretical gamma distribution (red line) which can be adjusted to fit the model are plotted below :
tld <- od_ft_tt %>% mutate(tt_classes = cut(travel_time, 150, ordered_result = TRUE)) %>% group_by(tt_classes) %>% summarize(trips = sum(trips), travel_time = mean(travel_time)) # Generate a random variates which follow a gamma distribution x <- rgamma(1000, shape = 2, rate = 0.1) # Covert to probability density function (PDF) den <- density(x) dat <- data.frame(x = den$x, y = den$y) # Plot density as points ggplot() + geom_point(data = dat, aes(x = x, y = y), size = 3, colour="red") + geom_point(data = tld, aes(x = travel_time, y = trips/sum(trips))) + ylim(0, 0.08) + theme_classic()
Let's see if gamma is the most appropriate distribution; to do this first a vector with every instance of travel_time for all ~3.0 mill trips is created:
# remove all NA trips from dataset and set all 0min travel times to 0.1min od_ft_tt <- od_ft_tt %>% filter( !is.na(travel_time)) %>% mutate(travel_time = ifelse(travel_time == 0, 0.1, travel_time)) all_tt <- od_ft_tt %>% dplyr::select(trips, travel_time) all_tt <- all_tt[rep(seq_len(dim(all_tt)[1]), all_tt$trips), 2]
Next this vector of travel_times are plotted in a Cullen and Frey graph:
fitdistrplus::descdist(data=all_tt)
We can see that from this skewness-kurtosis graph that the observation is most similar to a lognormal distribution but also the gamma. Let's fit try fitting the vector to these two distributions using maximum likelihood estimation and compare:
# using fitdist function to fit a distribution using the default maximum likelihood estimation method and Nelder-Mead method for direct optimization gamma_ <- fitdistrplus::fitdist(data=all_tt, "gamma", method="mle", optim.method="Nelder-Mead") lnorm_ <- fitdistrplus::fitdist(data=all_tt, "lnorm", method="mle", optim.method="Nelder-Mead") norm_ <-fitdistrplus::fitdist(data=all_tt, "norm", method="mle", optim.method="Nelder-Mead") exp_ <- fitdistrplus::fitdist(data=all_tt, "exp", method="mle", optim.method="Nelder-Mead")
Comparison of empirical distribution to gamma distribution:
plot(gamma_) # knitr::include_graphics("images/g-2.png")
Comparison of empirical distribution to lognormal distribution:
plot(lnorm_) # knitr::include_graphics("images/lno-3.png")
The gamma distribution fit looks more accurate; especially at predicting the longer travel times. Correspondingly, goodness of fit criteria such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC) for the gamma distribution (along with lognormal and exponential and normal distribution for reference) are calculated. The AIC and BIC measure the trade-off between the goodness of fit of the model and the simplicity of the model when comparing against candidate models; a model which looks like a good fit but has a high AIC and BIC value indicates issues of overfitting thus the lower the value the better the candidate model:
Distribution <- c("Gamma", "Exp", "Lognormal", "Normal") AIC <- c(gamma_$aic, exp_$aic, lnorm_$aic, norm_$aic) BIC <- c(gamma_$bic, exp_$bic, lnorm_$bic, norm_$bic) df <- data.frame(Distribution, AIC, BIC) kable(df, "html")
Let's plot the TLD with the gamma theoretical distribution:
ggplot(data = tld, aes(x = travel_time, y = trips/sum(trips))) + geom_point(size = 3) + geom_line(aes(x=travel_time, y=dgamma(travel_time, gamma_$estimate["shape"], gamma_$estimate["rate"])), color="red", size = 1) + theme_classic()
The fit looks good with exception to some outliers near the peak. Let's retain the parameters for this gamma function and use it as our impedance function for the TTS 2016 dataset:
kable(gamma_ %>% coef(), "html")
Gamma function: $$ f(x, \alpha, \beta) = \frac {x^{\alpha-1}e^{-x / \beta }} { \beta ^{\alpha}\Gamma(\alpha)} \quad \text{for } 0 \leq x \leq \infty $$ where the estimated 'shape' is $\alpha$,the estimated 'rate' is $\beta$, and the $\Gamma(\alpha) $ is defined as:
$$ \Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1}e^{-x} \,dx \ $$
Let's now add the values of $f(x, \alpha, \beta)$ to the OD table as a column named 'f'. As a point of summary, 'f' captures our theoretical cost of travel (i.e. probability of travel based on the trip-length-distribution (a PDF) of empirical trips and associated travel time).
#dgamma is the density plot for gamma function od_ft_tt <- od_ft_tt %>% mutate(f = dgamma(travel_time, gamma_$estimate["shape"], gamma_$estimate["rate"]))
Using the TTS-16 dataset describe in the first vignette; let's calculate accessibility using the empirical dataset. The impedance function is already defined.
Calculate accessibility:
#add the number of jobs and workers to the od_ft_tt matrix od_ft <- od_ft_tt %>% merge(ggh_taz %>% dplyr::select(GTA06, workers) %>% st_drop_geometry(), by.x = "Origin", by.y="GTA06", all.x = TRUE) od_ft <- od_ft %>% merge(ggh_taz %>% dplyr::select(GTA06, jobs) %>% st_drop_geometry(), by.x = "Destination", by.y="GTA06", all.x = TRUE) # calculate accessibility c_accessibility <- od_ft %>% mutate(A_ij = f * jobs) %>% group_by(Origin) %>% summarise(A_i = sum(A_ij, na.rm = T), trips_i = sum(trips, na.rm = T))
Merge accessibly calculation to the ggh_taz:
ggh_taz_acc <- ggh_taz %>% merge(c_accessibility, by.x=c("GTA06"), by.y=c("Origin"), all.x=T)
Plot accessibility (accessible jobs in each TAZ):
plot_c_access <- ggplot() + geom_sf(data = ggh_taz_acc, aes(fill= A_i), color = NA) + scale_fill_distiller(palette = "Spectral", trans="sqrt")+ guides(size = "none") + theme(legend.position = "right", panel.grid = element_blank(), panel.background = element_rect(size = 1, color = "black", fill = NA)) plot_c_access
NOTE: Grey TAZs contain no trips from origins thus have no calculated A_i
; 22% of TAZs are greyed out.
The impedance function is already defined, let's calculate spatial availability as described in the first vignette using the TTS-16 dataset :
od_ft <- od_ft %>% mutate(catch = 1) %>% mutate(V_ij = sp_avail(., o_id = Origin, d_id = Destination, pop = workers, opp = jobs, r = catch, f = f))
Verify that the sum of all jobs allocated is consistent with the total number of jobs. This property of proportional allocation is one of this measure's novel contribution:
sum(od_ft$V_ij, na.rm=T) sum_jobs <- od_ft %>% group_by(Destination) %>% summarise(jobs = mean(jobs)) sum(sum_jobs$jobs, na.rm = T)
The total number of jobs is preserved.
Aggregate available jobs ('V_ij') by origin:
availability <- od_ft %>% group_by(Origin) %>% summarize(V_i = sum(V_ij))
Now merge accessibly calculation to the ggh_taz:
ggh_taz_acc <- ggh_taz_acc %>% merge(availability, by.x=c("GTA06"), by.y=c("Origin"), all.x=T)
Plot spatial availability (accessible available jobs in each TAZ):
plot_avail <- ggplot() + geom_sf(data = ggh_taz_acc, aes(fill= V_i), color = NA) + scale_fill_distiller(palette = "Spectral", trans="sqrt")+ guides(size = "none") + theme(legend.position = "right", panel.grid = element_blank(), panel.background = element_rect(size = 1, color = "black", fill = NA)) plot_avail
Plot origin accessibility vs. availability (with the City of Toronto Boundary):
plot_avail <- ggplot() + geom_sf(data = ggh_taz_acc, aes(fill= V_i), color = NA) + scale_fill_distiller(palette = "Spectral", trans="sqrt") + geom_sf(data = toronto_muni_bound, colour="black", fill=NA) + guides(size = "none") + theme(legend.position = "right", panel.grid = element_blank(), panel.background = element_rect(size = 1, color = "black", fill = NA)) plot_c_access <- ggplot() + geom_sf(data = ggh_taz_acc, aes(fill= A_i), color = NA) + scale_fill_distiller(palette = "Spectral", trans="sqrt") + geom_sf(data = toronto_muni_bound, colour="black", fill=NA)+ guides(size = "none") + theme(legend.position = "right", panel.grid = element_blank(), panel.background = element_rect(size = 1, color = "black", fill = NA)) plot_avail / plot_c_access
Let's take a closer look at the measures just within the City of Toronto boundary.
As many trips are made to Toronto from outside of Toronto; the analysis must be repeated within this new boundary.
Preparing Toronto spatial data:
#transform CRS toronto_muni_bound <- st_transform(toronto_muni_bound, crs=32617) #select only zones within Toronto Municipality TO_taz <- ggh_taz_acc %>% filter(st_intersects(., toronto_muni_bound, sparse = FALSE)[,1]) %>% dplyr::select(GTA06, AREA, jobs)
Only calculating for origins anywhere in the ggh to destinations in Toronto:
#jobs at destinations IN Toronto and origins from anywhere; workers are associated with the origin (ggh) and jobs with the destination (Toronto) TO_od_ft <- od_ft %>% dplyr::select(-V_ij) %>% filter(Destination %in% TO_taz$GTA06) #calculate accessibility for workers from any origin to jobs in Toronto TO_c_accessibility <- TO_od_ft %>% mutate(TO_A_ij = f * jobs) %>% group_by(Origin) %>% summarise(TO_A_i = sum(TO_A_ij, na.rm = T), trips_i = sum(trips, na.rm = T)) #Merge TO accessibly calculation to the ggh_taz: TO_taz_acc <- ggh_taz %>% merge(TO_c_accessibility, by.x=c("GTA06"), by.y=c("Origin"), all.y=T)
Again, only clculating for origins anywhere in the ggh to destinations in Toronto:
#calculate spatial availability TO_od_ft <- TO_od_ft %>% mutate(TO_V_ij = sp_avail(., o_id = Origin, d_id = Destination, pop = workers, opp = jobs, r = catch, f = f))
Verify that the sum of all jobs allocated is consistent with the total number of jobs. This property of proportional allocation is one of this measure's novel contribution:
sum(TO_od_ft$TO_V_ij, na.rm=T) sum_jobs <- TO_od_ft %>% group_by(Destination) %>% summarise(jobs = mean(jobs)) sum(sum_jobs$jobs, na.rm = T)
The total number of jobs is preserved.
Aggregate available jobs ('TO_V_ij') by origin and merge:
TO_availability <- TO_od_ft %>% group_by(Origin) %>% summarize(TO_V_i = sum(TO_V_ij)) #Merge TO availability calculation to the TAZ sf object created for accessibility above: TO_taz_acc <- TO_taz_acc %>% merge(TO_availability, by.x=c("GTA06"), by.y=c("Origin"), all.x=T)
Plot origin accessibility vs. availability for Toronto
plot_avail <- ggplot() + geom_sf(data = TO_taz_acc, aes(fill= TO_V_i), color = NA) + scale_fill_distiller(palette = "Spectral", trans = "sqrt")+ guides(size = "none") + theme(legend.position = "right", panel.grid = element_blank(), panel.background = element_rect(size = 1, color = "black", fill = NA)) plot_c_access <- ggplot() + geom_sf(data = TO_taz_acc, aes(fill= TO_A_i), color = NA) + scale_fill_distiller(palette = "Spectral", trans = "sqrt")+ guides(size = "none") + theme(legend.position = "right", panel.grid = element_blank(), panel.background = element_rect(size = 1, color = "black", fill = NA)) plot_avail / plot_c_access
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.