In apeterson91/PTPT: PTPT

Modeling Framework

Before explaining the strategy behind each stage of data collection we introduce notation so that the technical reader can identify consistent terms across all stages of data collection and methods.

Consider the $i$ th PGH resident, $(i=1,...,N)$ who performs each of $K$ trips on a regular basis, say a typical week. We assume, to start, that the mode of transit is fixed per trip - a person only walks (for example) to the grocery store. The mode of transit for the $i$ th resident is denoted as $m_i$, where $m_i \in {1,...,M}$ and $M$ is the total possible modes of transit. This resident may also have certain characteristics, such as race, income, etc. that may be relevant to understanding their mode choice. We denote the vector of these $p$ characteristics as $Z_i \in \mathbb{R}^{p}$. Similarly, we measure route-specific characteristics, $X_i \in \mathbb{R}^q$, which may include the difference in expected travel time to a given destination between biking and driving.

With this preliminary notation aside, we now describe how these data are collected or generated in each of the three stages constituting the PTPT's development.

We follow previous work in the transit literature by simulating data under a discrete choice model framework [ben1985discrete; @paez2008discrete; goetzke2008]. Specifically we generate $m_i \in {\text{Bike},\text{Walk},\text{Drive}}$, where the odds of biking and walking to a destination are modeled on a logit scale as a function of both subject level characteristics, $Z_i$, and route specific characteristics $X_i$, which include the difference in estimated trip time duration between biking/walking and driving, or, in the future, the perceived physical difficulty of the route measured by e.g. average route gradient. The general model is then,

$$ m_i | X_i,Z_i \sim \text{Categorical}(\pi^{b}_i,\pi^w_i,\pi_i^d) \ \log(\frac{\pi_i^b}{\pi_i^d}) = \alpha_b + f_b(X_i) + g_b(Z_i) \ \log(\frac{\pi_i^w}{\pi_i^d}) = \alpha_w + f_w(X_i) + g_w(Z_i), $$

where $\pi_i^b,\pi_i^w,\pi_i^d$ are the respective probabilities of the $i$ th person biking, walking or driving to a destination, respectively, and $f_(X_i)$ -- the asterisk here refers to either the biking or walking function -- and $g_(Z_i)$ are the unknown functions of person and route specific traits that influence an individual's propensity to bike or walk instead of drive to a specific destination. For example, it is likely the case that one may be more inclined to bike or walk to a destination if the difference between walking and driving is small and the overall trip time is expected to be low, to reduce the hassle of driving and parking. However this inclination will rapidly decline as the difference increases and the overall absolute trip time increases. Consequently, we formulate this understanding by having $g_*(\cdot)$ be an exponential function of the aforementioned quantities.

For a full understanding of how the data are generated we encourage the interested reader to see the code in the data_raw/ folder in this project's github repo.

Known omissions

The modeling framework described above contains a number of known omissions we briefly enumerate here and leave for future version updates of stage 1 of the PTPT. The first is the omission of social dependency effects in the mean function above and explored in [@paez2008discrete; @goetzke2008]. Future versions of the PTPT will look to incorporate these effects.

While not explicitly an omission, a final weakness associated with simulated data we acknowledge is that the data are not observations of individuals making decisions about how to move about Pittsburgh. Our second stage dataset addresses this concern and builds upon the previous work laid out in stage 1.