In apeterson91/PTPT: PTPT

# Need to fix toc formatting - probably in css or header/footer files
library(dplyr)
library(gt)

NOTE

This documentation is still under review, and its content may change. Check the GitHub commit history for this project's repo to see when changes are made.

Introduction

This document discusses the topics underlying the Pittsburgh Transit Propensity Tool (PTPT), which fall under two headings: (1) Data Collection and (2) Substantive Questions. Data collection is laid out in three stages, each more difficult than the last and each yielding correspondingly higher quality data. Methods for investigating substantive questions develop in parallel to Data Collection; the models and estimates discussed under this heading are designed to be usable, with minor adjustment, at each of the three data stages. These methods are open-ended and fall under two headings: (1) Descriptive and (2) Prescriptive. That is, methods are developed with the intent to describe the current state of transit use in Pittsburgh, as well as to "prescribe", in decision-theoretic terms, what actions (interventions, programs, etc.) may lead to a greater uptake of one mode of transit over another. These ideas are discussed at a high level here, with an emphasis on clarity and accessibility over technical rigor. More in-depth information is laid out in the subsections of the "Analysis" tab.

Data Collection

Stage 1: Simulated Data

The first stage of data "collection" for the PTPT is generating simulated data. There are several reasons for this. To begin with, simulating data is cheap and quick while still providing some semblance of realism. Drawing on the wealth of data sources available through the WPRDC, American Community Survey data available through the Census, open source data available through OpenStreetMap, elevation data provided by NASA, and the other sources listed in the table in the Data Sources section below, simulated data let us quickly explore scenarios of how transit behavior may vary among Pittsburgh residents. To this end, we sample 100 fictitious residents across the 90 neighborhoods of Pittsburgh, assign each a random residential parcel, and generate three trips (grocery, commute and social), each by three modes of transit (walking, biking and driving a personal vehicle). From these we then generate a chosen mode of transit using a discrete choice model. See the Analysis tab for further details.
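The simulation steps described above can be sketched in base R. This is a minimal illustration, not the PTPT implementation: the multinomial logit form, the coefficient values, the travel times, the placeholder neighborhood names, and the reading of "100 residents" as 100 in total are all assumptions made for the example.

```r
# Hypothetical sketch: names, coefficients, and travel times are illustrative only.
set.seed(42)

n_residents <- 100
neighborhoods <- paste0("neighborhood_", 1:90)
trips <- c("grocery", "commute", "social")
modes <- c("walk", "bike", "drive")

# One row per resident-trip-mode alternative.
sim <- expand.grid(
  resident = seq_len(n_residents),
  trip = trips,
  mode = modes,
  stringsAsFactors = FALSE
)

# Assign each resident a neighborhood at random (a stand-in for a parcel draw).
home <- sample(neighborhoods, n_residents, replace = TRUE)
sim$neighborhood <- home[sim$resident]

# Made-up travel times in minutes; a real version would use routed trip times.
base_minutes <- c(walk = 40, bike = 15, drive = 10)
sim$minutes <- unname(base_minutes[sim$mode]) * runif(nrow(sim), 0.5, 1.5)

# Assumed utility: mode-specific intercept plus a travel-time penalty.
intercept <- c(walk = 0, bike = -1, drive = 0.5)
beta_minutes <- -0.05
sim$utility <- unname(intercept[sim$mode]) + beta_minutes * sim$minutes

# Multinomial logit: softmax of utilities within each resident-trip choice set.
choice_set <- paste(sim$resident, sim$trip)
sim$prob <- ave(sim$utility, choice_set, FUN = function(u) exp(u) / sum(exp(u)))

# Draw one chosen mode per choice set according to those probabilities.
chosen <- do.call(rbind, lapply(split(sim, choice_set), function(d) {
  d[sample(nrow(d), 1, prob = d$prob), c("resident", "trip", "neighborhood", "mode")]
}))
rownames(chosen) <- NULL
```

The result, `chosen`, holds one simulated mode per resident and trip type. Replacing the made-up travel times with routed times and the intercepts with fitted values would move the sketch closer to a realistic scenario.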

Further work on data generation will incorporate additional modes of transit and hypothetical data collection designs that account for shared household trips, e.g. two participants from the same household make two different commute trips but one shared grocery trip. These two extensions will be the primary focus, so that the simulation better reflects both the transit reality of Pittsburghers and the likely reality of future data collection, namely clustered sampling, which we turn to in the next section.

Stage 2: Convenience Sample

Data collected in stage 2 represent a convenience sample drawn from members of Bike PGH's neighborhood advocacy network. While certainly not a random sample of the greater Pittsburgh community, this sample offers the chance to test the mechanics underlying the PTPT, to roll out the online survey submission process associated with the PTPT, and to address any early concerns that arise in administering the online survey.

From a technical standpoint, even though these data are not representative by design, there is still much that can be learned from them, and we plan to use methods from the survey literature [@gelman2007struggles; @lumley2011complex] to derive estimates that are more representative of the city.
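One common adjustment from the survey literature is post-stratification, shown here as a minimal base R sketch. The source does not say which method the PTPT will use, so treat this as one illustrative option; the age groups, responses, and population shares below are invented for the example.

```r
# Hypothetical convenience-sample responses: 1 = trip made without a car.
survey <- data.frame(
  age_group = c("18-34", "18-34", "18-34", "35-64", "35-64", "65+"),
  non_car   = c(1, 1, 0, 1, 0, 0)
)

# Made-up population shares for the same cells (in practice, e.g. from census tables).
population <- data.frame(
  age_group = c("18-34", "35-64", "65+"),
  pop_share = c(0.35, 0.45, 0.20)
)

# Mean response within each cell of the sample.
cell_means <- aggregate(non_car ~ age_group, data = survey, FUN = mean)
cells <- merge(cell_means, population, by = "age_group")

# Unweighted estimate versus the estimate reweighted to population shares.
raw_estimate <- mean(survey$non_car)
poststratified_estimate <- sum(cells$non_car * cells$pop_share)
```

Because the younger group is over-represented in this toy sample and reports more non-car trips, the post-stratified estimate comes out lower than the raw sample mean.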

Stage 3: Designed Sample

The final stage would be a designed sample targeting households in the city of Pittsburgh and explicitly modeling non-response. This is the last stage of possible data collection for the PTPT: it would yield the highest quality data, and it is also the most difficult data to acquire for this project.

Substantive Questions

Propensity to transit

The primary estimand of interest, and the key indicator by which progress towards the goal of non-car dependence is measured, is the probability that an individual in the Pittsburgh area uses transit, meaning any mode of travel other than a single-passenger vehicle, to complete a within-city (or potentially within-county) trip. Estimates of this value are typically distinguished by the type of trip, e.g. commute. According to US census data compiled by the League of American Bicyclists, the proportion of Pittsburgh residents cycling to commute is approximately 2% [@mcleod_2016; @mcleod_2017]. It is much more difficult to get estimates of the number of individuals who commute by walking alone or by some combination of transit and walking, let alone estimates of the mode of transit individuals take for other vital trips, such as getting groceries or attending social/cultural events.
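As a small illustration of the estimand, the sketch below computes the share of non-car trips by trip type in base R. The data frame, its column names, and the use of "drive" to stand for a single-passenger vehicle trip are hypothetical, and the sample proportion is only the simplest possible estimator.

```r
# Hypothetical trip records; "drive" stands for a single-passenger vehicle trip.
trip_log <- data.frame(
  trip = c("commute", "commute", "commute", "commute",
           "grocery", "grocery", "grocery",
           "social", "social", "social"),
  mode = c("drive", "bike", "bus", "drive",
           "drive", "walk", "drive",
           "walk", "bus", "drive")
)

# Indicator for completing the trip by any mode other than driving alone.
trip_log$non_car <- trip_log$mode != "drive"

# Estimated propensity to transit, separately for each trip type.
propensity <- tapply(trip_log$non_car, trip_log$trip, mean)
```

Here `propensity` is a named vector with one estimate per trip type; with real survey data the same quantity would be estimated with weights or a model rather than a raw proportion.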

Transit factors

In addition to estimating the propensity to transit, we are also interested in what factors influence this quantity. These include physical factors, such as the bike lanes, busways and other infrastructure that define spaces where cyclists and bus passengers feel safest or travel most efficiently, as well as social factors, such as knowing how to ride a bike or having social peers who also ride bikes. Understanding these factors can be critical for motivating and guiding a transition away from a car-dominant transit landscape to a more resilient multi-modal landscape.

Data Sources

| Organization | Dataset                     |
|--------------|-----------------------------|
| WPRDC        | Neighborhood Boundaries     |
| WPRDC        | Street Coordinates          |
| WPRDC        | Bike Lane Coordinates       |
| WPRDC        | Zoning Boundaries           |
| WPRDC        | Land Parcel Classifications |
| NASA         | Elevation Data              |
| OSM          | Grocery Store Locations     |
| OSM          | Trip Routes/Times           |