GTFS Tables

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dplyr)
library(trread)

GTFS Table Relationships

gtfs-relationship-diagram Source: Wikimedia, user -stk.

Additional tables calculated by trread

In addition to the tables described above, trread attempts to calculate the following tables when one uses read_gtfs():

Frequencies/Headways

trread prints a message regarding these tables on reading any GTFS file.

# Read in GTFS feed
# here we use a feed included in the package, but note that you can read directly from the New York City Metropolitan Transit Authority using the following URL:
# nyc <- read_gtfs("http://web.mta.info/developers/data/nyct/subway/google_transit.zip")
local_gtfs_path <- system.file("extdata", 
                               "google_transit_nyc_subway.zip", 
                               package = "trread")
nyc <- read_gtfs(local_gtfs_path, 
                 local=TRUE,
                 frequency=TRUE)

Example GTFS Table Joins

Route Frequencies to Routes

For example, joining the standard routes table, with the 'route_shortname' variable to routes_frequencies_df.

routes_df_frequencies <- nyc$routes_df %>% 
  inner_join(nyc$routes_frequency_df, by = "route_id") %>% 
          select(route_long_name,
                 median_headways, 
                 mean_headways, 
                 st_dev_headways, 
                 stop_count)
head(routes_df_frequencies)

Headways at Stops for a Route

A more complex example of cross-table joins is to pull the stops and their headways for a given route.

This simple question is a great way to begin to understand a lot about the GTFS data model.

First, we'll need to find a 'service_id', which will tell us which stops a route passes through on a given day of the week and year.

When calculating frequencies, trread tries to guess which service_id is representative of a standard weekday by walking through a set of steps. Below we'll just do some of this manually.

First, lets look at the calendar_df.

head(sample_n(nyc$calendar_df,10))

Then we'll pull a random route_id and set of service_ids that run on Mondays.

select_service_id <- filter(nyc$calendar_df,monday==1) %>% pull(service_id)
select_route_id <- sample_n(nyc$routes_df,1) %>% pull(route_id)

Now we'll filter down through the data model to just stops for that route and service_ids.

some_trips <- nyc$trips_df %>%
  filter(route_id %in% select_route_id & service_id %in% select_service_id)

some_stop_times <- nyc$stop_times_df %>% 
  filter(trip_id %in% some_trips$trip_id) 

some_stops <- nyc$stops_df %>%
  filter(stop_id %in% some_stop_times$stop_id)


Try the trread package in your browser

Any scripts or data that you put into this service are public.

trread documentation built on May 1, 2019, 10:14 p.m.