In sharlagelfand/opendatatoronto: Access the City of Toronto Open Data Portal

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE
)

Excel files (XLS and XLSX) are a common form of data on the City of Toronto Open Data Portal.

In cases where the file only contains one sheet, the resource is returned as a tibble. For example, this data set on TTC Ridership Analysis from 1985 to 2018:

library(opendatatoronto)

list_package_resources("https://open.toronto.ca/dataset/ttc-ridership-analysis/") %>%
  get_resource()

When the file contains multiple sheets, the resource is returned as a named list, where the names are the names of the sheets, as in the dataset on Wellbeing Toronto Demographics:

library(dplyr)

wellbeing_toronto_demographics <- list_package_resources("https://open.toronto.ca/dataset/wellbeing-toronto-demographics/") %>%
  filter(name == "wellbeing-toronto-demographics") %>%
  get_resource()

str(wellbeing_toronto_demographics, max.level = 1)

To access the relevant sheet, pull out the list element:

wellbeing_toronto_demographics[["IndicatorMetaData"]]

There are also cases where the file contains multiple sheets and it would be helpful to have them all together as a single data set. For example, the 2019 TTC Bus Delay Data:

ttc_bus_delays_2019 <- search_packages("TTC Bus Delay Data") %>%
  list_package_resources() %>%
  filter(name == "ttc-bus-delay-data-2019") %>%
  get_resource()

The result of is a list with an element for every month of data, each of which is a tibble:

str(ttc_bus_delays_2019, max.level = 1)

Note that the data for for the element Apr 2019 has one more variable than the rest (11 versus 10):

sapply(ttc_bus_delays_2019, colnames)

It seems that the Apr 2019 data has gained a variable Incident ID, and that the variables Min Gap and Min Delay, present in all the other months, have been renamed to Gap and Delay, respectively.

We can rename these two variables:

ttc_bus_delays_2019[["Apr 2019"]] <- ttc_bus_delays_2019[["Apr 2019"]] %>%
  rename(`Min Gap` = Gap, `Min Delay` = Delay)

and combine all of the elements into a single tibble using dplyr::bind_rows():

ttc_bus_delays_2019_combined <- bind_rows(ttc_bus_delays_2019)

ttc_bus_delays_2019_combined

Unfortunately, it looks like the Time variable got Excel™ed, and will need some data cleaning.

For interests sake, it appears that Incident ID is a lookup ID for the type of incident -- only present in the Apr 2019 data, but interesting nonetheless!

ttc_bus_delays_2019_combined %>%
  filter(!is.na(`Incident ID`)) %>%
  distinct(`Incident ID`, Incident)

sharlagelfand/opendatatoronto documentation built on April 12, 2025, 7:52 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

sharlagelfand/opendatatoronto
Access the City of Toronto Open Data Portal

In sharlagelfand/opendatatoronto: Access the City of Toronto Open Data Portal

R Package Documentation

Browse R Packages

We want your feedback!

sharlagelfand/opendatatoronto Access the City of Toronto Open Data Portal

In sharlagelfand/opendatatoronto: Access the City of Toronto Open Data Portal

R Package Documentation

Browse R Packages

We want your feedback!

sharlagelfand/opendatatoronto
Access the City of Toronto Open Data Portal