knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6, fig.height = 4
)

You can load the package using the following code:

library(olympicmarathon)
library(tidyverse)

The Data

The key feature of this package is the data provided. The data comes in an .rda file called olympic_marathon; when loaded, this is a tibble and data frame with 9 columns and 1956 rows. Let's take a look at the data:

olympic_marathon %>%
  head(10) %>%
  knitr::kable(caption = "Glimpse of olympic_marathon Data")

You can review the documentation for this data, including variable descriptions, using the code:

?olympic_marathon

This data frame contains data from all Olympic marathons, women and men, from 1984-2020, sourced from World Athletics and OlympianDatabase.com.

There are a few important things to note about this data. First, the Olympics held in 2021 in Tokyo were officially named "the 2020 Olympic Games" and are marked as such in the data. Second, note that the nationality column uses three-letter country codes from the International Olympic Committee. Please see the full list on Wikipedia. Additionally, those competitors who did not finish or were disqualified do not have a rank: their rank is marked NA. Lastly, those competitors who did not earn a medal have medal marked NA.

When looking at results over time in visualizations or summary statistics, it's important to filter out those competitors who did not receive a finishing time. Those individuals have a result marked "DNS" (did not start), "DNF" (did not finish), and "DQ" (disqualified). The code to filter out those individuals is below:

olympic_marathon %>%
  filter(!(result %in% c("DQ", "DNS", "DNF")))

Once these individuals are filtered out, we recommend the as_hms function from the hms package to convert the result column to a time format. This function will throw an error if those competitors who did not finish are not filtered out.

olympic_marathon %>%
  filter(!(result %in% c("DQ", "DNS", "DNF"))) %>%
  mutate(result = hms::as_hms(result))

Visualize the Data

While we encourage exploring the data on your own, we provided two built-in functions to visualize the data.

two_country_viz()

The two_country_viz() function allows a user to quickly create a visualization of the data from two countries of choice. The function takes two inputs, each one being the appropriate three-letter country code for the desired countries. A ggplot object is created which shows all of the finishing times of competitors from the selected countries, plotted against the corresponding competition years.

To see the finishing times of Italy and Japan, the following code will do the job:

two_country_viz("ITA", "JPN")

Since this is a ggplot object, the user is able to edit many aspects of the visualization. For example, to edit the title of the plot, this code is useful:

two_country_viz("ITA", "JPN") +
  labs(title = "Results vs. Year for Italy and Japan")

Or we could add lines to better visualize the finishing times for one gender within one country:

two_country_viz("ITA", "JPN") +
  geom_line()

times_over_time()

times_over_time() is a simple function provided by the package which takes no arguments. It returns a ggplot figure which displays every complete finishing time over the years by gender.

times_over_time()

To turn this figure into an interactive plot, we recommend wrapping it in ggplotly() from the plotly package. Additional ggplot customization is available in a similar manner to what is described with two_country_viz() above.

Further Resources

To load data from years prior to 1984, we recommend using the gettxt() function from the htm2txt package to gather data from the tables provided by the website OlympianDatabase.com. Women were not permitted to run the Olympic Marathon until 1984, so there will be no women's results before then.



mflesaker/olympicmarathon documentation built on May 4, 2022, 5:36 p.m.