knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This is practically the same code you can find in this blog post of mine:
https://www.brodrigues.co/blog/2018-11-14-luxairport/
but with some minor updates to reflect the current state of the {tidyverse}
packages, as well as logging using {chronicler}.
Let's first load the required packages, and the `avia` dataset included in the {chronicler} package:
library(chronicler)
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

# Ensure chronicler version of `pick()` is being used
pick <- chronicler::pick

data("avia")
Now I need to define the functions needed for the analysis. To improve logging, I add the `dim()`
function as the `.g` argument of each function below. This will make it possible to see how the
dimensions of the data change inside the pipeline:
# Define required functions
# You can use `record_many()` to avoid having to write everything
r_select <- record(select, .g = dim)
r_pivot_longer <- record(pivot_longer, .g = dim)
r_filter <- record(filter, .g = dim)
r_mutate <- record(mutate, .g = dim)
r_separate <- record(separate, .g = dim)
r_group_by <- record(group_by, .g = dim)
r_summarise <- record(summarise, .g = dim)
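To make the role of the `.g` argument more concrete, here is a minimal base-R sketch of the general idea: wrap a function so that it returns its result together with a small log entry computed by a diagnostic function. The helper `record_sketch()` and the shape of its log are hypothetical and only for illustration; this is not how {chronicler} is implemented.

```r
# `record_sketch()` is a hypothetical helper, NOT part of {chronicler}.
# It wraps `.f` so that each call returns the value plus a one-row log
# containing the function name and the output of the diagnostic `.g`.
record_sketch <- function(.f, .g = function(x) NA) {
  fname <- deparse(substitute(.f))
  function(.x, ...) {
    value <- .f(.x, ...)
    list(
      value = value,
      log = data.frame(
        fn = fname,
        g = paste(.g(value), collapse = " x "),
        stringsAsFactors = FALSE
      )
    )
  }
}

# Wrap head() and use dim() as the diagnostic, as done with `.g = dim` above
r_head <- record_sketch(head, .g = dim)
res <- r_head(mtcars, n = 3)

res$value  # the first three rows of mtcars
res$log    # one row, with fn = "head" and g = "3 x 11"
```

The point of the sketch is only that the diagnostic runs on the output of each step, so a function like `dim()` captures how the data's shape changes from one step to the next.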
avia_clean <- avia %>%
  # select the first column and every column whose name contains "20"
  r_select(1, contains("20")) %>%
  bind_record(r_pivot_longer,
              -starts_with("unit"),
              names_to = "date",
              values_to = "passengers") %>%
  bind_record(r_separate,
              col = 1,
              into = c("unit", "tra_meas", "air_pr\\time"),
              sep = ",")
Let’s focus on monthly data:
avia_monthly <- avia_clean %>%
  bind_record(r_filter,
              tra_meas == "PAS_BRD_ARR",
              !is.na(passengers),
              str_detect(date, "M")) %>%
  bind_record(r_mutate,
              date = paste0(date, "01"),
              date = ymd(date)) %>%
  bind_record(r_select,
              destination = "air_pr\\time",
              date,
              passengers)
`avia_monthly` is an object of class `chronicle`, but in essence, it is just a list, with its own
print method:
avia_monthly
Now that the data is clean, we can read the log:
read_log(avia_monthly)
This is especially useful if the object `avia_monthly` gets saved using `saveRDS()`. Anyone who
later loads this object can read the log to see what happened and reproduce the steps if necessary.
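As a rough illustration of why this works, the sketch below saves a list with `saveRDS()` and restores it with `readRDS()`: everything stored in the list, including a log, survives the round trip. The list `obj` and its log entries are made-up stand-ins, not a real chronicle object, and the file path is a temporary file chosen only for this example.

```r
# Illustrative stand-in for a chronicle: a plain list with a value and a log.
# The log entries below are invented for this example.
obj <- list(
  value = data.frame(x = 1:3),
  log = c("step 1: select", "step 2: filter")
)

# Save to a temporary file, then read it back as a colleague would
path <- tempfile(fileext = ".rds")
saveRDS(obj, path)
restored <- readRDS(path)

# The log travels with the data
restored$log
```

With the real object, the same pattern would presumably be `saveRDS(avia_monthly, path)` followed by `read_log(readRDS(path))`, assuming {chronicler} is loaded in the session that reads the file.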
Let's take a look at the final data set:
avia_monthly %>% pick("value")
It is also possible to take a look at the underlying `.log_df` object that contains more details,
and see the output of the `.g` argument (which was defined in the beginning as the `dim()`
function):
check_g(avia_monthly)
After `select()` the data has 509 rows and 231 columns; after the call to `pivot_longer()`, it has
117070 rows and 3 columns. `separate()` adds two columns, and after `filter()` only 7632 rows
remain (`mutate()` does not change the dimensions). Finally, `select()` is used to remove 2 columns.
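The reported dimensions are consistent with each other, which can be checked with a little arithmetic. The sketch below uses only the numbers quoted above; the variable names are made up for the example.

```r
# Dimensions after the first select(), as reported above
rows_after_select <- 509
cols_after_select <- 231

# pivot_longer() keeps the first column and stacks all the others,
# so every original row is repeated once per stacked column
stacked_cols <- cols_after_select - 1
rows_long <- rows_after_select * stacked_cols
rows_long  # 117070, matching the log

# Long format: the identifier column, plus "date" and "passengers"
cols_long <- 3

# separate() splits one column into three, a net gain of two columns
cols_after_separate <- cols_long + 2

# The final select() keeps destination, date and passengers
cols_final <- cols_after_separate - 2
cols_final  # 3
```

The drop to 7632 rows after `filter()` depends on the data itself, so it cannot be derived this way.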