Outline of my SiRAcon 2022 presentation, "Making R work for you (with automation!)".

Questions/TODO

library(siracon2022)
library(readr)
library(purrr)
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)
library(scales)
library(vistime)
library(jbplot)

Framework

Use the DORA Research Program to frame the story of how I learned R and software engineering by implementing the DORA DevOps technical practices:

https://www.devops-research.com/research.html

Create a timeline of my journey using vistime or timevis.

Outline

Use GitHub data to show how the DORA metrics changed over time as I developed and used rdev.

git log

Idea: use gert::git_log() tables from all my public and personal (private) R repositories to create an annotated timeline and visualization of my work and of my implementation of the DORA technical practices.

Import git logs from my repositories:

# gitlogs is now included in siracon2022 for reproducibility, see data-raw/gitlogs and ?gitlogs

gitlogs_tz <- tz(gitlogs$time)
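The cutoff timestamps used for filtering are built in the same timezone as gitlogs$time. A minimal sketch of why that matters, using a made-up commit time and an assumed timezone (the real gitlogs data and its timezone are not reproduced here):

```r
library(lubridate)

# Hypothetical commit time and timezone, for illustration only
commit_time <- ymd_hms("2022-04-30 23:30:00", tz = "America/Chicago")
commit_tz <- tz(commit_time)

# The same "midnight on May 1" cutoff, expressed in two timezones
cutoff_local <- ymd_h("2022-05-01 0", tz = commit_tz)
cutoff_utc <- ymd_h("2022-05-01 0", tz = "UTC")

# Late-evening local commits land on different sides of the two cutoffs
before_local <- commit_time < cutoff_local
before_utc <- commit_time < cutoff_utc
print(c(local = before_local, utc = before_utc))
```

With a UTC cutoff, a commit made late on April 30 local time would be counted as a May commit.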

Filter logs by repository, adding a cutoff date for each repository where active development has ended, for the timeline visualization. Drop commits after April 30, 2022 to remove partial months.

filtered_gitlogs <- gitlogs |>
  # set filter to midnight after last relevant commit, use same timezone as gitlogs
  filter(!(repo == "rstudio-training" & time > ymd_h("2020-12-28 0", tz = gitlogs_tz))) |>
  filter(!(repo == "software-resilience" & time > ymd_h("2021-02-22 0", tz = gitlogs_tz))) |>
  filter(!(repo == "rtraining" & time > ymd_h("2021-10-08 0", tz = gitlogs_tz))) |>
  filter(!(repo == "workshop7" & time > ymd_h("2021-12-07 0", tz = gitlogs_tz))) |>
  filter(!(repo == "jbplot" & time > ymd_h("2022-02-07 0", tz = gitlogs_tz))) |>
  # while this is now redundant (here and elsewhere), keeping it for clarity
  filter(time < ymd_h("2022-05-01 0", tz = gitlogs_tz)) |>
  # oldest first
  mutate(repo = factor(repo, levels = c(
    "rstudio-training", "software-resilience", "rtraining", "rdev", "workshop7", "jbplot",
    "siracon2022"
  )))
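The per-repository cutoffs could also be kept in a small lookup table and applied with a join. This is an alternative sketch, not the approach used in the notebook, and it runs on a toy log with made-up times:

```r
library(dplyr)
library(lubridate)

# Toy commit log (made-up times, not the real gitlogs data)
toy_logs <- tibble(
  repo = c("rtraining", "rtraining", "rdev", "rdev"),
  time = ymd_h(c("2021-10-01 9", "2021-11-15 9", "2021-11-15 9", "2022-06-01 9"), tz = "UTC")
)

# One row per repository with a cutoff; repositories not listed have none
cutoffs <- tibble(repo = "rtraining", cutoff = ymd_h("2021-10-08 0", tz = "UTC"))

toy_filtered <- toy_logs |>
  left_join(cutoffs, by = "repo") |>
  # keep commits with no cutoff, or made at or before the repository's cutoff
  filter(is.na(cutoff) | time <= cutoff) |>
  # then drop everything from the partial month onward
  filter(time < ymd_h("2022-05-01 0", tz = "UTC")) |>
  select(-cutoff)

print(toy_filtered)
```

The trade-off is one more table to maintain in exchange for a single filter step instead of one per repository.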

Plot monthly commits by repository.

filtered_gitlogs |>
  mutate(time = floor_date(time, unit = "month")) |>
  group_by(time, repo) |>
  summarize(commits = n(), .groups = "drop") |>
  ggplot(aes(x = time, y = commits, color = repo)) +
  geom_point() +
  geom_line() +
  labs(title = "Monthly commits by repository") +
  labs(x = "", y = "", color = "repository") +
  theme_quo()

ggsave("rendered/monthly-commits-repo.png", width = 16 * 0.6, height = 9 * 0.6, bg = "white")

High resolution plot
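The monthly counts come from rounding each commit time down to the start of its month and then counting rows. A small sketch of that step on made-up commits, using count() as shorthand for group_by() plus summarize():

```r
library(dplyr)
library(lubridate)

# Toy commits (made-up times)
toy_commits <- tibble(
  repo = c("rdev", "rdev", "rdev", "jbplot"),
  time = ymd_hm(
    c("2022-01-03 10:00", "2022-01-28 16:30", "2022-02-01 08:15", "2022-01-15 12:00"),
    tz = "UTC"
  )
)

monthly <- toy_commits |>
  # every timestamp in a month collapses to that month's first instant
  mutate(time = floor_date(time, unit = "month")) |>
  count(time, repo, name = "commits")

print(monthly)
```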

Timeline

Create a timeline using groups showing the history of the repositories:

repo_timeline <- filtered_gitlogs |>
  select(repo, time) |>
  mutate(time = floor_date(time, unit = "day")) |>
  group_by(repo) |>
  summarize(start = min(time), end = max(time)) |>
  arrange(start) |>
  mutate(
    group = case_when(
      grepl("training", repo, fixed = TRUE) ~ "training",
      repo %in% c("rdev", "jbplot") ~ "development",
      TRUE ~ "notebooks"
    ),
    color = hue_pal()(7)[row_number()]
  )

repo_timeline
# TODO: gg_vistime doesn't render well when using scale_color_viridis_d()
#   issue: https://github.com/shosaco/vistime/issues/30
gg_vistime(repo_timeline, col.event = "repo", title = "R Development Timeline") +
  theme_quo()

ggsave("rendered/repo-timeline.png", width = 16 * 0.6, height = 9 * 0.6, bg = "white")

High resolution plot
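The timeline table reduces each repository to a start date, an end date, a group, and a color. A sketch of that reduction on a toy log with made-up dates; unlike the notebook, it sizes the palette with n() instead of a fixed 7:

```r
library(dplyr)
library(lubridate)
library(scales)

# Toy log (made-up dates)
toy_logs <- tibble(
  repo = c("rstudio-training", "rstudio-training", "rdev", "rdev", "workshop7"),
  time = ymd(c("2020-09-08", "2020-12-27", "2021-01-01", "2022-04-20", "2021-11-01"))
)

toy_timeline <- toy_logs |>
  group_by(repo) |>
  summarize(start = min(time), end = max(time)) |>
  arrange(start) |>
  mutate(
    group = case_when(
      grepl("training", repo, fixed = TRUE) ~ "training",
      repo %in% c("rdev", "jbplot") ~ "development",
      TRUE ~ "notebooks"
    ),
    # one distinct hue per row, assigned in start order
    color = hue_pal()(n())[row_number()]
  )

print(toy_timeline)
```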

Key events

Plot key events on a timevis() timeline. Full page version.

key_events <- read_csv("data/key-events.csv", col_types = cols(
  id = col_integer(),
  start = col_date(format = ""),
  end = col_date(format = ""),
  content = col_character(),
  group = col_integer(),
  group_content = col_character(),
  intro = col_logical(),
  milestone = col_logical()
))

dora_groups <- key_events |>
  select(id = group, content = group_content) |>
  unique() |>
  arrange(id)

render_timevis(key_events, groups = dora_groups, file = "rendered/key-events.html", showZoom = TRUE)

2020-09-08: Starting out, rstudio-training, renv

2020-09-11: Published "Working with R"

2020-09-30: (Aside) First bug discovered, https://github.com/rstudio/renv/issues/547 !

2020-10-06: setup-r script

2020-12-02: Adoption of styler and lintr

2020-12-27: Migration to rtraining package

2020-12-29: build-site script

2020-12-30: First release: rtraining 0.0.1

2020-12-30: GitHub Actions

2020-12-30: lint_all()

2020-12-30: style_all()

2020-12-31: Switch GitHub Actions to lint_all()

2021-01-01: ci(), check_renv()

2021-01-01: Migration to rdev package

2021-01-02: Multi-platform R CMD check

2021-01-03: First version of build_analysis_site()

2021-01-09: Analysis Package Layout

2021-01-12: Native R version of build_analysis_site()

2021-01-16: Migrated build_analysis_site() from rtraining to rdev

2021-09-29: Formal R Analysis Package Layout, Documented release process

2021-12-04: Documented package creation process

2021-12-23: theme_quo(): a personalized theme to visually identify my ggplots.

2022-01-01: Automate package configuration with use_analysis_package()

2022-01-10: Create package automation (rdev 0.7.0)

2022-01-10: Automate notebook listings in README

library(rdev)
library(fs)
library(dplyr)
library(purrr)

notebooks <- dir_ls("analysis", glob = "*.Rmd") |>
  map_dfr(rmd_metadata) |>
  mutate(bullet = paste0("- [", title, "](", url, ") (", date, "): ", description)) |>
  pull(bullet)

writeLines(notebooks)
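The listing depends on reading each notebook's YAML front matter. The sketch below shows one way to do that with rmarkdown::yaml_front_matter() on a throwaway file. It is not the implementation of rdev's rmd_metadata(), and the URL scheme is an assumption made for illustration:

```r
library(rmarkdown)

# Write a throwaway notebook with made-up front matter
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Example Notebook"',
  'date: "2022-01-10"',
  "description: A made-up notebook used only for this sketch.",
  "---",
  "",
  "Body text."
), rmd)

meta <- yaml_front_matter(rmd)

# Hypothetical URL scheme: the rendered HTML shares the source file's base name
url <- sub("\\.Rmd$", ".html", basename(rmd))
bullet <- paste0("- [", meta$title, "](", url, ") (", meta$date, "): ", meta$description)
writeLines(bullet)
```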

2022-01-17: Release automation (rdev 0.8.0)

2022-01-19: More workflow automation

2022-01-21 - 2022-02-06: Adding test coverage

(Show plot of increasing code coverage from codecov.io)

2022-01-24: write_eval() is a really bad idea:

#' Write and evaluate an expression
#'
#' `write_eval(string)` is a simple wrapper that prints `string` to the console using
#'   [`writeLines()`][base::writeLines], then executes the expression using [`parse()`][base::parse]
#'   and [`eval()`][base::eval].
#'
#' @param string An expression to be printed to the console and evaluated
#'
#' @return The return value from the evaluated expression
#'
#' @examples
#' write_eval("pi")
#'
#' write_eval("exp(1)")
#' @export
write_eval <- function(string) {
  if (!is.character(string)) stop("not a character vector")
  if (string == "") stop("nothing to evaluate")
  writeLines(string)
  eval(parse(text = string))
}
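The outline does not spell out why write_eval() is a bad idea. One common objection to eval(parse(text = ...)), offered here as an assumption about the reasoning, is that the string can do anything, including silently overwriting existing objects. A sketch with a hypothetical variant that takes an explicit environment so the side effect stays contained:

```r
# Same core as write_eval(), plus an explicit environment (hypothetical variant)
write_eval_sketch <- function(string, envir = parent.frame()) {
  writeLines(string)
  eval(parse(text = string), envir = envir)
}

sandbox <- new.env()
assign("important", "original value", envir = sandbox)

# The "expression" is arbitrary code; here it replaces an existing object
write_eval_sketch('important <- "clobbered"', envir = sandbox)

result <- get("important", envir = sandbox)
print(result)
```

Code held in strings is also invisible to tools such as lintr and styler, which only see a character literal.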

2022-01-30: Manual test script for new package setup

2022-02-02: Added local_temppkg() test helper function
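A "local_" test helper creates a fixture that cleans itself up when the calling test finishes. The sketch below shows that general pattern with withr::local_tempdir(). The helper name and its contents are hypothetical, and the real local_temppkg() in rdev may work differently:

```r
library(withr)

# Hypothetical helper: a temporary directory holding a stub DESCRIPTION file
local_tempproj <- function(env = parent.frame()) {
  force(env)
  # cleanup is registered on the caller's frame, not on this helper's frame
  dir <- local_tempdir(.local_envir = env)
  writeLines("Package: demo", file.path(dir, "DESCRIPTION"))
  dir
}

# Stand-in for a test: the fixture exists while the function is running
use_tempproj <- function() {
  dir <- local_tempproj()
  stopifnot(file.exists(file.path(dir, "DESCRIPTION")))
  dir
}

leftover <- use_tempproj()
# the directory is deleted once use_tempproj() has returned
print(dir.exists(leftover))
```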

2022-02-06: rdev 1.0.0 !

2022-02-06 - Today: Continuous Improvement

Releases

Get releases from GitHub using siracon2022::gh_releases():

# cache results
if (!exists("releases")) {
  repos <- c("rtraining", "rdev", "workshop7", "jbplot", "siracon2022")
  repos <- setNames(repos, repos)
  releases <- map_dfr(repos, gh_releases, "jabenninghoff", .id = "repo") |>
    arrange(time)
}

Filter out releases after April 30, 2022 to remove partial months.

filtered_releases <- releases |>
  mutate(time = with_tz(time, tzone = gitlogs_tz)) |>
  filter(time < ymd_h("2022-05-01 0", tz = gitlogs_tz))

Plot releases over time: total GitHub releases per month (for all repositories) to show changes in release frequency. The dotted lines mark the adoption of GitHub and the implementation of release automation.

monthly_releases <- filtered_releases |>
  mutate(time = floor_date(time, unit = "month")) |>
  group_by(time) |>
  summarize(releases = n(), .groups = "drop") |>
  add_row(time = ymd("2020-11-01"), releases = 0) |>
  add_row(time = ymd("2020-10-01"), releases = 0) |>
  add_row(time = ymd("2020-09-01"), releases = 0) |>
  arrange(time)
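Months without releases have no rows, so zeros have to be added explicitly. As an alternative sketch to the add_row() calls, tidyr::complete() can fill every missing month at once. It uses made-up counts, and unlike the code above it also fills gaps in the middle of the series:

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy monthly counts with gaps (made-up numbers)
toy_releases <- tibble(
  time = ymd(c("2020-12-01", "2021-01-01", "2021-04-01")),
  releases = c(2, 5, 1)
)

# Every month the plot should cover
all_months <- seq(ymd("2020-09-01"), ymd("2021-04-01"), by = "month")

toy_monthly <- toy_releases |>
  complete(time = all_months, fill = list(releases = 0))

print(toy_monthly)
```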

monthly_releases |>
  ggplot(aes(x = time, y = releases)) +
  geom_point() +
  geom_line() +
  geom_vline(xintercept = ymd_h("2020-12-01 0", tz = gitlogs_tz), linetype = "dotted") +
  geom_vline(xintercept = ymd_h("2022-01-01 0", tz = gitlogs_tz), linetype = "dotted") +
  coord_cartesian(ylim = c(0, NA)) +
  labs(title = "Monthly GitHub releases") +
  labs(x = "", y = "") +
  theme_quo()

ggsave("rendered/monthly-releases.png", width = 16 * 0.6, height = 9 * 0.6, bg = "white")

High resolution plot

However, the number of releases per month might just represent how much work is being done, and looks similar to the plot of all commits by month:

gitlogs |>
  filter(time < ymd_h("2022-05-01 0", tz = gitlogs_tz)) |>
  mutate(time = floor_date(time, unit = "month")) |>
  group_by(time) |>
  summarize(commits = n(), .groups = "drop") |>
  arrange(time) |>
  ggplot(aes(x = time, y = commits)) +
  geom_point() +
  geom_line() +
  geom_vline(xintercept = ymd_h("2020-12-01 0", tz = gitlogs_tz), linetype = "dotted") +
  geom_vline(xintercept = ymd_h("2022-01-01 0", tz = gitlogs_tz), linetype = "dotted") +
  coord_cartesian(ylim = c(0, NA)) +
  labs(title = "Monthly git commits") +
  labs(x = "", y = "") +
  theme_quo()

ggsave("rendered/monthly-commits.png", width = 16 * 0.6, height = 9 * 0.6, bg = "white")

High resolution plot

Also plot releases per commit, which will fall between 0 and 1. The dotted lines mark adoption of GitHub and implementation of release automation.

gitlogs |>
  filter(time < ymd_h("2022-05-01 0", tz = gitlogs_tz)) |>
  mutate(time = floor_date(time, unit = "month")) |>
  group_by(time) |>
  summarize(commits = n()) |>
  full_join(monthly_releases, by = "time") |>
  replace_na(list(commits = 0, releases = 0)) |>
  mutate(rpc = releases / commits) |>
  ggplot(aes(x = time, y = rpc)) +
  geom_point() +
  geom_line() +
  geom_vline(xintercept = ymd_h("2020-12-01 0", tz = gitlogs_tz), linetype = "dotted") +
  geom_vline(xintercept = ymd_h("2022-01-01 0", tz = gitlogs_tz), linetype = "dotted") +
  labs(title = "Monthly GitHub releases per commit") +
  labs(x = "", y = "") +
  theme_quo()

ggsave("rendered/releases-per-commit.png", width = 16 * 0.6, height = 9 * 0.6, bg = "white")

High resolution plot
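The ratio itself is a join of the two monthly tables followed by a division. A sketch on made-up counts:

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy monthly tables (made-up numbers)
toy_commits <- tibble(
  time = ymd(c("2022-01-01", "2022-02-01", "2022-03-01")),
  commits = c(40, 10, 20)
)
toy_releases <- tibble(
  time = ymd(c("2022-01-01", "2022-03-01")),
  releases = c(4, 5)
)

toy_rpc <- toy_commits |>
  # keep months that appear in either table
  full_join(toy_releases, by = "time") |>
  replace_na(list(commits = 0, releases = 0)) |>
  mutate(rpc = releases / commits)

print(toy_rpc)
```

A month with releases but no commits would give Inf, and a month with neither would give NaN, so the ratio is only meaningful for months with at least one commit.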

Story

Use the timeline and plots to tell the story of continuous improvement. Each section's timeline shows group 1 alongside the group for that section's focus area. Integrate themes into the story.

  1. Introduction: background and motivation, use Event group as the talk overview. Exclude SiRAcon 2020 from future timelines. "R Development Timeline".
  2. Version Control: put everything (except artifacts) into version control for reproducibility and history.
  3. Trunk-based Development: linear development avoids code conflicts.
  4. Shift Left on Security: maintenance first ensures you get it done.
  5. Continuous Integration: build and test on each commit to catch mistakes early.
  6. Deployment Automation: automate your development workflow to spend more time writing.
  7. Code Maintainability: consistent and clean code is easier to understand.
  8. Continuous Testing: (the biggest challenge) formally specifying what you are building and how it is supposed to work defends against the dangers of hidden assumptions.
  9. Results: "Monthly commits by repository", "Monthly GitHub releases", "Monthly GitHub releases per commit". Improvement on technical practices also means less rework, less deployment pain, less burnout, and greater job satisfaction.
  10. Closing: complete key events timeline.
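The per-section timelines all apply the same filter: drop intro events, then keep milestones, the focus group, and sometimes a few extra events by id. That filter could be wrapped in a helper. The function below is hypothetical (it is not part of siracon2022) and runs on a toy table with made-up rows:

```r
library(dplyr)

# Toy stand-in for key_events, with only the columns used for filtering
toy_events <- tibble(
  id = 1:5,
  group = c(1L, 1L, 2L, 3L, 2L),
  intro = c(TRUE, FALSE, FALSE, FALSE, FALSE),
  milestone = c(FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Hypothetical helper: events for one section's timeline
section_events <- function(events, focus_group, extra_ids = integer()) {
  events |>
    filter(!intro) |>
    filter(milestone | group == focus_group | id %in% extra_ids)
}

ids_group2 <- section_events(toy_events, focus_group = 2)$id
ids_group3 <- section_events(toy_events, focus_group = 3, extra_ids = 5)$id
print(ids_group2)
print(ids_group3)
```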

Full rdev package list:

Introduction

Background and motivation. Full page version.

key_events |>
  filter(group == 1) |>
  render_timevis(groups = filter(dora_groups, id == 1), file = "rendered/intro.html")

Version Control

Put everything (except artifacts) into version control for reproducibility and history. Full page version.

Use of Homebrew, and brew bundle.

Packages:

key_events |>
  filter(!intro) |>
  filter(milestone | group == 2) |>
  render_timevis(groups = filter(dora_groups, id %in% c(1, 2)), file = "rendered/version-control.html")

Trunk-based Development

Linear development avoids code conflicts. Full page version.

key_events |>
  filter(!intro) |>
  filter(milestone | group == 3 | id == 44) |>
  render_timevis(groups = filter(dora_groups, id %in% c(1, 3)), file = "rendered/trunk-based.html")

Shift Left on Security

Maintenance first ensures you get it done. Full page version.

Reference last year's talk; the recording is available in the members' section.

Packages:

key_events |>
  filter(!intro) |>
  filter(milestone | group == 4) |>
  render_timevis(groups = filter(dora_groups, id %in% c(1, 4)), file = "rendered/shift-left.html")

Continuous Integration

Build and test on each commit to catch mistakes early. Full page version.

Packages:

key_events |>
  filter(!intro) |>
  filter(milestone | group == 5) |>
  render_timevis(groups = filter(dora_groups, id %in% c(1, 5)), file = "rendered/ci.html")

Deployment Automation

Automate your development workflow to spend more time writing. Full page version.

Reducing toil. Forming habits, which become repeated tasks, which become automation. If it's automated, it gets done.

Packages:

key_events |>
  filter(!intro) |>
  filter(milestone | group == 6 | id == 32) |>
  render_timevis(groups = filter(dora_groups, id %in% c(1, 6)), file = "rendered/deployments.html")

Code Maintainability

Consistent and clean code is easier to understand. Full page version.

Functional programming (purrr) vs procedural programming. Functional programming is harder to learn, but safer.
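A small sketch of the contrast, computing the same result both ways. Reading "safer" as type stability is one interpretation, not something the outline states:

```r
library(purrr)

values <- list(a = 1:3, b = 4:6, c = 7:9)

# Procedural: preallocate, loop, fill by index, restore names by hand
means_loop <- numeric(length(values))
for (i in seq_along(values)) {
  means_loop[i] <- mean(values[[i]])
}
names(means_loop) <- names(values)

# Functional: map_dbl() returns a named double vector or fails with an error
means_map <- map_dbl(values, mean)

print(means_map)
```

The functional version has no index bookkeeping to get wrong, and map_dbl() refuses to return anything other than one double per element.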

R dialects: base R is for functions, tidyverse R is for notebooks.

"Clean" code: code should be written for future humans, including you!

Packages:

key_events |>
  filter(!intro) |>
  filter(milestone | group == 7 | id == 32) |>
  render_timevis(groups = filter(dora_groups, id %in% c(1, 7)), file = "rendered/code-maint.html")

Continuous Testing

The biggest challenge: formally specifying what you are building and how it is supposed to work defends against the dangers of hidden assumptions. Full page version.
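A minimal sketch of what such a specification can look like with testthat. The function under test is hypothetical and written only for this example:

```r
library(testthat)

# Hypothetical function under test
releases_per_commit <- function(releases, commits) {
  if (any(commits <= 0)) stop("commits must be positive")
  releases / commits
}

# Each test states one expectation about how the function should behave
test_that("releases_per_commit() computes a ratio", {
  expect_equal(releases_per_commit(5, 20), 0.25)
})

test_that("releases_per_commit() rejects months with no commits", {
  expect_error(releases_per_commit(1, 0), "must be positive")
})
```

The second test turns a hidden assumption (every month has at least one commit) into an explicit, checked rule.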

Packages:

Future Testing

Mutation Testing: Wikipedia
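The idea behind mutation testing can be shown by hand: change one operator in a function (a "mutant") and check whether the tests notice. The sketch below is written in base R with hypothetical functions, and it does not use any mutation testing package:

```r
# Original function and a hand-written mutant with one operator changed
is_adult <- function(age) age >= 18
is_adult_mutant <- function(age) age > 18

# Hypothetical test suites, expressed as functions of the implementation
suite_weak <- function(f) isTRUE(f(30)) && isFALSE(f(10))
suite_with_boundary <- function(f) suite_weak(f) && isTRUE(f(18))

# A surviving mutant means the suite cannot tell the two versions apart
mutant_survives_weak <- suite_weak(is_adult_mutant)
mutant_survives_boundary <- suite_with_boundary(is_adult_mutant)
print(c(weak = mutant_survives_weak, boundary = mutant_survives_boundary))
```

The mutant survives the weak suite and is caught only once a test exercises the boundary value.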

R packages:

Papers:

Formal Methods:

key_events |>
  filter(!intro) |>
  filter(milestone | group == 8 | id == 32) |>
  render_timevis(groups = filter(dora_groups, id %in% c(1, 8)), file = "rendered/testing.html")

End of (out)line.



jabenninghoff/siracon2022 documentation built on July 17, 2025, 12:08 a.m.