```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```
Start by initializing {touchstone} in your package repository with:

```r
touchstone::use_touchstone()
```
This will:

* create a `touchstone` directory in the repository root with:
  * `config.json`, which defines how to run your benchmark. In particular, you can define a `benchmarking_repo`, that is, a repository you need to run your benchmark (use `""` if you don't need an additional git repository checked out for your benchmark). This repository will be cloned into `benchmarking_path`. For example, to benchmark {styler}, we format the {here} package with `style_pkg()`; {here} is not part of the {styler} repository, but with the below config it will be located at `touchstone/sources/here` during benchmarking. The code you want to benchmark comes from the benchmarked repository, which in our case is the root from where you call `use_touchstone()` and hence does not need to be defined explicitly.

    ```json
    {
      "benchmarking_repo": "lorenzwalthert/here",
      "benchmarking_ref": "ca9c8e69c727def88d8ba1c8b85b0e0bcea87b3f",
      "benchmarking_path": "touchstone/sources/here",
      "os": "ubuntu-18.04",
      "r": "4.0.0",
      "rspm": "https://packagemanager.rstudio.com/all/__linux__/bionic/291"
    }
    ```

  * `script.R`, the script that runs the benchmark, also called the touchstone script.
  * `header.R`, the script containing the default PR comment header, see `?touchstone::pr_comment`.
  * `footer.R`, the script containing the default PR comment footer, see `?touchstone::pr_comment`.
* add the `touchstone` directory to `.Rbuildignore`.
* put the GitHub Action workflow files into `.github/workflows/`.

Now you just need to modify the touchstone script `touchstone/script.R` to run the benchmarks you are interested in.
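To give you an idea of the end result, here is a minimal sketch of what a complete `touchstone/script.R` could look like; `yourpkg` and `fun()` are hypothetical placeholders, and each step is explained in detail below.

```r
# minimal sketch of a touchstone/script.R; `yourpkg::fun()` is a
# hypothetical placeholder for the code you want to benchmark
touchstone::branches_install() # install the branches to compare

touchstone::benchmark_run(
  simple_test = yourpkg::fun(), # named expression to time
  n = 10                        # number of iterations
)

touchstone::benchmarks_analyze() # create artifacts for the PR comment
```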
While you eventually want to run your benchmarks on GitHub Actions for every pull request, it is handy to develop interactively and locally first to avoid long feedback cycles with trivial errors. The below diagram shows these two different workflows and what they entail.
```r
knitr::include_graphics("../man/figures/workflow-visualization.png")
```
Note that for security reasons, there are two GitHub Action workflow files in `.github/workflows/`.
In the remainder, we'll explain how to adapt the touchstone script, which is the most important script you need to customize.
The file consists of three parts, two of which you will need to modify to fit your needs. Within the GitHub Action, {touchstone} will always run `touchstone/script.R` from the `HEAD` branch, so you can modify or extend the benchmarks as needed in your current branch. Please beware that a few functions such as `branches_install()`, `pin_assets()` or `benchmark_run()` perform local git checkouts, so the git status should be clean and no other processes should run in the directory.
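If you want to guard against accidentally running with uncommitted changes, a simple check like the following sketch can help; it assumes the {gert} package is available, which is not part of {touchstone}.

```r
# optional sketch: abort if the working tree is not clean;
# requires the {gert} package, which is not a {touchstone} dependency
if (nrow(gert::git_status()) > 0) {
  stop("Working tree is not clean, commit or stash your changes first.")
}
```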
The touchstone script will be run with `run_script()` on GitHub Actions, with the required environment variables `GITHUB_HEAD_REF` and `GITHUB_BASE_REF` set for various reasons explained in the help file. If you want to work on your touchstone script interactively, i.e., run it line by line, you must first activate touchstone with `activate()`, which sets environment variables, R options and library paths.
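For example, an interactive session could start like this; the branch names are placeholders, and setting the environment variables manually merely mimics what the GitHub Action would do.

```r
# sketch of an interactive session; branch names are placeholders
Sys.setenv(
  GITHUB_BASE_REF = "main",      # branch to compare against
  GITHUB_HEAD_REF = "my-feature" # branch with your changes
)
touchstone::activate() # sets env variables, R options and library paths
# ... now run touchstone/script.R line by line ...
touchstone::deactivate() # restore the previous state when done
```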
We first install the different package versions into separate libraries with `branches_install()`; this step is mandatory for any `script.R`. If you want to access a directory or file that only exists on one of the branches, you can use `pin_assets(..., ref = "your-branch-name")` to make it available across branches. You can of course also run arbitrary code to prepare for the benchmarks.
```r
touchstone::branches_install() # installs branches to benchmark
touchstone::pin_assets("data/all.Rdata", "inst/scripts") # pin files and directories
load(path_pinned_asset("all.Rdata"))

# perform other setup you need for the benchmarks
source(path_pinned_asset("scripts/prepare.R"))
data_clean <- prepare_data(data_all)
```
Run any number of benchmarks, with one named benchmark per call of `benchmark_run()`. As benchmarks are run in a subprocess, setup code (like setting a seed) needs to be passed via `expr_before_benchmark`. You can pass a single function call or a longer block of code wrapped in `{}`. The expressions are captured with {rlang}, so you can use quasiquotation.
```r
# benchmark a function call from your package
touchstone::benchmark_run(
  random_test = yourpkg::fun(data_clean),
  n = 2
)

# optionally run setup code
touchstone::benchmark_run(
  expr_before_benchmark = {
    library(yourpkg)
    set.seed(42)
  },
  test_with_seed = fun(data_clean),
  n = 2
)
```
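Because the expressions are captured with {rlang}, quasiquotation lets you inline locally computed values; here is a sketch, again with the hypothetical `yourpkg::fun()`:

```r
# sketch: unquote a locally computed value into the benchmarked
# expression with `!!`; `yourpkg::fun()` is a hypothetical placeholder
sample_size <- 100
touchstone::benchmark_run(
  subsample_test = yourpkg::fun(head(data_clean, !!sample_size)),
  n = 2
)
```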
This part is just one call to `benchmarks_analyze()`, which analyses the results and prepares the PR comment.
```r
# create artifacts used downstream in the GitHub Action
touchstone::benchmarks_analyze()
```
After you have committed and pushed the workflow files to your default branch, the benchmarks will be run on new pull requests and on each commit while the pull request is open.
Note that various functions such as `branches_install()`, `pin_assets()` and `benchmark_run()` will check out different branches; they should therefore be the only process running in the directory, and the git status should be clean.
When the GitHub Action completes successfully, the main result you will see is the comment posted on the pull request; it will look something like this:
You can change the header and footer, see `?touchstone::pr_comment`. The main body is the list of benchmarks and their results: first the mean elapsed time over all iterations of that benchmark (BASE $\rightarrow$ HEAD), then a 95% confidence interval.
To make it easier to spot benchmarks that need closer attention, we prefix the benchmarks with emojis indicating whether there was a statistically significant change (see inference) in the timing. In this example, the HEAD version of `func1` is significantly faster and `func3` is significantly slower than BASE, while there is no significant change for `func2`.
The GitHub Action will also upload this text and plots of the timings as artifacts.