Reproducible DevOps Strictly Without Magic
{muggle} is an R package to implement DevOps best practices for R data products.
{muggle} is developed at the SUB, but can be used by anyone. SUB-specific defaults are set in the {metar} package.
Data products are somewhere between one-off scripts and CRAN-bound packages. On the one hand, they are more complex than simple scripts and take longer to build. To ensure value for their creators, such data products must be reproducible and scale better to accommodate more users or developers. On the other hand, data products face fewer requirements than mainstream packages: They can have more dependencies and they need not run in different computing environments.
{muggle} addresses the needs of such often domain-specific data products by implementing these priorities:
DESCRIPTION file.
This exposes more technical underpinnings, but makes it easier to reason about a project if things go wrong.
R/ and a report can be an RMarkdown document in vignettes/.
This adds minimal overhead, but also enforces various best practices and structures projects in a single, familiar way.
Across
- local development (via RStudio Server Open Source or vscode),
- continuous integration / continuous delivery (CI/CD) scripts on GitHub Actions,
- batch jobs on high-performance computing clusters,
- and even shiny apps or plumber web APIs,
the project will run in the exact same computing environment.
This reduces flexibility, but minimises the time wasted on "but-it-works-on-my-machine"-problems.
4. Fast Iterations
{muggle} speeds up development iterations as much as possible, using
- pre-compiled binaries from RStudio Package Manager,
- docker layer cache,
- GitHub Actions cache for dependencies and
- knitr cache for vignettes.
This creates some "GOTCHAS", but encourages agile development by quick turnarounds.
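Two of these speed-ups can be switched on from plain R. The following is a minimal sketch, not {muggle}'s own configuration: the repository URL (an Ubuntu 20.04 "focal" binary repository) is an assumption and has to match the operating system of your image, and the knitr line is only relevant inside a vignette's setup chunk.

```r
# Assumed repository URL: Linux binaries from RStudio Package Manager for
# Ubuntu 20.04 ("focal"). Adjust the distribution to match your image.
rspm_url <- "https://packagemanager.rstudio.com/all/__linux__/focal/latest"
options(repos = c(RSPM = rspm_url))

# Package Manager inspects the HTTP user agent to decide whether it can
# serve a binary, so the agent should name the R version and platform.
options(HTTPUserAgent = sprintf(
  "R/%s R (%s)",
  getRversion(),
  paste(getRversion(), R.version$platform, R.version$arch, R.version$os)
))

# In a vignette setup chunk: reuse results of unchanged chunks between builds.
# Guarded so the snippet also runs where knitr is not installed.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(cache = TRUE)
}
```

With these options set, install.packages() pulls from the binary repository where a binary is available and falls back to building from source otherwise. Cached chunks are one source of the gotchas mentioned above, because a chunk is not re-run when only its upstream data changes.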
5. Only Humans git commit.
{muggle} is designed so that only human-edited source files are under version control.
Copy-pasted boilerplate and compiled assets are avoided as much as possible (with the exception of man/ so as to not break remotes::install_github()).
This requires a bit more discipline, but enhances reproducibility and cleans up git diffs.
6. R Packages are for Code, not Data.
{muggle} does not ship data with a package, but only wrapper functions which either call databases or git lfs storage.
Only small or unchanging datasets can be stored inside packages.
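A wrapper of this kind might look as follows. This is a sketch under stated assumptions, not a function from {muggle}: the name read_survey_data() and the CSV layout are hypothetical, and the data is assumed to live in a file tracked by git lfs outside the package's data/ directory.

```r
# Hypothetical wrapper: the package ships this function, not the data.
# `path` is assumed to point at a CSV file tracked by git lfs.
read_survey_data <- function(path) {
  if (!file.exists(path)) {
    stop("Data file not found: ", path, call. = FALSE)
  }
  # If the lfs content has not been pulled, the file is only a small text
  # pointer whose first line starts with the git lfs spec URL.
  first_line <- readLines(path, n = 1L, warn = FALSE)
  is_pointer <- length(first_line) == 1L &&
    startsWith(first_line, "version https://git-lfs.github.com/spec/")
  if (is_pointer) {
    stop("Found a git lfs pointer, not the data. Run `git lfs pull` first.",
         call. = FALSE)
  }
  utils::read.csv(path, stringsAsFactors = FALSE)
}

# Usage with a throwaway file standing in for the lfs-tracked data.
demo_path <- tempfile(fileext = ".csv")
utils::write.csv(
  data.frame(id = 1:3, score = c(2.5, 3.1, 4.0)),
  demo_path,
  row.names = FALSE
)
survey <- read_survey_data(demo_path)
```

Checking for the pointer file turns a confusing parse error into an actionable message when a collaborator clones the repository without pulling the lfs objects. A database-backed wrapper would follow the same shape, with the file check replaced by a connection and a query.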