
muggle


Overview

Reproducible DevOps Strictly Without Magic

{muggle} is an R package to implement DevOps best practices for R data products.

{muggle} is developed at the SUB, but can be used by anyone. SUB-specific defaults are set in the {metar} package.

Data products are somewhere between one-off scripts and CRAN-bound packages. On the one hand, they are more complex than simple scripts and take longer to build. To ensure value for their creators, such data products must be reproducible and must scale better to accommodate more users or developers. On the other hand, data products face fewer requirements than mainstream packages: they can have more dependencies and they need not run in different computing environments.

{muggle} addresses the needs of such often domain-specific data products by implementing these priorities:

  1. No Magic (hence the name). {muggle} never infers intent; users have to state it explicitly. For example, dependencies are never inferred from a project's source files, but have to be listed in the DESCRIPTION file. This exposes more technical underpinnings, but makes it easier to reason about a project if things go wrong.
  2. Every R Data Product is an R Package. {muggle} organises all data products as R packages. For example, a shiny app can be a function in R/ and a report can be an RMarkdown document in vignettes/. This adds minimal overhead, but also enforces various best practices and structures projects in a single, familiar way.
  3. One Image to Rule Them All. {muggle} provides one fully versioned and reproducible compute environment as a docker image, including:
     - operating system,
     - R version,
     - system dependencies,
     - and R dependencies snapshotted by date (via RStudio Package Manager).

     Across
     - local development (via RStudio Server Open Source or vscode),
     - continuous integration / continuous delivery (CI/CD) scripts on GitHub Actions,
     - batch jobs on high-performance computing clusters,
     - and even shiny apps or plumber web APIs,

     the project will run in the exact same computing environment. This reduces flexibility, but minimises the time wasted on "but-it-works-on-my-machine" problems.
  4. Fast Iterations. {muggle} speeds up development iterations as much as possible, using
     - pre-compiled binaries from RStudio Package Manager,
     - docker layer cache,
     - GitHub Actions cache for dependencies and
     - knitr cache for vignettes.

     This creates some "GOTCHAS", but encourages agile development through quick turnarounds.
  5. Only Humans git commit. {muggle} is designed so that only human-edited source files are under version control. Copy-pasted boilerplate and compiled assets are avoided as much as possible (with the exception of man/ so as to not break remotes::install_github()). This requires a bit more discipline, but enhances reproducibility and cleans up git diffs.
  6. R Packages are for Code, not Data. {muggle} does not ship data with a package, but only wrapper functions which call either databases or git lfs storage. Only small or unchanging datasets can be stored inside packages.
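
To illustrate the first two priorities, here is a minimal sketch of what explicitly declared dependencies might look like in the DESCRIPTION file of a data product organised as an R package. The package name, title, version, and the particular dependencies listed are hypothetical and chosen only for illustration; they are not taken from {muggle} itself or from any real project.

```
Package: hypotheticalproduct
Title: A Hypothetical Data Product (Illustrative Only)
Version: 0.0.1
Depends:
    R (>= 4.0.0)
Imports:
    plumber,
    shiny
Suggests:
    knitr,
    rmarkdown,
    testthat
VignetteBuilder: knitr
```

In this sketch, a shiny app living as a function in R/ would rely on the packages under Imports, while a report in vignettes/ would rely on knitr and rmarkdown under Suggests. Nothing is discovered by scanning the source files: a package that is used but not listed here would simply be missing from the environment.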



subugoe/muggle documentation built on Nov. 26, 2021, 11:42 p.m.