library(knitr)
library(utilities)
knitr::opts_chunk$set(collapse = TRUE, comment = NA, prompt = FALSE, echo = TRUE)

options(crayon.enabled = TRUE)
knitr::knit_hooks$set(output = utilities::ansi_handler)
knitr::knit_hooks$set(message = utilities::ansi_handler)
# try old.hooks <- set_knit_hooks(knitr::knit_hooks)
original_output_hook <- knit_hooks$get("output")
knit_hooks$set(output = utilities::create_trimming_hook(original_output_hook))

options(chronicler.attach = FALSE)
library(chronicler)
library(repository)
library(utilities)
library(withr)
ML Flow introduces the notions of runs and experiments. A run is a single execution of an arbitrary program that, using ML Flow's API, registers a model along with its parametrization and a number of artifacts: plots, printouts, data sets. An experiment is a group of runs.
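For readers who have not used the mlflow R API before, a bare-bones run might look like the sketch below. This is only an illustration under the assumption that the mlflow package is installed and configured; the parameter, metric and plot are made up.

library(mlflow)

# a run groups everything logged between start and end
mlflow_start_run()

mlflow_log_param("alpha", 0.1)     # a parameter of the program
mlflow_log_metric("rmse", 0.42)    # a metric computed during the run

# an artifact is an arbitrary file; here we save a throw-away plot first
plot_file <- tempfile(fileext = ".png")
png(plot_file); plot(iris$Sepal.Length, iris$Sepal.Width); dev.off()
mlflow_log_artifact(plot_file)

mlflow_end_run()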
chronicler does not have an explicit notion of run or experiment, but we can always find a suitable mapping. Since a run contains a model, let's map each model collected by chronicler to a separate run. Furthermore, we will examine the sequence of R commands leading up to each of these models and extract literals used or scalar variables referenced in those commands: these will be the parameters reported to ML Flow. Finally, all artifacts downstream from each model will become ML Flow's artifacts.
In this document, by ML Flow we mean the Python project from Databricks. Conversely, mlflow refers to the R package, which is an interface to the actual ML Flow system.
Once again, following the Introduction vignette, we will work with the iris_model() example (see ?chronicler::iris_model for more details).
chronicler:::attach_to_repository(iris_model())
The chronicler::find_experiments() function searches for all artifacts that match the definition of a model and presents them together with their parametrization and downstream artifacts (outcomes). In our example there is one such model.
find_experiments()
As we said earlier, each experiment contains:
- the model itself,
- the parameters of the commands leading up to that model,
- the downstream artifacts, called the outcomes.
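These three pieces surface in the helper code below as the $model, $path (which holds the parameters) and $outcomes fields of each experiment. The snippet below is only an informal way to peek at that structure; the field names are taken from the code that follows, not from a documented API.

# take the first (and, here, only) experiment and look at its fields
e <- find_experiments()[[1]]
names(e)
str(e$path, max.level = 1)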
Now we need to build a function that maps the output of find_experiments() to the concepts of ML Flow (and to API calls in the mlflow package). But first we need to define two helper functions.
Our first helper function extracts the name and the value of each model parameter. To do so, it first flattens all parameters into a single vector and then makes sure they all have names. Each unnamed parameter receives a name "parameter_<i>", where i is its index in the flattened vector.
parameters_to_named_list <- function (experiment) {
  # flatten all parameters in this experiment run
  params <- unlist(lapply(experiment$path, `[[`, i = 'parameters'))

  # make sure all parameters have names
  if (is.null(names(params))) {
    names(params) <- paste0("parameter_", seq_along(params))
  } else if (any(!nchar(names(params)))) {
    i <- !nchar(names(params))
    names(params)[i] <- paste0("parameter_", which(i))
  }

  params
}
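To see the naming rule in isolation, here is a small standalone sketch that applies the same logic to a hand-made, partially named vector; the vector itself is made up for illustration.

# a hypothetical flattened parameter vector: some entries named, some not
params <- c(k = 3, 0.5, method = "kmeans")

# unnamed entries receive "parameter_<i>", where i is their position
i <- !nchar(names(params))
names(params)[i] <- paste0("parameter_", which(i))
params
# -> k, parameter_2, method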
The second helper function iterates over the downstream artifacts, which are called the outcomes, and saves each of them to a file. Plots are saved as PNGs and everything else is serialized as RDS (see ?saveRDS). The helper returns the path to each newly created file.
artifact_to_file <- function (artifact) {
  # if a plot, put in a PNG and report
  if (artifact_is(artifact, 'plot')) {
    path <- tempfile(fileext = ".png")
    with_png(path, plot(artifact_data(artifact)))
  } else {
    # if an R object, serialize to RDS and report
    path <- tempfile(fileext = ".rds")
    saveRDS(artifact_data(artifact), path)
  }

  stopifnot(file.exists(path))
  path
}
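The same file-writing pattern can be tried outside of chronicler with ordinary R objects. The sketch below mimics both branches, assuming with_png resolves to withr::with_png (loaded in the setup chunk); the plot and the model are placeholders, not artifacts taken from the repository.

# PNG branch: withr::with_png() opens a png device at plot_path,
# evaluates the plotting code, then closes the device
plot_path <- tempfile(fileext = ".png")
withr::with_png(plot_path, plot(iris$Sepal.Length, iris$Sepal.Width))

# RDS branch: any other R object is serialized with saveRDS()
obj_path <- tempfile(fileext = ".rds")
saveRDS(lm(Sepal.Width ~ Sepal.Length, data = iris), obj_path)

file.exists(c(plot_path, obj_path))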
chronicler to mlflow
Now we can finally report all experiments found in the repository to ML Flow. Each experiment from chronicler's world is translated into an ML Flow run. The naming might be a little confusing, especially since ML Flow also uses the term experiment, but for an entity that groups multiple runs. Once we wrap our minds around this slightly abusive overloading of names, we can look at the final function, register_with_mlflow.
It begins with a call to mlflow_start_run, followed immediately by a "destructor" that ends the ML Flow run. Then we proceed to report the three categories of data:
- parameters, extracted by parameters_to_named_list
- the model, wrapped in a crate (a mandatory wrapper, see ?mlflow::crate)
- artifacts, passed to mlflow as file paths, prepared by artifact_to_file
register_with_mlflow <- function (experiment) {
  # start a new "run", a ML Flow grouping concept
  mlflow_start_run()
  on.exit(mlflow_end_run())

  # extract parameters...
  params <- parameters_to_named_list(experiment)
  cat("Logging parameter: ",
      paste(names(params), '=', unlist(params), collapse = ', '), '\n')
  imap(params, function(value, name) mlflow_log_param(name, value))

  # log the model
  mlflow_save_model(crate(~ stats::predict(model, .x), model = experiment$model))

  # finally, log all the downstream artifacts
  paths <- lapply(experiment$outcomes, artifact_to_file)

  # report to mlflow
  cat("Logging", length(paths), "artifacts\n")
  lapply(paths, mlflow_log_artifact)
}
All that is now left to do is to call register_with_mlflow
for each
experiment in the repository.
library(mlflow)

mlflow_set_experiment("Exported from chronicler")
invisible(lapply(find_experiments(), register_with_mlflow))