In tidymodels/vetiver: Version, Share, Deploy, and Monitor Models

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = requireNamespace("parsnip", quietly = TRUE) && requireNamespace("recipes", quietly = TRUE) && requireNamespace("workflows", quietly = TRUE)
)

The goal of vetiver is to provide fluent tooling for MLOps tasks for your trained model including:

versioning
storing
sharing
deploying

For more extensive documentation, visit https://vetiver.rstudio.com/.

Create a `vetiver_model()`

The vetiver package is extensible, with generics that can support many kinds of models. For this example, let's consider one kind of model supported by vetiver, a tidymodels workflow that encompasses both feature engineering and model estimation.

library(parsnip)
library(recipes)
library(workflows)
data(bivariate, package = "modeldata")
bivariate_train

biv_rec <-
  recipe(Class ~ ., data = bivariate_train) %>%
  step_BoxCox(all_predictors())%>%
  step_normalize(all_predictors())

svm_spec <-
  svm_linear(mode = "classification") %>%
  set_engine("LiblineaR")

svm_fit <- 
  workflow(biv_rec, svm_spec) %>%
  fit(sample_frac(bivariate_train, 0.7))

This svm_fit object is a fitted workflow, with both feature engineering and model parameters estimated using the training data bivariate_train. We can create a vetiver_model() from this trained model; a vetiver_model() collects the information needed to store, version, and deploy a trained model.

library(vetiver)
v <- vetiver_model(svm_fit, "biv_svm")
v

Think of this vetiver_model() as a deployable model object.

Store and version your model

You can store and version your model by choosing a pins "board" for it, including a local folder, Posit Connect, Amazon S3, and more. Most pins boards have versioning turned on by default, but we can turn it on explicitly for our temporary demo board. When we write the vetiver_model() to our board, the binary model object is stored on our board together with necessary metadata, like the packages needed to make a prediction and the model's input data prototype for checking new data at prediction time.

library(pins)
model_board <- board_temp(versioned = TRUE)
model_board %>% vetiver_pin_write(v)

Let's train our model again with a new version of the dataset and write it once more to our board.

svm_fit <- 
  workflow(biv_rec, svm_spec) %>%
  fit(sample_frac(bivariate_train, 0.7))

v <- vetiver_model(svm_fit, "biv_svm")

model_board %>% vetiver_pin_write(v)

Both versions are stored, and we have access to both.

model_board %>% pin_versions("biv_svm")

The primary purpose of pins is to make it easy to share data artifacts, so depending on the board you choose, your pinned vetiver_model() can be shareable with your collaborators.

Deploy your model

You can deploy your model by creating a Plumber router, and adding a POST endpoint for making predictions.

library(plumber)
pr() %>%
  vetiver_api(v)

To start a server using this object, pipe (%>%) to pr_run(port = 8088) or your port of choice. This allows you to interact with your vetiver API locally and debug it. Plumber APIs such as these can be hosted in a variety of ways. You can use the function vetiver_write_plumber() to create a ready-to-go plumber.R file that is especially suited for Posit Connect.

vetiver_write_plumber(model_board, "biv_svm")

tmp <- tempfile()
vetiver_write_plumber(model_board, "biv_svm", file = tmp)
cat(readr::read_lines(tmp), sep = "\n")

In a real-world situation, you would see something like b <- board_connect() or b <- board_s3() here instead of our temporary demo board. Notice that the deployment is strongly linked to a specific version of the pinned model; if you pin another version of the model after you deploy your model, your deployed model will not be affected.

Predict from your model endpoint

A model deployed via vetiver can be treated as a special vetiver_endpoint() object.

library(vetiver)
endpoint <- vetiver_endpoint("http://127.0.0.1:8088/predict")
endpoint

If such a deployed model endpoint is running via one R process (either remotely on a server or locally, perhaps via a background job in the RStudio IDE), you can make predictions with that deployed model and new data in another, separate R process.

data(bivariate, package = "modeldata")
predict(endpoint, bivariate_test)
#> # A tibble: 710 × 1
#>    .pred_class
#>    <chr>      
#>  1 One        
#>  2 Two        
#>  3 One        
#>  4 Two        
#>  5 Two        
#>  6 One        
#>  7 Two        
#>  8 Two        
#>  9 Two        
#> 10 One        
#> # … with 700 more rows

Being able to predict() on a vetiver model endpoint takes advantage of the model's input data prototype and other metadata that is stored with the model.

tidymodels/vetiver documentation built on Oct. 15, 2024, 4:16 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com