vetiver_compute_metrics: Aggregate model metrics over time for monitoring
In tidymodels/vetiver: Version, Share, Deploy, and Monitor Models

vetiver_compute_metrics

R Documentation

Aggregate model metrics over time for monitoring

Description

These three functions can be used for model monitoring (such as in a monitoring dashboard):

vetiver_compute_metrics() computes metrics (such as accuracy for a classification model or RMSE for a regression model) at a chosen time aggregation period
vetiver_pin_metrics() updates an existing pin storing model metrics over time
vetiver_plot_metrics() creates a plot of metrics over time

Usage

vetiver_compute_metrics(
  data,
  date_var,
  period,
  truth,
  estimate,
  ...,
  metric_set = yardstick::metrics,
  every = 1L,
  origin = NULL,
  before = 0L,
  after = 0L,
  complete = FALSE
)

Arguments

`data`	A `data.frame` containing the columns specified by `truth`, `estimate`, and `...`.
`date_var`	The column in `data` containing dates or date-times for monitoring, to be aggregated with `period`.
`period`	`[character(1)]` A string defining the period to group by. Valid inputs can be roughly broken into: `"year"`, `"quarter"`, `"month"`, `"week"`, `"day"`; `"hour"`, `"minute"`, `"second"`, `"millisecond"`; `"yweek"`, `"mweek"`; `"yday"`, `"mday"`.
`truth`	The column identifier for the true results (that is `numeric` or `factor`). This should be an unquoted column name, although this argument is passed by expression and supports quasiquotation (you can unquote column names).
`estimate`	The column identifier for the predicted results (that is also `numeric` or `factor`). As with `truth`, this can be specified in different ways, but the primary method is to use an unquoted variable name.
`...`	A set of unquoted column names or one or more `dplyr` selector functions to choose which variables contain the class probabilities. If `truth` is binary, only 1 column should be selected, and it should correspond to the value of `event_level`. Otherwise, there should be as many columns as factor levels of `truth` and the ordering of the columns should be the same as the factor levels of `truth`.
`metric_set`	A `yardstick::metric_set()` function for computing metrics. Defaults to `yardstick::metrics()`.
`every`	`[positive integer(1)]` The number of periods to group together. For example, if `period` is set to `"year"` with an `every` value of `2`, then the years 1970 and 1971 would be placed in the same group.
`origin`	`[Date(1) / POSIXct(1) / POSIXlt(1) / NULL]` The reference date-time value. The default when left as `NULL` is the epoch time of `1970-01-01 00:00:00`, in the time zone of the index. This is generally used to define the anchor time to count from, which is relevant when the `every` value is `> 1`.
`before`, `after`	`[integer(1) / Inf]` The number of values before or after the current element to include in the sliding window. Set to `Inf` to select all elements before or after the current element. Negative values are allowed, which lets you "look forward" from the current element if used as the `before` value, or "look backwards" if used as the `after` value.
`complete`	`[logical(1)]` Should the function be evaluated on complete windows only? If `FALSE`, the default, then partial computations will be allowed.
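
The windowing arguments (`period`, `every`, `origin`, `before`, `after`, `complete`) read like those of `slider::slide_period()`. Assuming they behave the same way here, the grouping can be explored on a toy vector before applying it to real monitoring data. The sketch below uses slider directly; the `dates` and `x` objects are made up for illustration.

```r
library(slider)

# Four yearly dates; the values are just their positions, for readability
dates <- as.Date(c("1970-01-01", "1971-01-01", "1972-01-01", "1973-01-01"))
x <- seq_along(dates)

# .every = 2 places 1970 and 1971 in one group and 1972 and 1973 in the next,
# counting from the default origin of 1970-01-01
grouped <- slide_period(x, dates, "year", identity, .every = 2)

# .before = 1 widens each window to also include the previous period;
# with .complete = FALSE (the default) the first window is partial
windowed <- slide_period(x, dates, "year", identity, .before = 1)

str(grouped)
str(windowed)
```

Each list element holds the values that would be passed to the metric computation for one window.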

Details

For arguments used more than once in your monitoring dashboard, such as `date_var`, consider using R Markdown parameters to reduce repetition and/or errors.

Value

A dataframe of metrics.

Examples


library(vetiver)
library(dplyr)
library(parsnip)

# Loads both `Chicago` and the `stations` character vector
data(Chicago, package = "modeldata")
Chicago <- Chicago %>% select(ridership, date, all_of(stations))

# Split by date: training, testing, and a later monitoring window
training_data <- Chicago %>% filter(date < "2009-01-01")
testing_data <- Chicago %>% filter(date >= "2009-01-01", date < "2011-01-01")
monitoring <- Chicago %>% filter(date >= "2011-01-01", date < "2012-12-31")
lm_fit <- linear_reg() %>% fit(ridership ~ ., data = training_data)

library(pins)
b <- board_temp()

original_metrics <-
    augment(lm_fit, new_data = testing_data) %>%
    vetiver_compute_metrics(date, "week", ridership, .pred, every = 4L)

new_metrics <-
    augment(lm_fit, new_data = monitoring) %>%
    vetiver_compute_metrics(date, "week", ridership, .pred, every = 4L)
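
As a further sketch, the `metric_set` argument can be used to restrict the computation to specific metrics. The example below is self-contained and uses synthetic data invented purely for illustration (the `toy` data frame and its `actual` column are hypothetical); the result column names assumed in the final filter (`.metric`) follow yardstick's usual output and should be checked against your installed version.

```r
library(vetiver)
library(yardstick)

# Synthetic daily data, for illustration only; rows are in ascending date order
set.seed(1)
toy <- data.frame(
  date = seq(as.Date("2021-01-01"), by = "day", length.out = 60),
  actual = rnorm(60)
)
toy$.pred <- toy$actual + rnorm(60, sd = 0.5)

# Compute only RMSE and MAE per week instead of the yardstick::metrics() defaults
toy_metrics <- vetiver_compute_metrics(
  toy, date, "week", actual, .pred,
  metric_set = metric_set(rmse, mae)
)

# Long format: one row per time period and metric
head(toy_metrics)

# Keep a single metric, for example to plot or compare over time
toy_rmse <- toy_metrics[toy_metrics$.metric == "rmse", ]
```

Choosing a small, explicit metric set keeps a monitoring dashboard focused on the measures that were used to evaluate the model originally.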

tidymodels/vetiver documentation built on Oct. 15, 2024, 4:16 p.m.