tidypredict


library(dplyr)
library(tidypredict)
library(randomForest)

The main goal of tidypredict is to enable running predictions inside databases. It reads the model, extracts the components needed to calculate the prediction, and then creates an R formula that can be translated into SQL. In other words, it is able to parse a model such as this one:

model <- lm(mpg ~ wt + cyl, data = mtcars)

tidypredict can return a SQL statement that is ready to run inside the database. Because it uses dplyr's database interface, it works with several database back-ends, such as MS SQL:

tidypredict_sql(model, dbplyr::simulate_mssql())
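The SQL above is a translation of an R formula, and that formula can also be inspected and evaluated locally. The sketch below assumes dplyr and tidypredict are loaded. It splices the output of tidypredict_fit() into mutate(), uses tidypredict_to_column() for the same result, and compares both against predict(). The variable names are illustrative only.

```r
library(dplyr)
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# tidypredict_fit() returns an unevaluated R expression for the prediction
fit_formula <- tidypredict_fit(model)

# Splice the expression into mutate() with !! to evaluate it on local data
local_preds <- mtcars %>%
  mutate(pred = !!fit_formula)

# tidypredict_to_column() adds the same prediction as a column named "fit"
with_fit <- tidypredict_to_column(mtcars, model)

# Largest absolute difference from the model's native predict()
max_diff <- max(abs(local_preds$pred - predict(model, mtcars)))
```

Because the formula is plain R, the same expression works whether mutate() runs on a local data frame or on a remote table.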

Installation

Install tidypredict from CRAN using:

install.packages("tidypredict")

Or install the development version from GitHub using remotes:

install.packages("remotes")
remotes::install_github("tidymodels/tidypredict")

Functions

tidypredict has only a few functions, and that number is not expected to grow much. The main focus at this time is adding support for more models.

| Function | Description |
|-----------------------------|--------------------------------------------------------------------------------|
| tidypredict_fit() | Returns an R formula that calculates the prediction |
| tidypredict_sql() | Returns a SQL query based on the formula from tidypredict_fit() |
| tidypredict_to_column() | Adds a new column using the formula from tidypredict_fit() |
| tidypredict_test() | Tests tidyverse predictions against the model's native predict() function |
| tidypredict_interval() | Same as tidypredict_fit() but for intervals (only works with lm and glm) |
| tidypredict_sql_interval() | Same as tidypredict_sql() but for intervals (only works with lm and glm) |
| parse_model() | Creates a list spec based on the R model |
| as_parsed_model() | Prepares an object to be recognized as a parsed model |
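A few of the functions above can be combined as shown in this sketch for a linear model, one of the two model types that support intervals. It assumes tidypredict_to_column() with add_interval = TRUE adds "fit", "upper" and "lower" columns (its default column names), and that tidypredict_test() accepts the data to test against through its df argument.

```r
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# Expression for the interval part of the prediction (lm and glm only)
interval_formula <- tidypredict_interval(model)

# Add the fit plus upper and lower interval bounds in a single call
preds <- tidypredict_to_column(mtcars, model, add_interval = TRUE)

# Compare tidypredict's results with predict() on the same data
test_result <- tidypredict_test(model, df = mtcars)
```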

How it works

Instead of translating directly to a SQL statement, tidypredict creates an R formula, which can then be used inside dplyr. The overall workflow, illustrated in the image above, is as follows:

  1. Fit the model using a base R model, or one from the packages listed under Supported models
  2. tidypredict reads the model and creates a list object with the components needed to run predictions
  3. tidypredict builds an R formula based on the list object
  4. dplyr evaluates the formula created by tidypredict
  5. dplyr translates the formula into a SQL statement, or into the equivalent for another back-end interface
  6. The database executes the SQL statement(s) created by dplyr
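The steps above can be sketched end to end without a live database. This example assumes dbplyr is installed and uses dbplyr::lazy_frame() with dbplyr::simulate_mssql() as a stand-in for a real MS SQL table; the single-row table and its values are hypothetical, only there so the query has columns to reference.

```r
library(dplyr)
library(tidypredict)

# Step 1: fit the model
model <- lm(mpg ~ wt + cyl, data = mtcars)

# Step 2: parse it into a list-based spec
pm <- parse_model(model)

# Step 3: build the R formula from the parsed spec
fit_formula <- tidypredict_fit(pm)

# Steps 4-5: a simulated MS SQL table (hypothetical data) lets dbplyr
# translate mutate() into SQL without connecting to anything
remote_tbl <- dbplyr::lazy_frame(wt = 2.5, cyl = 4, con = dbplyr::simulate_mssql())
query <- remote_tbl %>%
  mutate(pred = !!fit_formula)

# Render the SQL; step 6 would be the database executing it
sql_text <- dbplyr::sql_render(query)
```

With a real connection, the same mutate() call would run against a table obtained with tbl(con, ...), and collect() would bring the predictions back into R.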

Parsed model spec

tidypredict writes and reads a spec based on a model. Instead of simply writing the R formula directly, splitting the spec from the formula adds the following capabilities:

  1. No more saving models as .rds: this is especially useful when the model needs to be used for predictions in a Shiny app.
  2. Beyond R models: technically, anything that can write a proper spec can be read into tidypredict. It also means that the parsed model spec can become a good alternative to using PMML.
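As a sketch of the first capability, the parsed spec can be stored as plain YAML and turned back into a parsed model later, with no .rds file involved. This assumes the yaml package is installed; the temporary file path is illustrative. YAML output may round numeric values, so reloaded predictions can differ very slightly from the original model.

```r
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)
pm <- parse_model(model)

# Write the spec to a YAML file instead of saving the model object
spec_path <- tempfile(fileext = ".yml")
yaml::write_yaml(pm, spec_path)

# Later (for example, inside a Shiny app): read the YAML back and
# mark the resulting list as a parsed model
reloaded <- as_parsed_model(yaml::read_yaml(spec_path))

# The reloaded spec yields a prediction formula without the lm object
reloaded_fit <- tidypredict_fit(reloaded)
preds <- rlang::eval_tidy(reloaded_fit, mtcars)
```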

Supported models

The following models are supported by tidypredict:

parsnip

tidypredict supports models fitted via the parsnip interface. The ones currently confirmed to work in tidypredict are:

broom

The tidy() function from broom works with linear models parsed via tidypredict.

library(broom)
pm <- parse_model(lm(wt ~ ., mtcars))
tidy(pm)

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.



tidymodels/tidypredict documentation built on Jan. 19, 2024, 1:14 p.m.