Random Forest
In tidypredict: Run Predictions Inside the Database

if (!rlang::is_installed("randomForest")) {
  knitr::opts_chunk$set(
    eval = FALSE
  )
}

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(dplyr)
library(tidypredict)
library(randomForest)
library(parsnip)
set.seed(100)

| Function |Works| |---------------------------------------------------------------|-----| |tidypredict_fit(), tidypredict_sql(), parse_model() | ✔ | |tidypredict_to_column() | ✗ | |tidypredict_test() | ✗ | |tidypredict_interval(), tidypredict_sql_interval() | ✗ | |parsnip | ✔ |

How it works

Here is a simple randomForest() model using the mtcars dataset:

library(dplyr)
library(tidypredict)
library(randomForest)

model <- randomForest(mpg ~ ., data = mtcars, ntree = 5, proximity = TRUE)

Under the hood

The parser is based on the output from the randomForest::getTree() function. It will return as many decision paths as there are non-NA rows in the prediction field.

getTree(model, labelVar = TRUE) %>%
  head()

The output from parse_model() is transformed into a dplyr, a.k.a Tidy Eval, formula. Each decision tree becomes one dplyr::case_when() statement, which are then combined.

tidypredict_fit(model)

From there, the Tidy Eval formula can be used anywhere where it can be operated. tidypredict provides three paths:

Use directly inside dplyr, mutate(iris, !! tidypredict_fit(model))
Use tidypredict_to_column(model) to a piped command set
Use tidypredict_to_sql(model) to retrieve the SQL statement

parsnip

tidypredict also supports randomForest model objects fitted via the parsnip package.

library(parsnip)

parsnip_model <- rand_forest(mode = "regression", trees = 5) %>%
  set_engine("randomForest") %>%
  fit(mpg ~ ., data = mtcars)

tidypredict_fit(parsnip_model)