knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

if (rlang::is_installed("partykit") && rlang::is_installed("lightgbm") && rlang::is_installed("modeldata")) {
  run <- TRUE
} else {
  run <- FALSE
}

knitr::opts_chunk$set(
  eval = run
)
The goal of bonsai is to provide bindings for additional tree-based model engines for use with the {parsnip} package.
If you're not familiar with parsnip, you can read more about the package on its website.
To get started, load bonsai with:
library(bonsai)
To illustrate how to use the package, we'll fit some models to a dataset containing measurements on 3 different species of penguins. Loading in that data and checking it out:
library(modeldata)
data(penguins)
str(penguins)
Specifically, making use of our knowledge of which island they live on and measurements of their flipper length, we will predict their species using a decision tree. We'll first do so using the engine "rpart", which is supported by parsnip alone:
# set seed for reproducibility
set.seed(1)

# specify and fit model
dt_mod <- decision_tree() %>%
  set_engine(engine = "rpart") %>%
  set_mode(mode = "classification") %>%
  fit(
    formula = species ~ flipper_length_mm + island,
    data = penguins
  )

dt_mod
From this output, we can see that the model generally first looks to island to determine species, and then makes use of a mix of flipper length and island to ultimately make a species prediction.
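As a quick sketch (not part of the original vignette), we can generate class predictions from the fitted model with parsnip's predict() method. Note that penguins contains some missing measurements, so we drop incomplete rows first; the `.pred_class` column name follows parsnip's standard prediction convention:

```r
# drop penguins with missing predictor values before predicting
penguins_complete <- na.omit(penguins)

# returns a tibble with a single .pred_class factor column
predict(dt_mod, new_data = penguins_complete)
```

Because parsnip standardizes the prediction interface, this same call works unchanged for every engine fitted in this vignette.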
A benefit of using parsnip and bonsai is that, to use a different implementation of decision trees, we simply change the engine argument to set_engine; all other elements of the interface stay the same. For instance, using "partykit"—which implements a type of decision tree called a conditional inference tree—as our backend instead:
decision_tree() %>%
  set_engine(engine = "partykit") %>%
  set_mode(mode = "classification") %>%
  fit(
    formula = species ~ flipper_length_mm + island,
    data = penguins
  )
This model, unlike the first, relies on recursive conditional inference to generate its splits. As such, we can see it generates slightly different results. Read more about this implementation of decision trees in ?details_decision_tree_partykit.
One generalization of a decision tree is a random forest, which fits a large number of decision trees, each independently of the others. The fitted random forest model combines predictions from the individual decision trees to generate its predictions.
bonsai introduces support for random forests using the partykit engine, which implements an algorithm called a conditional random forest. Conditional random forests are a type of random forest that uses conditional inference trees (like the one we fit above!) for its constituent decision trees.
To fit a conditional random forest with partykit, our code looks quite similar to what we needed to fit a conditional inference tree. Just switch out decision_tree() with rand_forest() and remember to keep the engine set as "partykit":
rf_mod <- rand_forest() %>%
  set_engine(engine = "partykit") %>%
  set_mode(mode = "classification") %>%
  fit(
    formula = species ~ flipper_length_mm + island,
    data = penguins
  )
Read more about this implementation of random forests in ?details_rand_forest_partykit.
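To see how the forest's predictions line up with the observed species, one option (a sketch, assuming the rf_mod fit above and the dplyr package) is to cross-tabulate predicted against observed classes:

```r
library(dplyr)

# drop penguins with missing predictor values before predicting
penguins_complete <- na.omit(penguins)

# cross-tabulate predicted class against observed species;
# off-diagonal rows correspond to misclassified penguins
predict(rf_mod, new_data = penguins_complete) %>%
  bind_cols(penguins_complete["species"]) %>%
  count(.pred_class, species)
```

Note that these are predictions on the training data, so they will look optimistic; resampling with the tune or rsample packages gives a more honest performance estimate.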
Another generalization of a decision tree is a series of decision trees where each tree depends on the results of previous trees—this is called a boosted tree. bonsai implements an additional parsnip engine for this model type called lightgbm. To make use of it, start out with a boost_tree model spec and set engine = "lightgbm":
bt_mod <- boost_tree() %>%
  set_engine(engine = "lightgbm") %>%
  set_mode(mode = "classification") %>%
  fit(
    formula = species ~ flipper_length_mm + island,
    data = penguins
  )

bt_mod
Read more about this implementation of boosted trees in ?details_boost_tree_lightgbm.
Each of these model specs and engines has several arguments and tuning parameters that greatly affect both the user experience and the results. We recommend reading about each of these parameters and tuning them when you find them relevant for your modeling use case.
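For instance (a sketch, with illustrative values rather than recommendations), the main arguments to boost_tree() are standardized across engines, and bonsai translates them to the corresponding lightgbm parameters—e.g., trees, learn_rate, and tree_depth map to lightgbm's num_iterations, learning_rate, and max_depth:

```r
# a boosted tree spec with a few main arguments set explicitly;
# the values here are illustrative, not tuned recommendations
bt_spec <- boost_tree(trees = 500, learn_rate = 0.05, tree_depth = 4) %>%
  set_engine(engine = "lightgbm") %>%
  set_mode(mode = "classification")

bt_spec
```

Any of these arguments can also be marked with tune() and optimized over a grid with the tune package rather than set by hand.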