The outputs for these tidy functions are different since the model structures are different.

Let's look at Cubist and RuleFit first, using the Ames data, then C5.0 with a different data set.

An example using the Ames data

library(tidymodels)
library(rules)
library(rlang)
tidymodels_prefer()
options(pillar.advice = FALSE, pillar.min_title_chars = Inf)

First we will fit a Cubist model and tidy it:

library(tidymodels)
library(rules)
library(rlang)

data(ames, package = "modeldata")

ames <- ames %>% 
  mutate(Sale_Price = log10(Sale_Price)) %>% 
  select(Sale_Price, Longitude, Latitude, Central_Air)

cb_fit <-
  cubist_rules(committees = 10) %>%
  set_engine("Cubist") %>%
  fit(Sale_Price ~ ., data = ames)

cb_res <- tidy(cb_fit)

cb_res

Since Cubist fits linear regressions within the data from each rule, the coefficients are in the estimate column and other information are in statistic:

cb_res$estimate[[1]]
cb_res$statistic[[1]]

Note that we can get the data for this rule by using [rlang::parse_expr()] with it:

rule_1_expr <- parse_expr(cb_res$rule[1])
rule_1_expr

then use it to get the data back:

filter(ames, !!rule_1_expr)

Now let's fit a RuleFit model. First, we'll use a recipe to convert the Central Air predictor to an indicator:

xrf_reg_mod <-
  rule_fit(trees = 3, penalty = .001) %>%
  set_engine("xrf") %>%
  set_mode("regression")
# Make dummy variables since xgboost will not

ames_rec <-
  recipe(Sale_Price ~ ., data = ames) %>%
  step_dummy(Central_Air) %>%
  step_zv(all_predictors())

ames_processed <- prep(ames_rec) %>% bake(new_data = NULL)

xrf_reg_fit <-
  xrf_reg_mod %>%
  fit(Sale_Price ~ ., data = ames_processed)

xrf_rule_res <- tidy(xrf_reg_fit, penalty = .001)

xrf_rule_res

Here, the focus is on the model coefficients produced by glmnet. We can also break down the results and sort them by the original predictor columns:

tidy(xrf_reg_fit, penalty = .001, unit = "columns")

C5.0 classification models

Here, we'll use the Palmer penguin data:

data(penguins, package = "modeldata")

penguins <- drop_na(penguins)

First, let's fit a boosted rule-based model and tidy:

rule_model <- 
  C5_rules(trees = 3) %>% 
  fit(island ~ ., data = penguins)

rule_info <- tidy(rule_model)

rule_info

# The statistic column has the pre-computed data about the 
# data covered by the rule:
rule_info$statistic[[1]]

Tree-based models can also be tidied. Rather than saving the results in a recursive tree structure, we can show the paths to each of the terminal nodes (which is just a rule).

Let's fit a model and tidy:

tree_model <- 
  boost_tree(trees = 3) %>% 
  set_engine("C5.0") %>% 
  set_mode("classification") %>% 
  fit(island ~ ., data = penguins)

tree_info <- tidy(tree_model)

tree_info

# The statistic column has the class breakdown:
tree_info$statistic[[1]]

Note that C5.0 models can have fractional estimates of counts in the terminal nodes.



Try the rules package in your browser

Any scripts or data that you put into this service are public.

rules documentation built on March 31, 2023, 8:24 p.m.