knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The explore package offers a simplified way to use machine learning to understand and explain patterns in the data.

We use synthetic data in this example.

library(dplyr)
library(explore)

data <- create_data_buy(obs = 1000)
glimpse(data)

Explain / Model

Decision Tree

# binary target
data %>% explain_tree(target = buy)
# categorical target
data %>% explain_tree(target = mobiledata_prd)
# numeric target
data %>% explain_tree(target = age)

Random Forest

data %>% explain_forest(target = buy, ntree = 100)

To get the model itself as output, you can use the parameter out = "model", or out = "all" to get everything (feature importance as plot and table, plus the trained model). To use the model for a prediction, you can use predict_target().
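For example (a minimal sketch; the content of the out = "all" result is only paraphrased from the description above):

# get only the trained random forest model
model <- data %>% explain_forest(target = buy, ntree = 100, out = "model")

# get feature importance (plot and table) together with the trained model
result <- data %>% explain_forest(target = buy, ntree = 100, out = "all")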

XGBoost

As XGBoost only accepts numeric variables, we use drop_var_not_numeric() to drop mobiledata_prd, as it is not a numeric variable. An alternative would be to convert the non-numeric variables into numeric ones (see the sketch after the next code block).

data %>%
  drop_var_not_numeric() %>%
  explain_xgboost(target = buy)
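A minimal sketch of the conversion alternative, using plain dplyr (recoding character columns to integer codes is just one possible encoding, not an explore helper):

data %>%
  # convert character columns to integer codes so XGBoost accepts them
  mutate(across(where(is.character), ~ as.numeric(as.factor(.)))) %>%
  explain_xgboost(target = buy)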

Use the parameter out = "all" to get more details about the training.

train <- data %>%
  drop_var_not_numeric() %>%
  explain_xgboost(target = buy, out = "all")
train$importance
train$tune_plot
train$tune_data

To use the model for a prediction, you can use predict_target().
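A minimal sketch of such a prediction workflow, assuming explain_xgboost() also accepts out = "model" (as described for explain_forest() above) and that predict_target() takes the data to score plus the trained model (the argument name model is an assumption):

# train on numeric variables only and keep the trained model
model <- data %>%
  drop_var_not_numeric() %>%
  explain_xgboost(target = buy, out = "model")

# score new synthetic observations with the trained model
new_data <- create_data_buy(obs = 100) %>%
  drop_var_not_numeric()

new_data %>% predict_target(model = model)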

Logistic Regression

data %>% explain_logreg(target = buy)

Balance Target

If you have a data set with a very unbalanced target (in this case only 5% of all observations have buy == 1), it may be difficult to create a decision tree.

data <- create_data_buy(obs = 2000, target1_prob = 0.05)
data %>% describe(buy)

It may help to balance the target before growing the decision tree (or to use weights as an alternative). In this example we downsample the data so that 10% of the observations have buy == 1.

data %>%
  balance_target(target = buy, min_prop = 0.10) %>%
  explain_tree(target = buy)
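To check the effect of the balancing, you can keep the down-sampled data and describe the target again (this reuses only functions shown above):

# keep the balanced data and check the new target distribution
data_balanced <- data %>% balance_target(target = buy, min_prop = 0.10)
data_balanced %>% describe(buy)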
