knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  # fig.path = "Readme_files/"
)
library(compboost)

Storing the complete Compboost object requires saving a lot of data.

Hence, compboost can store the model without the data. In this vignette, this is called production mode, since it is the more practical setup when running the model in production.

Store model without data

To do so, call saveToJson() with rm_data = TRUE:

dat = mlr3::tsk("sonar")$data()
cboost = boostSplines(dat, "Class", oob_fraction = 0.3)

file = "cboost.json"
cboost$saveToJson(file, rm_data = TRUE)

cboost_without_data = Compboost$new(file = file)

# The data field now just contains a dummy:
cboost_without_data$data

Note: Functionality that requires the training data is not available after storing and loading the object without data. For example, calling cboost_without_data$predict() without new data now throws an error:

cboost_without_data$predict()
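
In a production script it can be useful to catch such an error rather than let it abort the run. The following is a minimal base-R sketch. `try_predict` is a hypothetical helper and not part of compboost, and the failing closure only stands in for a call that needs the training data:

```r
# Hypothetical helper (not part of compboost): run a prediction closure and
# return NULL with a warning instead of stopping when the call fails.
try_predict = function(predict_fun) {
  tryCatch(
    predict_fun(),
    error = function(e) {
      warning("Prediction failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Stand-in for a call that requires the training data. With a loaded model,
# one would pass `function() cboost_without_data$predict()` instead.
failing_call = function() stop("training data not available")

res = suppressWarnings(try_predict(failing_call))
is.null(res)
```

Calls that succeed are passed through unchanged, so the same wrapper can also be used around predictions on new data.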

Functionality of a data-free model

The most important functions are still usable:

Extract feature importance.

vip = cboost_without_data$calculateFeatureImportance()

Predict on new data

ndat = dat[1:10, ]
cboost_without_data$predict(ndat)

Visualize partial feature effects.

library(patchwork)

# Use most important base learner:
bln = vip$baselearner[1]
plotBaselearner(cboost_without_data, bln) +
  plotPEUni(cboost_without_data, strsplit(bln, "_")[[1]][1])
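
The `strsplit()` call above recovers the feature name from the base learner name. The sketch below shows that step in isolation. It assumes base learner names follow a `<feature>_<type>` pattern. The name `"V11_spline"` and the helper `feature_from_bl` are hypothetical and used only for illustration:

```r
# Hypothetical base learner name, assuming the pattern "<feature>_<type>".
bln_example = "V11_spline"

# Same approach as above: split at underscores and keep the first piece.
feat = strsplit(bln_example, "_", fixed = TRUE)[[1]][1]

# Hypothetical alternative that strips only the last "_<type>" suffix, so
# feature names that themselves contain underscores stay intact.
feature_from_bl = function(bln) sub("_[^_]*$", "", bln)

feat
feature_from_bl("my_feature_spline")
```

Taking the first piece is enough for the sonar features used here, but it would truncate a feature name that itself contains an underscore.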

Get logger data

head(cboost_without_data$getLoggerData())

Set the model to a previous iteration.

table(cboost_without_data$getSelectedBaselearner())
cboost_without_data$predict(ndat)

# State after 50 iterations:
cboost_without_data$train(50)
table(cboost_without_data$getSelectedBaselearner())
cboost_without_data$predict(ndat)

Advantages

Size

Both the model object and the JSON file are much smaller when the data is not stored.

file_full = "cboost_full.json"
cboost$saveToJson(file_full)

file.info(file)$size / 1024^2
file.info(file_full)$size / 1024^2
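
To put the two sizes side by side, a small base-R helper can report them together with their ratio. This is only a sketch: `size_report` is a hypothetical helper, and the two temporary files merely stand in for the JSON files written above:

```r
# Hypothetical helper: file sizes in MiB plus each size relative to the first file.
size_report = function(...) {
  files = c(...)
  mib = file.info(files)$size / 1024^2
  data.frame(file = files, size_mib = mib, rel_to_first = mib / mib[1])
}

# Demonstration with two throwaway files of 1000 and 10000 bytes.
small = tempfile(fileext = ".json")
large = tempfile(fileext = ".json")
writeBin(raw(1000), small)
writeBin(raw(10000), large)

report = size_report(small, large)
report$rel_to_first

invisible(file.remove(small, large))
```

With the files from this vignette, the call would be `size_report(file, file_full)`.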

Loading

Loading a model without data is much faster (though the difference may be less pronounced for smaller models):

system.time(Compboost$new(file = file))
system.time(Compboost$new(file = file_full))
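
A single `system.time()` call can be noisy, so repeating the measurement and taking the median may give a steadier comparison. The sketch below uses only base R. `median_elapsed` is a hypothetical helper, and the workload is a stand-in for loading a model:

```r
# Hypothetical helper: median elapsed time (in seconds) of `fun` over `n` runs.
median_elapsed = function(fun, n = 5L) {
  times = vapply(
    seq_len(n),
    function(i) system.time(fun())[["elapsed"]],
    numeric(1)
  )
  median(times)
}

# Stand-in workload. With the files from this vignette one would instead call
#   median_elapsed(function() Compboost$new(file = file))
#   median_elapsed(function() Compboost$new(file = file_full))
t_demo = median_elapsed(function() sum(sqrt(seq_len(1e5))))
t_demo
```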

Privacy

Raw data is not unintentionally shared with third parties. This is especially relevant for domains that work with sensitive data.

file.remove(file, file_full)


schalkdaniel/compboost documentation built on April 15, 2023, 9:03 p.m.