booster2sql: Transform XGBoost model object to SQL query.

Description Usage Arguments Value Examples

Description

This function generates SQL query for in-database scoring of XGBoost models, providing a robust and efficient way of model deployment. It takes in the trained XGBoost model xgbModel, name of the input database table input_table_name, and name of a unique identifier within that table unique_id as input, writes the SQL query to a file specified by output_file_name. Note that the input database table should be generated from the raw table using the one-hot encoding query output by onehot2sql(), or to provide the one-hot encoding query as input input_onehot_query to this function, working as sub-query inside the final model scoring query.

Usage

1
2
3
booster2sql(xgbModel, print_progress = FALSE, unique_id = NULL,
  output_file_name = NULL, input_table_name = NULL,
  input_onehot_query = NULL)

Arguments

xgbModel

The trained model object of class xgb.Booster. Current supported booster is booster="gbtree", supported objective options are:

  • reg:linear: linear regression.

  • reg:logistic: logistic regression.

  • - binary:logistic: logistic regression for binary classification, output probability.

  • binary:logitraw: logistic regression for binary classification, output score before logistic transformation.

  • - binary:hinge: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.

  • - count:poisson: poisson regression for count data, output mean of poisson distribution.

  • - reg:gamma: gamma regression with log-link, output mean of gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be gamma-distributed.

  • - reg:tweedie: Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be Tweedie-distributed.

print_progress

Boolean indicator controls whether the SQL generating progress should be printed to console.

unique_id

A row unique identifier is crucial for in-database scoring of XGBoost model. If not given, SQL query will be generated with id name "ROW_KEY".

output_file_name

File name that the SQL syntax will write to. It must not be empty in order for this function to run.

input_table_name

Name of raw data table in the database, that the SQL query will select from. If not given, SQL query will be generated with table name "MODREADY_TABLE".

input_onehot_query

SQL query of one-hot encoding generated by onehot2sql. When input_table_name is empty while input_onehot_query is not, the final output query will include input_onehot_query as sub-query.

Value

The SQL query will write to the file specified by output_file_name.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
library(xgboost)
# load data
df = data.frame(ggplot2::diamonds)
head(df)

# data processing
out <- onehot2sql(df)
x <- out$model.matrix[,colnames(out$model.matrix)!='price']
y <- out$model.matrix[,colnames(out$model.matrix)=='price']

# model training
bst <- xgboost(data = x,
               label = y,
               max.depth = 3,
               eta = .3,
               nround = 5,
               nthread = 1,
               objective = 'reg:linear')

# generate model scoring SQL script with ROW_KEY and MODREADY_TABLE
booster2sql(bst, output_file_name='xgb.txt')

Example output

sh: 1: cannot create /dev/null: Permission denied
sh: 1: cannot create /dev/null: Permission denied
  carat       cut color clarity depth table price    x    y    z
1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
[04:10:43] WARNING: amalgamation/../src/objective/regression_obj.cu:174: reg:linear is now deprecated in favor of reg:squarederror.
[1]	train-rmse:4050.016113 
[2]	train-rmse:2998.798340 
[3]	train-rmse:2303.448730 
[4]	train-rmse:1847.826904 
[5]	train-rmse:1558.424438 
query is written to file with row unique id named as ROW_KEY
query is written to file with input table named as MODREADY_TABLE

xgb2sql documentation built on May 2, 2019, 1:09 p.m.