across_grid: Train and evaluate model across combinations of tunable...


View source: R/across_grid.R

Description

Fits and evaluates an xgboost model using a data.table as input. Models are trained (including all preprocessing steps) on the training part of the data and evaluated on the validation part, as determined by the split indicator variable.

Usage

across_grid(data, target, split, fit_fun, preproc_fun, grid, args, metrics,
  return_val_preds = FALSE, ...)

Arguments

data

data.table with all input data.

split

Indicator variable in which 1 marks observations that belong to the validation dataset.

preproc_fun

Preprocessing function that takes a data.table containing the data and the split indicator as input and returns a processed data.table with the same target and split columns.

grid

data.table with combinations of tunable hyperparameters in rows.

args

List of parameters that stay fixed during tuning.

metrics

Character vector of metric function names.

return_val_preds

If TRUE, predictions for validation data will be returned.

...

Other parameters for inner fit function.

target

Target variable name (character).
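
A split indicator of the kind described above can be built by hand when no resampling helper is used. The following is a minimal sketch, assuming that 0 marks training observations (the documentation only states that 1 marks validation observations) and using an arbitrary 25% validation share:

```r
library(data.table)

dt <- as.data.table(mtcars)

# Hold out roughly 25% of the rows for validation
# (the share and the seed are arbitrary choices for illustration)
set.seed(42)
val_rows <- sample(nrow(dt), size = floor(0.25 * nrow(dt)))

# 1 = validation observation, 0 = training observation (assumed coding)
split <- as.integer(seq_len(nrow(dt)) %in% val_rows)
```

The resulting vector has one element per row of dt and could be passed as the split argument.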

Value

data.table combining the grid, the optimal number of iterations (which implies that early stopping is used), and all metrics calculated on the validation part of the data. It also contains predictions for the validation data if return_val_preds = TRUE.
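
Since the result holds one row per hyperparameter combination, the best combination can be picked by sorting on a metric. The helper below is a hypothetical sketch, not part of the package; it assumes the result has a numeric column named after the metric (for example "rmse"), which may not match the actual column names.

```r
library(data.table)

# Hypothetical helper: return the row with the lowest value of one metric.
# Assumes `res` has a numeric column named after the metric (e.g. "rmse");
# the column names actually produced by across_grid() may differ.
best_combination <- function(res, metric = "rmse") {
    stopifnot(is.data.table(res), metric %in% names(res))
    best_idx <- which.min(res[[metric]])
    res[best_idx]
}
```

This only suits metrics where lower is better; for metrics such as AUC, which.max would be needed instead.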

Examples

library(data.table)
library(grideR)
# Input data
dt <- as.data.table(mtcars)
# data.table with resamples
splits <- resampleR::cv_base(dt, "hp")
# data.table with tunable model hyperparameters
xgb_grid <- CJ(
    max_depth = c(6, 8),
    eta = 0.025,
    colsample_bytree = 0.9,
    subsample = 0.8,
    gamma = 0,
    min_child_weight = c(3, 5),
    alpha = 0,
    lambda = 1
)
# Non-tunable parameters for xgboost
xgb_args <- list(
    nrounds = 500,
    early_stopping_rounds = 10,
    booster = "gbtree",
    eval_metric = "rmse",
    objective = "reg:linear",  # deprecated in newer xgboost in favor of "reg:squarederror"
    verbose = 0
)
# Dumb preprocessing function
# Real function will contain imputation, feature engineering etc.
# with all statistics computed on train folds and applied to validation fold
preproc_fun_example <- function(data) return(data[])
across_grid(data = dt,
            target = "hp",
            split = splits[, split_1],
            fit_fun = xgb_fit,
            preproc_fun = preproc_fun_example,
            grid = xgb_grid,
            args = xgb_args,
            metrics = c("rmse", "mae"),
            return_val_preds = FALSE)
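
The placeholder preprocessing function above returns the data unchanged. A more realistic one computes its statistics on the training rows only and then applies them to all rows. The following is a minimal sketch of median imputation, assuming the split indicator arrives in a column named "split" with 0 for training rows; the function name, the extra arguments and the column name are hypothetical.

```r
library(data.table)

# Hypothetical preprocessing function: median imputation of numeric features.
# Assumes the split indicator is a column called "split"
# (1 = validation, 0 = train); adjust split_col to match the real input.
preproc_impute_median <- function(data, target = "hp", split_col = "split") {
    data <- copy(data)  # do not modify the caller's table by reference
    is_train <- data[[split_col]] == 0
    feature_cols <- setdiff(names(data), c(target, split_col))
    is_num <- vapply(feature_cols,
                     function(col) is.numeric(data[[col]]),
                     logical(1))
    for (col in feature_cols[is_num]) {
        # The statistic is computed on training rows only ...
        train_median <- median(data[[col]][is_train], na.rm = TRUE)
        # ... and then applied to both training and validation rows
        na_rows <- which(is.na(data[[col]]))
        if (length(na_rows) > 0) {
            set(data, i = na_rows, j = col, value = train_median)
        }
    }
    data[]
}

# Toy input with missing values in both the train and validation parts
toy <- data.table(
    hp = c(110, 93, 175, 105),
    wt = c(2.6, NA, 3.4, NA),
    split = c(0, 0, 0, 1)
)
res <- preproc_impute_median(toy)
```

Because the median comes from the training rows alone, no information from the validation fold leaks into the fitted statistics. The target and split columns are returned untouched.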

statist-bhfz/grideR documentation built on Aug. 8, 2019, 7:08 p.m.