across_grid: Train and evaluate model across combinations of tunable...
In statist-bhfz/grideR: Easy model tuning with data.table


View source: R/across_grid.R

Fit and evaluate an xgboost model with a data.table as input data. Models are trained (including all preprocessing steps) on the train part and evaluated on the validation part, as determined by the split indicator variable.
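The train/validation logic described above can be sketched in base R. This is only an illustration of how a 0/1 split indicator partitions the data, not the package's internal code: the `lm` model stands in for xgboost, and the names `train`, `valid`, and `rmse_val` are hypothetical.

```r
# Illustrative sketch only: a linear model stands in for xgboost.
dt <- mtcars

# Split indicator: 1 marks validation rows, 0 marks training rows
split <- rep(c(0, 0, 1, 0), length.out = nrow(dt))

train <- dt[split == 0, ]
valid <- dt[split == 1, ]

# Fit on the train part only
fit <- lm(hp ~ wt + disp, data = train)

# Evaluate on the validation part only
preds <- predict(fit, newdata = valid)
rmse_val <- sqrt(mean((valid$hp - preds)^2))
```

Every fourth row goes to validation here purely for reproducibility; in practice the indicator would come from a resampling scheme.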

across_grid(data, target, split, fit_fun, preproc_fun, grid, args, metrics, return_val_preds = FALSE, ...)

`data`	data.table with all input data.
`split`	Indicator variable in which 1 marks observations belonging to the validation dataset.
`preproc_fun`	Preprocessing function which takes a data.table (`data` plus `split`) as input and returns a processed data.table with the same `target` and `split` columns.
`grid`	data.table with combinations of tunable hyperparameters in rows.
`args`	List of parameters that stay fixed during tuning.
`metrics`	Vector of metric function names.
`return_val_preds`	If `TRUE`, predictions for validation data will be returned.
`...`	Other parameters for inner fit function.
`target`	Target variable name (character).
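As an illustration of the `preproc_fun` contract, here is a minimal sketch of a preprocessing function that computes its statistics on training rows only and applies them to all rows. It assumes the input carries a column literally named `split` (0 for train, 1 for validation); the function name `preproc_impute` and its `target` default are hypothetical. It is written in base R on a data.frame so it runs without extra packages; the same logic applies to a data.table.

```r
# Hypothetical preprocessing function: mean-imputes numeric feature columns
# using means computed on training rows only (split == 0).
preproc_impute <- function(data, target = "hp") {
  is_train <- data[["split"]] == 0
  feature_cols <- setdiff(names(data), c(target, "split"))
  for (col in feature_cols) {
    if (!is.numeric(data[[col]])) next
    # Statistic comes from the train part, so nothing leaks from validation
    train_mean <- mean(data[[col]][is_train], na.rm = TRUE)
    missing <- is.na(data[[col]])
    data[[col]][missing] <- train_mean
  }
  # Target and split columns are returned unchanged
  data
}

df <- data.frame(
  hp    = c(110, 93, 175, 105),
  wt    = c(2.6, NA, 3.4, NA),
  split = c(0, 0, 0, 1)
)
out <- preproc_impute(df)
```

Both missing `wt` values, including the one in the validation row, are filled with the mean of the observed training values.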

data.table combining the grid, the optimal number of iterations for each row (which implies that early stopping is used), and all metrics calculated on the validation part of the data. It also contains predictions for the validation data if return_val_preds = TRUE.
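The shape of such a result, one row per grid combination with one column per metric, can be sketched as follows. This is a base R illustration under stated assumptions, not the package's implementation: the grid of polynomial degrees and predictors stands in for xgboost hyperparameters, the metric functions `rmse` and `mae` and their `(truth, pred)` signature are hypothetical, and early stopping (hence the optimal iteration count) is not modeled.

```r
# Hypothetical metric functions, looked up by name further down
rmse <- function(truth, pred) sqrt(mean((truth - pred)^2))
mae  <- function(truth, pred) mean(abs(truth - pred))

dt <- mtcars
split <- rep(c(0, 0, 1, 0), length.out = nrow(dt))  # 1 = validation
train <- dt[split == 0, ]
valid <- dt[split == 1, ]

# Stand-in grid: every combination of polynomial degree and predictor
grid <- expand.grid(degree = c(1L, 2L), predictor = c("wt", "disp"),
                    stringsAsFactors = FALSE)
metrics <- c("rmse", "mae")

rows <- lapply(seq_len(nrow(grid)), function(i) {
  params <- grid[i, , drop = FALSE]
  form <- as.formula(sprintf("hp ~ poly(%s, %d)",
                             params$predictor, params$degree))
  fit <- lm(form, data = train)
  preds <- predict(fit, newdata = valid)
  # Resolve each metric name to a function and score validation predictions
  scores <- lapply(metrics, function(m) get(m, mode = "function")(valid$hp, preds))
  names(scores) <- metrics
  cbind(params, as.data.frame(scores))
})

# Grid columns followed by one column per metric
res <- do.call(rbind, rows)
```

Sorting `res` by a metric column then gives a simple way to pick the best combination.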

library(data.table)
library(grideR)
# Input data
dt <- as.data.table(mtcars)
# data.table with resamples
splits <- resampleR::cv_base(dt, "hp")
# data.table with tunable model hyperparameters
xgb_grid <- CJ(
    max_depth = c(6, 8),
    eta = 0.025,
    colsample_bytree = 0.9,
    subsample = 0.8,
    gamma = 0,
    min_child_weight = c(3, 5),
    alpha = 0,
    lambda = 1
)
# Non-tunable parameters for xgboost
xgb_args <- list(
    nrounds = 500,
    early_stopping_rounds = 10,
    booster = "gbtree",
    eval_metric = "rmse",
    # "reg:linear" is deprecated in newer xgboost releases in favor of "reg:squarederror"
    objective = "reg:linear",
    verbose = 0
)
# Dummy (no-op) preprocessing function
# A real function would contain imputation, feature engineering, etc.,
# with all statistics computed on train folds and applied to the validation fold
preproc_fun_example <- function(data) return(data[])
across_grid(data = dt,
            target = "hp",
            split = splits[, split_1],
            fit_fun = xgb_fit,
            preproc_fun = preproc_fun_example,
            grid = xgb_grid,
            args = xgb_args,
            metrics = c("rmse", "mae"),
            return_val_preds = FALSE)