metamodel_fit: Model stacking using the SuperLearner algorithm.


View source: R/metamodel_fit.R

Description

Fit an arbitrary metamodel on the first-level models' predictions.

Usage

metamodel_fit(data, target, splits, models, model_params, model_args,
  preproc_funs, metamodel = lm, metamodel_params = list(NULL),
  metamodel_interface = "formula")

Arguments

data

data.table with all input data.

target

Target variable name (character).

splits

data.table with train/validation splits. Each column is an indicator variable where 1 marks observations belonging to the validation set.
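For illustration, a splits table for two resamples might look like the following (a hand-built sketch; in practice such a table comes from a resampling helper such as resampleR::cv_base, shown in the Examples):

```r
library(data.table)

# Hypothetical splits table for 5 observations and 2 resamples:
# 1 marks a row used for validation in that split, 0 marks training
splits <- data.table(
    split_1 = c(1, 0, 0, 1, 0),
    split_2 = c(0, 1, 1, 0, 0)
)
```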

models

Named list of fit functions from the tuneR package (xgb_fit, lgb_fit, etc.).

model_params

List of data.tables with tunable model parameters (one per model).

model_args

List of fixed (non-tunable) model parameters (one per model).

preproc_funs

List of preprocessing functions (one per model), each of which takes a data.table of data plus split as input and returns a processed data.table with the same target and split columns.

metamodel

Function for fitting metamodel.

metamodel_params

List with metamodel parameters.

metamodel_interface

"formula" or "matrix", depending on the calling convention of the metamodel function.
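The interface choice maps to the two common calling conventions of R modeling functions. A minimal sketch (the data.frame `preds` of first-level predictions with a `target` column is hypothetical, and the exact internal call is an assumption):

```r
# Assumed: preds holds first-level model predictions plus the target column
preds <- data.frame(xgboost = rnorm(10), catboost = rnorm(10), target = rnorm(10))

# "formula" interface: the metamodel is called as metamodel(formula, data, ...)
fit_f <- lm(target ~ ., data = preds)

# "matrix" interface: the metamodel is called as metamodel(x, y, ...)
# e.g. with glmnet (commented out; requires the glmnet package):
# fit_m <- glmnet::glmnet(x = as.matrix(preds[, c("xgboost", "catboost")]),
#                         y = preds$target)
```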

Value

Object containing the fitted metamodel.

Examples

# Input data
dt <- as.data.table(mtcars)

# data.table with resamples
splits <- resampleR::cv_base(dt, "hp")

# List of models
models <- list("xgboost" = xgb_fit, "catboost" = catboost_fit)

# Model parameters
xgb_params <- data.table(
    max_depth = 6,
    eta = 0.025,
    colsample_bytree = 0.9,
    subsample = 0.8,
    gamma = 0,
    min_child_weight = 5,
    alpha = 0,
    lambda = 1
)
xgb_args <- list(
    nrounds = 500,
    early_stopping_rounds = 10,
    booster = "gbtree",
    eval_metric = "rmse",
    objective = "reg:linear",
    verbose = 0
)

catboost_params <- data.table(
    iterations = 1000,
    learning_rate = 0.05,
    depth = 8,
    loss_function = "RMSE",
    eval_metric = "RMSE",
    random_seed = 42,
    od_type = 'Iter',
    od_wait = 10,
    use_best_model = TRUE,
    logging_level = "Silent"
)
catboost_args <- NULL

model_params <- list(xgb_params, catboost_params)
model_args <- list(xgb_args, catboost_args)

# Dumb preprocessing function
# Real function will contain imputation, feature engineering etc.
# with all statistics computed on train folds and applied to validation fold
preproc_fun_example <- function(data) return(data[])
# List of preprocessing functions, one per model
preproc_funs <- list(preproc_fun_example, preproc_fun_example)

metamodel_obj <- metamodel_fit(data = dt,
                               target = "hp",
                               splits = splits,
                               models = models,
                               model_params = model_params,
                               model_args = model_args,
                               preproc_funs = preproc_funs,
                               metamodel = ranger::ranger,
                               metamodel_params = list(num.trees = 3),
                               metamodel_interface = "formula"
                               )
first_level_preds <- across_models(data = dt,
                                   target = "hp",
                                   split = splits[, split_1],
                                   models = models,
                                   model_params = model_params,
                                   model_args = model_args,
                                   preproc_funs = preproc_funs)
predict(metamodel_obj, first_level_preds)$predictions

statist-bhfz/stackeR documentation built on Aug. 7, 2019, 4:57 a.m.