regression.train_model: Train a tidymodels model via workflows

View source: R/approach_regression_separate.R

Description

Trains a tidymodels model via workflows based on the provided input parameters. The function can also tune the model's hyperparameters using cross-validation.
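As a rough illustration of what "training a tidymodels model via workflows" involves, the sketch below bundles a recipe and a parsnip model specification into a workflow and fits it. It is not shapr's internal code. The mtcars-based data and the renaming of mpg to the default response name y_hat are illustrative assumptions.

```r
# Minimal sketch (not shapr's internal code): fit a parsnip model inside a
# workflow whose recipe uses every other column as a predictor of y_hat.

# Illustrative training data: mtcars with mpg renamed to the default response name.
x <- mtcars[, c("mpg", "cyl", "disp", "hp", "wt")]
names(x)[names(x) == "mpg"] <- "y_hat"

# Recipe: y_hat is the outcome, all remaining columns are predictors.
rec <- recipes::recipe(y_hat ~ ., data = x)

# Workflow = recipe + model specification (default: linear regression).
wf <- workflows::workflow()
wf <- workflows::add_recipe(wf, rec)
wf <- workflows::add_model(wf, parsnip::linear_reg())

# Fit the workflow and predict; predictions come back as a tibble with a .pred column.
fitted_wf <- generics::fit(wf, data = x)
preds <- predict(fitted_wf, new_data = x)
```

Keeping the recipe and the model in one workflow object means that preprocessing is re-applied automatically whenever the fitted model is used for prediction.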

Usage

regression.train_model(
  x,
  seed = 1,
  verbose = 0,
  regression.model = parsnip::linear_reg(),
  regression.tune = FALSE,
  regression.tune_values = NULL,
  regression.vfold_cv_para = NULL,
  regression.recipe_func = NULL,
  regression.response_var = "y_hat",
  regression.surrogate_n_comb = NULL
)

Arguments

x

A data.table containing the data: either the training data or the explicands. If x contains the explicands, then index_features must be provided.

seed

Positive integer. Specifies the seed that is set before any randomness-based code is run. If NULL, the seed is inherited from the calling environment.
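The seed semantics can be illustrated with a small base R sketch. The helper random_fold_ids() is hypothetical and only mimics the documented behaviour: a non-NULL seed is set before the random step, while NULL leaves the caller's random number generator state untouched.

```r
# Hypothetical helper illustrating the documented seed semantics.
random_fold_ids <- function(n, v, seed = 1) {
  if (!is.null(seed)) set.seed(seed)  # NULL: inherit the caller's RNG state
  sample(rep_len(seq_len(v), n))      # random fold assignment for n rows
}

# Same explicit seed, same fold assignment.
a <- random_fold_ids(10, 3, seed = 1)
b <- random_fold_ids(10, 3, seed = 1)
identical(a, b)  # TRUE

# seed = NULL: reproducibility is controlled by the calling environment.
set.seed(42)
c1 <- random_fold_ids(10, 3, seed = NULL)
set.seed(42)
c2 <- random_fold_ids(10, 3, seed = NULL)
```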

verbose

An integer specifying the level of verbosity. Use 0 (default) to keep shapr silent, 1 to print information about performance, and 2 to print some additional information.

regression.model

A tidymodels object of class model_spec. Default is a linear regression model, i.e., parsnip::linear_reg(). See tidymodels for all possible models, and see the vignette for how to add new/own models. Note that, to make it easier to call explain() from Python, the regression.model parameter can also be a string specifying the model, which will be parsed and evaluated. For example, "parsnip::rand_forest(mtry = hardhat::tune(), trees = 100, engine = 'ranger', mode = 'regression')" is also a valid input. It is essential to include the package prefix if the package is not loaded.
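The string form can be turned into a model specification with eval(parse()). The helper as_model_spec() below is hypothetical and not part of shapr's API. It only shows that a parsnip specification passed directly and the same call passed as a string yield objects of the same class.

```r
# Hypothetical helper (not shapr's API): accept a parsnip model specification
# either directly or as a string that is parsed and evaluated.
as_model_spec <- function(model) {
  if (is.character(model)) model <- eval(parse(text = model))
  if (!inherits(model, "model_spec")) stop("Expected a parsnip model_spec object.")
  model
}

spec_direct <- as_model_spec(parsnip::linear_reg())
spec_string <- as_model_spec("parsnip::linear_reg()")  # package prefix included
```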

regression.tune

Logical (default is FALSE). If TRUE, the hyperparameters are tuned based on the values provided in regression.tune_values. Note that no checks are conducted here, as this is checked earlier in setup_approach.regression_separate and setup_approach.regression_surrogate.
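Tuning by cross-validation can be sketched with the tune package: mark a parameter with hardhat::tune(), evaluate a grid of candidate values on v-fold resamples, select the best configuration, and refit on all data. The regression tree, the three-value grid, the fold count and the RMSE metric are illustrative assumptions rather than what shapr does internally. rpart, the default decision_tree() engine, ships with R.

```r
# Minimal sketch of cross-validated tuning (illustrative, not shapr's internals).
x <- mtcars[, c("mpg", "cyl", "disp", "hp", "wt")]
names(x)[names(x) == "mpg"] <- "y_hat"

# Regression tree whose depth is marked for tuning.
model <- parsnip::decision_tree(tree_depth = hardhat::tune(), mode = "regression")
wf <- workflows::add_model(
  workflows::add_recipe(workflows::workflow(), recipes::recipe(y_hat ~ ., data = x)),
  model
)

set.seed(1)
folds <- rsample::vfold_cv(x, v = 3)
# Column name must match the tuned parameter.
grid <- data.frame(tree_depth = c(1L, 2L, 4L))

# Evaluate every grid row on every fold, scoring with RMSE only.
tuned <- tune::tune_grid(
  wf,
  resamples = folds,
  grid = grid,
  metrics = yardstick::metric_set(yardstick::rmse)
)

# Plug the best value into the workflow and refit on the full data.
best <- tune::select_best(tuned, metric = "rmse")
final_fit <- generics::fit(tune::finalize_workflow(wf, best), data = x)
```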

regression.tune_values

Either NULL (default), a data.frame/data.table/tibble, or a function. The data.frame must contain the possible hyperparameter value combinations to try, and its column names must match the names of the tunable parameters specified in regression.model. If regression.tune_values is a function, it should take one argument x, the training data for the current combination/coalition, and return a data.frame/data.table/tibble with the properties described above. Using a function allows the hyperparameter values to change based on the size of the combination. See the regression vignette for several examples. Note that, to make it easier to call explain() from Python, regression.tune_values can also be a string containing an R function. For example, "function(x) return(dials::grid_regular(dials::mtry(c(1, ncol(x))), levels = 3))" is also a valid input. It is essential to include the package prefix if the package is not loaded.
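The following base R sketch shows a function-valued regression.tune_values whose candidate mtry values grow with the number of columns in the coalition's training data. It also shows the same function supplied as a string and then parsed. base::expand.grid() stands in for dials::grid_regular() here only to keep the example dependency-free. Using ncol(x) as the upper bound mirrors the documented example and is an assumption about what x contains.

```r
# Hypothetical tune_values function: candidate mtry values depend on ncol(x).
tune_values_fun <- function(x) {
  expand.grid(mtry = unique(round(seq(1, ncol(x), length.out = 3))))
}

# The same function supplied as a string (Python-friendly form), parsed and evaluated.
tune_values_string <- "function(x) expand.grid(mtry = unique(round(seq(1, ncol(x), length.out = 3))))"
tune_values_parsed <- eval(parse(text = tune_values_string))

x_small <- data.frame(a = 1, b = 2)
x_large <- data.frame(a = 1, b = 2, c = 3, d = 4)
tune_values_fun(x_small)$mtry  # 1 2
tune_values_fun(x_large)$mtry  # 1 2 4
```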

regression.vfold_cv_para

Either NULL (default) or a named list containing the parameters passed to rsample::vfold_cv(). See the regression vignette for several examples.
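A named list of this kind can be forwarded to rsample::vfold_cv() with do.call(). The sketch below assumes that the list is combined with the training data in this way. The exact forwarding inside shapr may differ.

```r
# Sketch: forward a named parameter list to rsample::vfold_cv() via do.call().
vfold_cv_para <- list(v = 5, repeats = 2)
x <- mtcars

set.seed(1)
folds <- do.call(rsample::vfold_cv, c(list(data = x), vfold_cv_para))
nrow(folds)  # 10 resamples: 5 folds repeated twice
```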

regression.recipe_func

Either NULL (default) or a function that takes a recipes::recipe() object and returns a modified recipes::recipe(), potentially with additional recipe steps. See the regression vignette for several examples. Note that, to make it easier to call explain() from Python, regression.recipe_func can also be a string containing an R function. For example, "function(recipe) return(recipes::step_ns(recipe, recipes::all_numeric_predictors(), deg_free = 2))" is also a valid input. It is essential to include the package prefix if the package is not loaded.
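The string example above can be parsed into a function and applied to a base recipe to check that it appends a step. The data and the response name y_hat are illustrative assumptions. The recipe is only specified here, not prepped.

```r
# Sketch: parse the documented string into a function and apply it to a recipe.
x <- mtcars[, c("mpg", "cyl", "disp", "hp", "wt")]
names(x)[names(x) == "mpg"] <- "y_hat"

recipe_func <- eval(parse(text =
  "function(recipe) return(recipes::step_ns(recipe, recipes::all_numeric_predictors(), deg_free = 2))"))

base_rec <- recipes::recipe(y_hat ~ ., data = x)
new_rec <- recipe_func(base_rec)
length(new_rec$steps)  # 1: the natural-spline step was appended
```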

regression.response_var

String (default is "y_hat") containing the name of the response variable.
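One common way to use a response name stored as a string is to build a "response ~ ." formula, for example for recipes::recipe(). That shapr builds its formula this way is an assumption made only for illustration.

```r
# Sketch: build a "response ~ ." formula from the response variable name.
response_var <- "y_hat"
f <- stats::as.formula(paste(response_var, "~ ."))
deparse(f)  # "y_hat ~ ."
```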

regression.surrogate_n_comb

Integer (default is NULL). The number of times each training observation has been augmented. If NULL, we assume that separate regression is being used.
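When each training observation appears n_comb times in the augmented data, ordinary v-fold splits can place copies of the same observation in both the analysis and the assessment set. The sketch below shows one way to avoid this with rsample::group_vfold_cv(). It assumes a hypothetical id column identifying the original observation. It is not a description of how shapr handles the augmented data internally.

```r
# Sketch under stated assumptions: each original observation appears n_comb
# times and is identified by a hypothetical `id` column.
set.seed(1)
n_obs <- 6
n_comb <- 3
x_aug <- data.frame(
  id = rep(seq_len(n_obs), each = n_comb),
  y_hat = rnorm(n_obs * n_comb),
  x1 = rnorm(n_obs * n_comb)
)

# Grouping the folds by id keeps all copies of an observation in the same fold.
folds <- rsample::group_vfold_cv(x_aug, group = "id", v = 3)

# Count original observations shared between each analysis and assessment set.
overlap <- vapply(folds$splits, function(s) {
  length(intersect(rsample::analysis(s)$id, rsample::assessment(s)$id))
}, integer(1))
overlap  # 0 0 0
```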

Value

A trained tidymodels model based on the provided input parameters.

Author(s)

Lars Henry Berge Olsen


NorskRegnesentral/shapr documentation built on April 19, 2024, 1:19 p.m.