train_lm: Train a Lasso linear model.


View source: R/lm.R

Description

Train a Lasso linear model. The training routine automatically selects the best lambda parameter using glmnet::cv.glmnet().
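
As a rough illustration of the lambda selection described above, the following sketch calls glmnet::cv.glmnet() directly on a numeric matrix. It is not the internal code of train_lm(); the simulated data and the names x, y, and cv_fit are assumptions made for this example only.

```r
# Illustrative only: simulated data, not the internals of train_lm().
set.seed(1)
x <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
y <- 2 * x[, 1] - x[, 2] + rnorm(100)

# alpha = 1 requests the Lasso penalty; type.measure = "mae" scores folds by MAE.
cv_fit <- glmnet::cv.glmnet(x, y, alpha = 1, nfolds = 10, type.measure = "mae")

cv_fit$lambda.min  # lambda with the best mean cross-validated error
cv_fit$lambda.1se  # largest lambda within one standard error of the best
```

In glmnet's own terms, lambda.min resembles the "absolute" selection method and lambda.1se resembles the "Breiman" method documented under selection_method, although train_lm() may implement the rule differently.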

Usage

train_lm(
  training_data,
  outcome,
  metric = c("rmse", "mae"),
  na_action = c("medianimpute", "knnimpute"),
  lambda = NULL,
  cv_nfolds = 10,
  id_col = NULL,
  strata = NULL,
  selection_method = "Breiman",
  include_nullmod = TRUE,
  err_if_nullmod = FALSE,
  warn_if_nullmod = TRUE,
  n_cores = 1
)

Arguments

training_data

A data frame. The data used to train the model.

outcome

A string. The name of the outcome variable. This must be a column in training_data.

metric

A string. The performance metric used for model selection: "rmse" or "mae".

na_action

A string. How to impute missing data in explanatory variables: "medianimpute" or "knnimpute". See recipes::step_medianimpute() and recipes::step_knnimpute(). Default is "medianimpute".

lambda

A numeric vector. Optional. A grid of lambdas for tuning the Lasso. Leaving this as NULL is recommended; a sensible grid is then chosen for you.

cv_nfolds

A positive integer. The number of folds for cross-validation.

id_col

A string. If there is a sample identifier column, specify it here to tell the model not to use it as a predictor.

strata

A string. Variable to stratify on when splitting for cross-validation.

selection_method

A string. How to select the best model. There are two options: "Breiman" and "absolute". "absolute" selects the model with the best mean performance according to the chosen metric. "Breiman" selects the simplest model that comes within one standard deviation of the best score. The rationale is that simple models generalize better, so it is preferable to select a simple model with near-best performance.

include_nullmod

A bool. Include the null model (predicts mean or most common class every time) in the model comparison? This is recommended. If the null model comes within a standard deviation of the otherwise best model, the null model is chosen instead.

err_if_nullmod

A bool. If the null model is chosen, throw an error rather than returning the null model.

warn_if_nullmod

A bool. Warn if returning the null model?

n_cores

A positive integer. The cross-validation can optionally be done in parallel. Specify the number of cores for parallel processing here.
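
The two selection_method options can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the cv_results table, its scores, and the select_model() helper are hypothetical, and the sketch assumes that the standard deviation used for the threshold is that of the best-scoring model and that rows are ordered from simplest to most complex.

```r
# Hypothetical cross-validation summary: one row per candidate model,
# ordered from simplest (the null model) to most complex.
cv_results <- data.frame(
  model      = c("null", "lambda_1.0", "lambda_0.1", "lambda_0.01"),
  mean_score = c(1.20, 0.60, 0.55, 0.54),  # mean MAE across folds (lower is better)
  sd_score   = c(0.10, 0.09, 0.08, 0.08),  # standard deviation across folds
  stringsAsFactors = FALSE
)

# Hypothetical helper, written only to illustrate the two rules.
select_model <- function(results, method = c("Breiman", "absolute")) {
  method <- match.arg(method)
  best <- which.min(results$mean_score)
  if (method == "absolute") {
    return(results$model[best])
  }
  # "Breiman": the simplest model whose mean score lies within one
  # standard deviation of the best score (assumed: the best model's SD).
  threshold <- results$mean_score[best] + results$sd_score[best]
  results$model[which(results$mean_score <= threshold)[1]]
}

select_model(cv_results, "absolute")  # "lambda_0.01"
select_model(cv_results, "Breiman")   # "lambda_1.0"
```

Because the null model sits in the first row, it would be returned under "Breiman" whenever its score fell within the threshold, which mirrors the behaviour described for include_nullmod.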

Value

A parsnip::model_fit object. To use this fitted model mod to make predictions on some new data df_new, use predict(mod, new_data = df_new).

See Also

Other model trainers: train_gbm(), train_glm()

Examples

iris_data <- janitor::clean_names(datasets::iris)
iris_data_split <- rsample::initial_split(iris_data, strata = species)
mod <- train_lm(
  training_data = rsample::training(iris_data_split),
  outcome = "petal_length",
  metric = "mae",
  n_cores = 5
)
preds <- predict(mod, new_data = rsample::testing(iris_data_split))
dplyr::bind_cols(preds,
  truth = rsample::testing(iris_data_split)$petal_length
)
yardstick::mae_vec(
  truth = rsample::testing(iris_data_split)$petal_length,
  estimate = preds[[1]]
)

mirvie/mirmodels documentation built on Jan. 14, 2022, 11:12 a.m.