Description
Train a Lasso linear model. The training routine automatically selects the
best lambda parameter using glmnet::cv.glmnet().
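As an illustration of what that selection involves (not the package's own code), the sketch below runs glmnet::cv.glmnet() directly on the built-in mtcars data; the alpha and nfolds values here are assumptions chosen for the example, and train_lm() handles all of this internally.

# Illustrative only: how cv.glmnet() picks a lambda (train_lm() does this for you).
x <- model.matrix(mpg ~ ., data = datasets::mtcars)[, -1]  # numeric predictor matrix
y <- datasets::mtcars$mpg
cv_fit <- glmnet::cv.glmnet(x, y, alpha = 1, nfolds = 10)  # alpha = 1 is the lasso penalty
cv_fit$lambda.min  # lambda with the lowest mean cross-validated error
cv_fit$lambda.1se  # largest lambda within one standard error of that minimum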
Arguments

training_data
A data frame. The data used to train the model.

outcome
A string. The name of the outcome variable. This must be the name of a column in training_data.

metric
A string. The performance metric used to compare models during cross-validation (the example below uses "mae").

na_action
A string. How to impute missing data in the explanatory variables.

lambda
A numeric vector. Optional. A grid of lambdas for tuning the Lasso. If you leave this as the default, a grid of lambdas is chosen automatically.

cv_nfolds
A positive integer. The number of folds for cross-validation.

id_col
A string. If there is a sample identifier column, specify it here to tell the model not to use it as a predictor.

strata
A string. The variable to stratify on when splitting for cross-validation.

selection_method
A string. How to select the best model. There are two options: "Breiman" and "absolute". "absolute" selects the model with the best mean performance according to the chosen metric. "Breiman" selects the simplest model that comes within one standard deviation of the best score, the idea being that simple models generalize better, so it is better to select a simple model with near-best performance (see the sketch after this argument list).

include_nullmod
A bool. Include the null model (which predicts the mean or the most common class every time) in the model comparison? This is recommended. If the null model comes within a standard deviation of the otherwise best model, the null model is chosen instead.

err_if_nullmod
A bool. If the null model is chosen, throw an error rather than returning the null model.

warn_if_nullmod
A bool. Warn if returning the null model?

n_cores
A positive integer. The cross-validation can optionally be done in parallel. Specify the number of cores for parallel processing here.
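To make the selection_method and include_nullmod behaviour concrete, here is a small base-R sketch on made-up cross-validation results; the cv_results values and column names are hypothetical and are not produced by the package.

# Hypothetical cross-validation summary: one row per candidate lambda; lower mae is better.
cv_results <- data.frame(
  lambda   = c(0.001, 0.01, 0.1),
  mean_mae = c(0.30, 0.31, 0.35),
  se_mae   = c(0.02, 0.02, 0.03)
)
null_mean_mae <- 0.52  # hypothetical score for the null model (predicts the mean every time)

# "absolute": the model with the best (lowest) mean score.
best <- cv_results[which.min(cv_results$mean_mae), ]

# "Breiman": the simplest model (largest lambda, i.e. most shrinkage) whose mean
# score comes within one standard deviation of the best score.
threshold <- best$mean_mae + best$se_mae
within_one_sd <- cv_results[cv_results$mean_mae <= threshold, ]
breiman_choice <- within_one_sd[which.max(within_one_sd$lambda), ]

# Null-model comparison (include_nullmod = TRUE): if the null model also comes
# within one standard deviation of the best model, it is chosen instead.
null_chosen <- null_mean_mae <= threshold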
Value

A parsnip::model_fit object. To use this fitted model mod to make predictions on some new data df_new, use predict(mod, new_data = df_new).
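For a regression fit such as this one, predict() on a parsnip model returns a one-column tibble (named .pred in current parsnip versions), so predictions can be bound straight onto the new data; mod and df_new below are the placeholders from the paragraph above.

preds <- predict(mod, new_data = df_new)  # a tibble with a single .pred column
dplyr::bind_cols(df_new, preds)           # attach the predictions to the new data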
See Also

Other model trainers: train_gbm(), train_glm()
Examples

# Prepare the data and create a training/testing split, stratified by species.
iris_data <- janitor::clean_names(datasets::iris)
iris_data_split <- rsample::initial_split(iris_data, strata = species)

# Train the Lasso model on the training portion.
mod <- train_lm(
  training_data = rsample::training(iris_data_split),
  outcome = "petal_length",
  metric = "mae",
  n_cores = 5
)

# Predict on the held-out test set and compare to the true values.
preds <- predict(mod, new_data = rsample::testing(iris_data_split))
dplyr::bind_cols(preds,
  truth = rsample::testing(iris_data_split)$petal_length
)
yardstick::mae_vec(
  truth = rsample::testing(iris_data_split)$petal_length,
  estimate = preds[[1]]
)