last_fit | R Documentation |
last_fit() emulates the process where, after the best model has been determined, the final model is fit on the entire training set and then evaluated on the test set.
last_fit(object, ...)
## S3 method for class 'model_spec'
last_fit(
  object,
  preprocessor,
  split,
  ...,
  metrics = NULL,
  eval_time = NULL,
  control = control_last_fit(),
  add_validation_set = FALSE
)
## S3 method for class 'workflow'
last_fit(
  object,
  split,
  ...,
  metrics = NULL,
  eval_time = NULL,
  control = control_last_fit(),
  add_validation_set = FALSE
)
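object | A model specification (class model_spec) or a workflow. |
... | Currently unused. |
preprocessor | A traditional model formula or a recipe created using recipes::recipe(). |
split | An rsplit object, such as one created by rsample::initial_split(). |
metrics | A metric set created by yardstick::metric_set(), or NULL (the default). |
eval_time | A numeric vector of time points at which dynamic event time metrics (e.g., the time-dependent ROC curve) should be computed. The values must be non-negative and should probably be no greater than the largest event time in the training set (see Details below). |
control | A control object, such as the default control_last_fit(). |
add_validation_set | For 3-way splits into training, validation, and test sets, a logical value indicating whether the validation set should be added to the data used to fit the final model. The default, FALSE, uses only the training set. |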
This function is intended to be used after a variety of models have been fit and the final tuning parameters (if any) have been finalized. The next step is to fit the final model on the entire training set and verify its performance on the test data.
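To make the idea concrete, here is a base-R sketch of the two steps that last_fit() automates: one final fit on the training portion and one evaluation on the held-out test portion. It is not how tune implements last_fit(). The 3/4 training proportion mirrors the default of rsample::initial_split(), the formula mpg ~ disp + wt is an arbitrary illustrative choice, and the RMSE and R-squared values are computed by hand.

```r
# Base-R illustration of "fit once on training, evaluate once on test".
# This is a sketch of the idea, NOT the tune implementation of last_fit().
set.seed(6735)
n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.75 * n))  # 3/4 of rows for training
train_set <- mtcars[train_idx, ]
test_set  <- mtcars[-train_idx, ]

# Final fit: the chosen model, refit a single time on the whole training set
final_fit <- lm(mpg ~ disp + wt, data = train_set)

# Evaluate a single time on the held-out test set
pred <- predict(final_fit, newdata = test_set)
rmse <- sqrt(mean((test_set$mpg - pred)^2))
rsq  <- cor(test_set$mpg, pred)^2  # squared correlation form of R-squared

test_metrics <- data.frame(.metric = c("rmse", "rsq"), .estimate = c(rmse, rsq))
test_metrics
```

The key design point is that the test set is touched exactly once, after all modeling choices are fixed, so its metrics remain an honest estimate of performance on new data.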
A single-row tibble that emulates the structure of fit_resamples(). However, a list column called .workflow is also attached, containing the fitted model (and recipe, if any) that was trained on the training set. Helper functions for formatting tuning results, such as collect_metrics() and collect_predictions(), can be used with last_fit() output.
Some models can utilize case weights during training. tidymodels currently supports two types of case weights: importance weights (doubles) and frequency weights (integers). Frequency weights are used during model fitting and evaluation, whereas importance weights are only used during fitting.
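The distinction matters because a frequency weight means "this row was observed k times". The base-R sketch below uses stats::lm() rather than tidymodels, with made-up integer weights, and checks that weighting rows by integer counts gives the same least-squares coefficients as literally repeating those rows.

```r
# Frequency weights act like row counts: weighting a row by k gives the same
# least-squares coefficients as repeating that row k times.
d    <- mtcars[1:6, c("mpg", "wt")]
freq <- c(1L, 3L, 2L, 1L, 4L, 2L)  # hypothetical integer frequency weights

fit_weighted   <- lm(mpg ~ wt, data = d, weights = freq)
d_replicated   <- d[rep(seq_len(nrow(d)), times = freq), ]  # 13 rows
fit_replicated <- lm(mpg ~ wt, data = d_replicated)

all.equal(coef(fit_weighted), coef(fit_replicated))
df.residual(fit_weighted)    # residual df from 6 rows
df.residual(fit_replicated)  # residual df from 13 rows
```

The residual degrees of freedom still differ (4 versus 11), because lm() does not treat its weights as counts when computing them; how each weight type is propagated through fitting and evaluation in tidymodels is handled by its own packages.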
To check whether your model can use case weights, create a model specification and test it with parsnip::case_weights_allowed().
To use them, you will need a numeric column in your data set that has been passed through either hardhat::importance_weights() or hardhat::frequency_weights().
For functions such as fit_resamples() and the tune_*() functions, the model must be contained inside a workflows::workflow(). To declare that case weights are used, invoke workflows::add_case_weights() with the corresponding (unquoted) column name.
From there, the packages will appropriately handle the weights during model fitting and (if appropriate) performance estimation.
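Putting those steps together, the following sketch assumes the tidymodels meta-package is installed; the wts column and its random values are hypothetical. It checks the engine with case_weights_allowed(), wraps importance weights with hardhat::importance_weights(), declares them with add_case_weights(), and passes the workflow to last_fit().

```r
library(tidymodels)  # attaches parsnip, rsample, tune, workflows, yardstick, ...

# 1. Check that this model/engine combination accepts case weights
lin_mod <- linear_reg() %>% set_engine("lm")
parsnip::case_weights_allowed(lin_mod)

# 2. Add a hypothetical importance-weight column, wrapped by hardhat
set.seed(1)
cars <- mtcars
cars$wts <- hardhat::importance_weights(runif(nrow(cars), min = 0.5, max = 1.5))

# 3. Declare the (unquoted) weights column in the workflow
wts_wfl <- workflow() %>%
  add_case_weights(wts) %>%
  add_formula(mpg ~ disp + wt) %>%
  add_model(lin_mod)

# 4. last_fit() fits on the training set (using the weights) and scores the test set
cars_split <- initial_split(cars)
wts_res <- last_fit(wts_wfl, split = cars_split)
collect_metrics(wts_res)
```

The formula lists its predictors explicitly so that the weights column can never be mistaken for a predictor.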
Three types of metrics can be used to assess the quality of censored regression models:
static: the prediction is independent of time.
dynamic: the prediction is a time-specific probability (e.g., survival probability) and is measured at one or more particular times.
integrated: same as the dynamic metric, but returns the integral of the metric values computed at each time point.
The metrics chosen by the user determine how many evaluation times must be specified. For example:
# Needs no `eval_time` value
metric_set(concordance_survival)
# Needs at least one `eval_time`
metric_set(brier_survival)
metric_set(brier_survival, concordance_survival)
# Needs at least two `eval_time` values
metric_set(brier_survival_integrated, concordance_survival)
metric_set(brier_survival_integrated, concordance_survival, brier_survival)
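Why integrated metrics need at least two evaluation times can be seen from how an integral over time is approximated numerically. The sketch below applies the trapezoidal rule to made-up Brier scores; the helper trapezoid() is hypothetical, and dividing by the largest evaluation time is an assumed normalization for illustration that may not match yardstick's brier_survival_integrated() exactly.

```r
# Hypothetical Brier scores at three evaluation times (made-up numbers)
eval_time <- c(1, 5, 10)
brier     <- c(0.05, 0.12, 0.20)

# Trapezoidal rule: sum of (interval width * average height) over adjacent
# time points. A single time point has no interval, hence the two-point minimum.
trapezoid <- function(x, y) {
  stopifnot(length(x) >= 2, length(x) == length(y), !is.unsorted(x))
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

area <- trapezoid(eval_time, brier)  # 4 * 0.085 + 5 * 0.16 = 1.14
area / max(eval_time)                # rescaled by the largest time: 0.114
```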
Values of eval_time should be less than the largest observed event time in the training data. For many non-parametric models, the results beyond the largest time corresponding to an event are constant (or NA).
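The following base-R sketch shows both points using toy data and a hand-written Kaplan-Meier estimator (km_surv() is a hypothetical helper, not a tidymodels function). The largest observed event time ignores censored rows, and the estimated survival curve is flat beyond that time, so evaluation times past it add no information.

```r
# Toy training data (made up): status 1 = event, 0 = censored
train <- data.frame(time = c(5, 8, 12, 20, 25), status = c(1, 0, 1, 1, 0))

# Largest observed *event* time; 25 is censored, so it does not count
max_event <- max(train$time[train$status == 1])  # 20

# Minimal Kaplan-Meier estimate of S(t) at the requested times
km_surv <- function(time, status, at) {
  event_times <- sort(unique(time[status == 1]))
  # survival factor (1 - events / number at risk) at each distinct event time
  factors <- vapply(event_times, function(t) {
    1 - sum(time == t & status == 1) / sum(time >= t)
  }, numeric(1))
  # S(a) is the product of the factors for all event times up to a
  vapply(at, function(a) prod(factors[event_times <= a]), numeric(1))
}

km_surv(train$time, train$status, at = c(4, 10, 15, 20, 22, 30))
# approx. 1, 0.8, 0.533, 0.267, 0.267, 0.267: flat after t = 20

# Keep only candidate evaluation times below the largest event time
candidate <- c(0, 5, 10, 15, 20, 25, 30)
eval_time <- candidate[candidate < max_event]  # 0, 5, 10, 15
```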
last_fit() is closely related to fit_best(). Both give you access to a workflow fitted on the training data, but they sit at different points in the modeling process. fit_best() picks up after a tuning function such as tune_grid() and takes you from tuning results to a fitted workflow, ready for you to predict with and assess further. last_fit() assumes you have already chosen your hyperparameters and finalized your workflow; it takes you from the finalized workflow to a fitted workflow and then on to performance assessment on the test data.
In short, fit_best() returns a fitted workflow, whereas last_fit() returns performance results. If you want the fitted workflow, you can extract it from the result of last_fit() via extract_workflow().
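A minimal sketch of that last step, assuming the tidymodels meta-package is installed; the formula mpg ~ disp + wt and the object names are illustrative. collect_metrics() gives the test-set performance, while extract_workflow() returns the workflow fitted on the training set, which can then predict on new data.

```r
library(tidymodels)

set.seed(6735)
car_split <- initial_split(mtcars)
car_wfl <- workflow() %>%
  add_formula(mpg ~ disp + wt) %>%
  add_model(linear_reg())  # default "lm" engine

car_res <- last_fit(car_wfl, split = car_split)

# Test-set performance: the main output of last_fit()
collect_metrics(car_res)

# The fitted workflow (trained on the training set only)
fitted_wfl <- extract_workflow(car_res)

# ...ready to predict on new data, here a few rows of the test set
predict(fitted_wfl, new_data = head(testing(car_split), 3))
```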
library(tune)      # last_fit(), collect_metrics(), collect_predictions()
library(recipes)
library(rsample)
library(parsnip)
set.seed(6735)
# Split mtcars into training (3/4) and test (1/4) sets
tr_te_split <- initial_split(mtcars)
# Recipe with a natural-spline expansion of `disp`
spline_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_ns(disp)
lin_mod <- linear_reg() %>%
  set_engine("lm")
# Fit on the training set and evaluate on the test set
spline_res <- last_fit(lin_mod, spline_rec, split = tr_te_split)
spline_res
# test set metrics
collect_metrics(spline_res)
# test set predictions
collect_predictions(spline_res)
# or use a workflow
library(workflows)
spline_wfl <-
  workflow() %>%
  add_recipe(spline_rec) %>%
  add_model(lin_mod)
last_fit(spline_wfl, split = tr_te_split)