tof_fit_split: Fit a glmnet model and calculate performance metrics using a...

View source: R/modeling_helpers.R

tof_fit_split    R Documentation

Fit a glmnet model and calculate performance metrics using a single rsplit object

Description

This function trains a glmnet model on the training set of an rsplit object, then calculates performance metrics of that model on the validation/holdout set at all combinations of the mixture and penalty hyperparameters provided in a hyperparameter grid.
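The train-then-evaluate loop described above can be sketched directly with glmnet. This is only an illustration under stated assumptions, not tidytof's implementation: the data are simulated, a plain index split stands in for the rsplit object, and the names `grid`, `metrics`, and `results` are hypothetical. It assumes the glmnet package is installed. In glmnet's own argument names, the penalty is `lambda` and the mixture is `alpha`.

```r
library(glmnet)

set.seed(42)

# Simulated data standing in for a preprocessed feature matrix and outcome
n <- 100
x <- matrix(rnorm(n * 5), nrow = n)
y <- x[, 1] - 2 * x[, 2] + rnorm(n)

# A simple 80/20 index split standing in for the training/holdout sets
train_idx <- seq_len(80)
x_train <- x[train_idx, ]
y_train <- y[train_idx]
x_holdout <- x[-train_idx, ]
y_holdout <- y[-train_idx]

# Hypothetical hyperparameter grid: every penalty/mixture combination
grid <- expand.grid(penalty = c(0.01, 0.1, 1), mixture = c(0, 0.5, 1))

# Fit on the training rows, then score on the holdout rows, once per grid row
metrics <- lapply(seq_len(nrow(grid)), function(i) {
  fit <- glmnet(
    x_train, y_train,
    family = "gaussian",
    alpha = grid$mixture[i],   # mixture
    lambda = grid$penalty[i]   # penalty
  )
  pred <- as.numeric(predict(fit, newx = x_holdout))
  data.frame(
    mse = mean((y_holdout - pred)^2),
    mae = mean(abs(y_holdout - pred))
  )
})

# One row per hyperparameter combination, one column per metric
results <- cbind(grid, do.call(rbind, metrics))
```

The sketch refits one model per grid row for clarity. glmnet can also fit a whole sequence of penalties in a single call, which is usually faster.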

Usage

tof_fit_split(
  split_data,
  prepped_recipe,
  hyperparameter_grid,
  model_type,
  outcome_colnames
)

Arguments

split_data

An 'rsplit' object from the rsample package. Alternatively, an unsplit tbl_df can be provided, though this is not recommended.

prepped_recipe

A trained recipe, i.e. one that has already been prepared with recipes::prep().

hyperparameter_grid

A tibble containing the hyperparameter values to tune. Can be created using tof_create_grid.

model_type

A string representing the type of glmnet model being fit: "linear", "two-class", "multiclass", or "survival".

outcome_colnames

Quoted column names indicating which columns in the data being fit represent the outcome variables (with all others assumed to be predictors).

Value

A tibble with the same number of rows as the input hyperparameter grid. Each row represents a combination of mixture and penalty, and each column contains a performance metric for the fitted glmnet model on the holdout set of 'split_data'. The specific performance metrics depend on the type of model being fit:

"linear"

mean-squared error ('mse') and mean absolute error ('mae')

"two-class"

binomial deviance ('binomial_deviance'); misclassification error rate ('misclassification_error'); the area under the receiver operating characteristic curve ('roc_auc'); and 'mse' and 'mae' as above

"multiclass"

multinomial deviance ('multinomial_deviance'); misclassification error rate ('misclassification_error'); the area under the receiver operating characteristic curve ('roc_auc'), computed using the Hand-Till method in roc_auc; and 'mse' and 'mae' as above

"survival"

the negative log2-transformed partial likelihood ('neg_log_partial_likelihood') and Harrell's concordance index (often simply called "C"; 'concordance_index')
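Several of the metrics listed above can be computed by hand in base R. The sketch below is illustrative only: all input vectors are made-up holdout values, the variable names are hypothetical, and the formulas are the textbook definitions rather than a copy of tidytof's code. The concordance calculation is a simplified pairwise version that ignores tied event times.

```r
# "linear": numeric outcomes vs. numeric predictions (hypothetical values)
truth <- c(3.0, -0.5, 2.0, 7.0)
estimate <- c(2.5, 0.0, 2.0, 8.0)
mse <- mean((truth - estimate)^2)   # 0.375
mae <- mean(abs(truth - estimate))  # 0.5

# "two-class": 0/1 outcomes vs. predicted probabilities of class 1
class_truth <- c(1, 0, 1, 1, 0, 0)
class_prob <- c(0.9, 0.2, 0.6, 0.4, 0.5, 0.1)
class_pred <- as.integer(class_prob > 0.5)
misclassification_error <- mean(class_pred != class_truth)  # 1/6

# ROC AUC via the rank (Mann-Whitney) formulation: the fraction of
# positive/negative pairs in which the positive case scores higher
ranks <- rank(class_prob)
n_pos <- sum(class_truth == 1)
n_neg <- sum(class_truth == 0)
roc_auc <- (sum(ranks[class_truth == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)  # 8/9

# "survival": follow-up times, event indicators (1 = event observed),
# and predicted risk scores (higher = expected to fail sooner)
time <- c(5, 8, 3, 10)
event <- c(1, 0, 1, 1)
risk <- c(2.0, 2.5, 3.0, 0.5)

concordant <- 0
comparable <- 0
for (i in seq_along(time)) {
  for (j in seq_along(time)) {
    # A pair is comparable when subject i has an observed event
    # strictly before subject j's follow-up time
    if (event[i] == 1 && time[i] < time[j]) {
      comparable <- comparable + 1
      if (risk[i] > risk[j]) {
        concordant <- concordant + 1
      } else if (risk[i] == risk[j]) {
        concordant <- concordant + 0.5  # ties in risk count as half
      }
    }
  }
}
concordance_index <- concordant / comparable  # 0.8
```

A concordance index of 0.5 corresponds to random ordering of risk scores and 1 to perfect ordering, which is why it is read in the same way as an AUC.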

References

Harrell Jr, F. E., Lee, K. L., and Mark, D. B. (1996) Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing error, Statistics in Medicine, 15, pages 361–387.


keyes-timothy/tidytof documentation built on Aug. 28, 2024, 8:37 a.m.