flash_models: Train models without tuning for performance


View source: R/flash_models.R

Description

Train models without tuning for performance

Usage

flash_models(d, outcome, models, metric, positive_class, n_folds = 5,
  model_class, model_name = NULL, allow_parallel = FALSE)

Arguments

d

A data frame from prep_data. If you want to prepare your data on your own, use prep_data(..., no_prep = TRUE).

outcome

Optional. Name of the column to predict. When omitted the outcome from prep_data is used; otherwise it must match the outcome provided to prep_data.

models

Names of models to try. See get_supported_models for available models. Default is all available models.

metric

Which metric should be used to assess model performance? Options for classification: "ROC" (default; area under the receiver operating characteristic curve) or "PR" (area under the precision-recall curve). Options for regression: "RMSE" (default; root-mean-squared error), "MAE" (mean absolute error), or "Rsquared". Options for multiclass: "Accuracy" (default) or "Kappa" (accuracy, adjusted for class imbalance).

positive_class

For classification only, which outcome level is the "yes" case, i.e. the level that should be associated with high probabilities? Defaults to "Y" or "yes" if present; otherwise it is the first level of the outcome variable (first alphabetically if the training data outcome was not already a factor).

n_folds

How many folds to train the model on. Default = 5, minimum = 2. While flash_models doesn't use cross-validation to tune hyperparameters, it trains n_folds models to evaluate performance out of fold.

model_class

"regression" or "classification". If not provided, this will be determined by the class of 'outcome' with the determination displayed in a message.

model_name

Quoted name of the model. Defaults to the name of the outcome variable.

allow_parallel

Logical, defaults to FALSE. If TRUE and a parallel backend is set up (e.g. with doMC), models with support for parallel training will be trained across cores.
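
The out-of-fold evaluation described under n_folds can be illustrated with a minimal base-R sketch. This is not how flash_models is implemented; it only shows the idea of training one model per fold at fixed settings and scoring each row with a model that never saw it. The mtcars data, the lm formula, and the names fold_id, oof_pred, and oof_rmse are all assumptions chosen for illustration.

```r
# Illustration only: out-of-fold evaluation with no hyperparameter tuning.
set.seed(42)
n_folds <- 5
d <- mtcars

# Randomly assign every row to one of n_folds folds
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(d)))

# Each row is predicted by the model trained on the other folds
oof_pred <- numeric(nrow(d))
for (k in seq_len(n_folds)) {
  held_out <- fold_id == k
  fit <- lm(mpg ~ wt + hp, data = d[!held_out, ])
  oof_pred[held_out] <- predict(fit, newdata = d[held_out, ])
}

# Out-of-fold root-mean-squared error
oof_rmse <- sqrt(mean((d$mpg - oof_pred)^2))
print(oof_rmse)
```

Because every prediction comes from a model that did not train on that row, the resulting error is an estimate of performance on new data, even though no tuning took place.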

Details

This function differs from tune_models in two major ways: (1) it trains models at fixed default hyperparameter values instead of using cross-validation to optimize hyperparameter values for predictive performance, and, as a result, (2) it is much faster.

If you want to train a model at a single set of non-default hyperparameter values, use tune_models and pass a single-row data frame to the hyperparameters argument.
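
A hedged sketch of that approach for a random forest follows. The column names mtry, splitrule, and min.node.size and the values chosen are assumptions; check the hyperparameters help topic for the names your installed version expects. The healthcareai calls run only if the package is installed.

```r
# One row means exactly one value per hyperparameter, so nothing is searched.
# Column names and values here are assumed for illustration.
rf_hp <- data.frame(
  mtry = 3,
  splitrule = "extratrees",
  min.node.size = 5,
  stringsAsFactors = FALSE
)

if (requireNamespace("healthcareai", quietly = TRUE)) {
  library(healthcareai)
  prepped_data <- prep_data(pima_diabetes, patient_id, outcome = diabetes)
  # Named list: the name matches the model the data frame applies to
  rf_fixed <- tune_models(prepped_data, diabetes,
                          models = "rf",
                          hyperparameters = list(rf = rf_hp))
}
```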

Value

A model_list object. You can call plot, summary, evaluate, or predict on a model_list.
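
As a minimal sketch of working with the returned object, the snippet below stores the model_list and passes it to three of the functions named above. It assumes the pima_diabetes data shipped with healthcareai and that predict called without new data returns predictions for the training data; the variable names models and train_predictions are illustrative.

```r
if (requireNamespace("healthcareai", quietly = TRUE)) {
  library(healthcareai)
  prepped_data <- prep_data(pima_diabetes, patient_id, outcome = diabetes)

  # Keep the model_list so it can be inspected and reused
  models <- flash_models(prepped_data, diabetes)

  print(summary(models))    # overview of the trained models
  print(evaluate(models))   # performance metrics

  # Assumption: with no new data supplied, predictions are for the training data
  train_predictions <- predict(models)
  print(head(train_predictions))
}
```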

See Also

For setting up model training: prep_data, supported_models, hyperparameters

For evaluating models: plot.model_list, evaluate.model_list

For making predictions: predict.model_list

For optimizing performance: tune_models

To prepare data and tune models in a single step: machine_learn

Examples

## Not run: 
# Prepare data
prepped_data <- prep_data(pima_diabetes, patient_id, outcome = diabetes)

# Get models quickly at default hyperparameter values
flash_models(prepped_data)

# Speed comparison of no tuning with flash_models vs. tuning with tune_models:
# ~15 seconds:
system.time(
  tune_models(prepped_data, diabetes)
)
# ~3 seconds:
system.time(
  flash_models(prepped_data, diabetes)
)

## End(Not run)

healthcareai documentation built on Sept. 2, 2018, 1:03 a.m.