machine_learn (R Documentation)
View source: R/machine_learn.R

Description:

Prepare data and train machine learning models.

Usage:

machine_learn(
  d,
  ...,
  outcome,
  models,
  metric,
  tune = TRUE,
  positive_class,
  n_folds = 5,
  tune_depth = 10,
  impute = TRUE,
  model_name = NULL,
  allow_parallel = FALSE
)
Arguments:

d: A data frame.

...: Columns to be ignored in model training, e.g. ID columns, unquoted.

outcome: Name of the target column, i.e. what you want to predict. Unquoted. Must be named, i.e. you must specify it as, e.g., outcome = diabetes rather than passing it positionally.

models: Names of models to try. See get_supported_models for available models.

metric: Which metric should be used to assess model performance? Options for classification: "ROC" (default; area under the receiver operating characteristic curve) or "PR" (area under the precision-recall curve). Options for regression: "RMSE" (default; root-mean-squared error), "MAE" (mean-absolute error), or "Rsquared". Options for multiclass: "Accuracy" (default) or "Kappa" (accuracy, adjusted for class imbalance).

tune: If TRUE (default) models will be tuned via cross validation over hyperparameter values (see tune_models). If FALSE, models are trained at fixed hyperparameter values, which is faster but tends to produce less predictive models.

positive_class: For classification only, which outcome level is the "yes" case, i.e. should be associated with high probabilities? Defaults to "Y" or "yes" if present, otherwise is the first level of the outcome variable (first alphabetically if the training data outcome was not already a factor).

n_folds: How many folds to use to assess out-of-fold accuracy? Default = 5. Models are evaluated on out-of-fold predictions whether tune is TRUE or FALSE.

tune_depth: How many hyperparameter combinations to try? Default = 10. Value is multiplied by 5 for regularized regression. Ignored if tune is FALSE.

impute: Logical; if TRUE (default) missing values will be filled by imputation during data preparation (see prep_data).

model_name: Quoted; name of the model. Defaults to the name of the outcome variable.

allow_parallel: Deprecated. Instead, control the number of cores through your parallel back end.
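The arguments above can be combined as in the following sketch. This assumes the healthcareai package is installed (pima_diabetes ships with it) and that you have doParallel available; the choice of positive_class = "N" and n_folds = 10 is illustrative, not a recommendation.

```r
library(healthcareai)

# Since allow_parallel is deprecated, register a parallel back end instead;
# training will then use the registered cores.
doParallel::registerDoParallel(cores = 2)

# Train classification models on the bundled pima_diabetes data, marking
# "N" (no diabetes) as the positive class and using 10 out-of-fold splits.
models <- machine_learn(
  pima_diabetes,
  patient_id,          # ID column, ignored in training
  outcome = diabetes,
  positive_class = "N",
  n_folds = 10
)
```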
Details:

This is a high-level wrapper function. For finer control of data cleaning and preparation, use prep_data or the functions it wraps. For finer control of model tuning, use tune_models.
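The finer-control workflow can be sketched as a two-step pipeline (a sketch assuming the healthcareai package; prep_data and tune_models are the functions named above, and the arguments shown mirror those of machine_learn):

```r
library(healthcareai)

# Step 1: prepare the data yourself for finer control. prep_data handles
# cleaning steps such as imputation of missing values.
prepped <- prep_data(pima_diabetes, patient_id, outcome = diabetes)

# Step 2: tune models on the prepared data.
tuned <- tune_models(prepped, outcome = diabetes, tune_depth = 10)

# The result is a model_list, just as machine_learn returns.
evaluate(tuned)
```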
Value:

A model_list object. You can call plot, summary, evaluate, or predict on a model_list.
Examples:

# These examples take about 30 seconds to execute so aren't run automatically,
# but you should be able to execute this code locally.

# Split the data into training and test sets
d <- split_train_test(d = pima_diabetes, outcome = diabetes, percent_train = .9)

### Classification ###

# Clean and prep the training data, specifying that patient_id is an ID column,
# and tune algorithms over hyperparameter values to predict diabetes
diabetes_models <- machine_learn(d$train, patient_id, outcome = diabetes)

# Inspect model specification and performance
diabetes_models

# Make predictions (predicted probability of diabetes) on test data
predict(diabetes_models, d$test)

### Regression ###

# If the outcome variable is numeric, regression models will be trained
age_model <- machine_learn(d$train, patient_id, outcome = age)

# Get detailed information about performance over tuning values
summary(age_model)

# Get available performance metrics
evaluate(age_model)

# Plot training performance on tuning metric (default = RMSE)
plot(age_model)

# If new data isn't specified, get predictions on training data
predict(age_model)

### Faster model training without tuning hyperparameters ###

# Train models at set hyperparameter values by setting tune to FALSE. This is
# faster (especially on larger datasets), but produces models with less
# predictive power.
machine_learn(d$train, patient_id, outcome = diabetes, tune = FALSE)

### Train models optimizing given metric ###
machine_learn(d$train, patient_id, outcome = diabetes, metric = "PR")