machine_learn (R Documentation)
View source: R/machine_learn.R

Description:

Prepare data and train machine learning models.

Usage:

machine_learn(
  d,
  ...,
  outcome,
  models,
  metric,
  tune = TRUE,
  positive_class,
  n_folds = 5,
  tune_depth = 10,
  impute = TRUE,
  model_name = NULL,
  allow_parallel = FALSE
)
Arguments:

d: A data frame.

...: Columns to be ignored in model training, e.g. ID columns, unquoted.

outcome: Name of the target column, i.e. what you want to predict. Unquoted. Must be named, i.e. you must specify it as, e.g., outcome = diabetes rather than passing it positionally.

models: Names of models to try. See get_supported_models for available models.

metric: Which metric should be used to assess model performance? Options for classification: "ROC" (default; area under the receiver operating characteristic curve) or "PR" (area under the precision-recall curve). Options for regression: "RMSE" (default; root-mean-squared error), "MAE" (mean-absolute error), or "Rsquared". Options for multiclass: "Accuracy" (default) or "Kappa" (accuracy, adjusted for class imbalance).

tune: If TRUE (default) models will be tuned via cross validation over hyperparameter values (see tune_models). If FALSE, models are trained at fixed hyperparameter values, which is faster but tends to produce less predictive models.

positive_class: For classification only, which outcome level is the "yes" case, i.e. should be associated with high probabilities? Defaults to "Y" or "yes" if present, otherwise is the first level of the outcome variable (first alphabetically if the training data outcome was not already a factor).

n_folds: How many folds to use to assess out-of-fold accuracy? Default = 5. Models are evaluated on out-of-fold predictions whether tune is TRUE or FALSE.

tune_depth: How many hyperparameter combinations to try? Default = 10. Value is multiplied by 5 for regularized regression. Ignored if tune is FALSE.

impute: Logical; if TRUE (default) missing values will be filled by imputation during data preparation (see prep_data).

model_name: Quoted; name of the model. Defaults to the name of the outcome variable.

allow_parallel: Deprecated. Instead, control the number of cores through your parallel back end.
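The arguments above can be combined as in the following sketch. This assumes the healthcareai package is installed (pima_diabetes ships with it) and that you have doParallel available; the choice of positive_class = "N" and n_folds = 10 is illustrative, not a recommendation.

```r
library(healthcareai)

# Since allow_parallel is deprecated, register a parallel back end instead;
# training will then use the registered cores.
doParallel::registerDoParallel(cores = 2)

# Train classification models on the bundled pima_diabetes data, marking
# "N" (no diabetes) as the positive class and using 10 out-of-fold splits.
models <- machine_learn(
  pima_diabetes,
  patient_id,          # ID column, ignored in training
  outcome = diabetes,
  positive_class = "N",
  n_folds = 10
)
```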
Details:

This is a high-level wrapper function. For finer control of data cleaning and preparation, use prep_data or the functions it wraps. For finer control of model tuning, use tune_models.
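The finer-control workflow can be sketched as a two-step pipeline (a sketch assuming the healthcareai package; prep_data and tune_models are the functions named above, and the arguments shown mirror those of machine_learn):

```r
library(healthcareai)

# Step 1: prepare the data yourself for finer control. prep_data handles
# cleaning steps such as imputation of missing values.
prepped <- prep_data(pima_diabetes, patient_id, outcome = diabetes)

# Step 2: tune models on the prepared data.
tuned <- tune_models(prepped, outcome = diabetes, tune_depth = 10)

# The result is a model_list, just as machine_learn returns.
evaluate(tuned)
```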
Value:

A model_list object. You can call plot, summary, evaluate, or predict on a model_list.
Examples:

# These examples take about 30 seconds to execute so aren't run automatically,
# but you should be able to execute this code locally.

# Split the data into training and test sets
d <- split_train_test(d = pima_diabetes, outcome = diabetes, percent_train = .9)

### Classification ###

# Clean and prep the training data, specifying that patient_id is an ID column,
# and tune algorithms over hyperparameter values to predict diabetes
diabetes_models <- machine_learn(d$train, patient_id, outcome = diabetes)

# Inspect model specification and performance
diabetes_models

# Make predictions (predicted probability of diabetes) on test data
predict(diabetes_models, d$test)

### Regression ###

# If the outcome variable is numeric, regression models will be trained
age_model <- machine_learn(d$train, patient_id, outcome = age)

# Get detailed information about performance over tuning values
summary(age_model)

# Get available performance metrics
evaluate(age_model)

# Plot training performance on tuning metric (default = RMSE)
plot(age_model)

# If new data isn't specified, get predictions on training data
predict(age_model)

### Faster model training without tuning hyperparameters ###

# Train models at set hyperparameter values by setting tune to FALSE. This is
# faster (especially on larger datasets), but produces models with less
# predictive power.
machine_learn(d$train, patient_id, outcome = diabetes, tune = FALSE)

### Train models optimizing given metric ###
machine_learn(d$train, patient_id, outcome = diabetes, metric = "PR")