autoMLmodel: Automated machine learning training of models


View source: R/autoMLModel.R

Description

Automated training, tuning and validation of machine learning models. Models are tuned, resampled and validated on an experimental dataset, then trained on the full dataset and validated/tested on external datasets. Classification models tune the probability threshold automatically and return the results. Each model contains information on performance, the model object and evaluation plots.
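To illustrate what tuning a probability threshold involves, here is a minimal base R sketch. It is not DriveML's internal code: the helper name `tune_threshold`, the candidate grid and the choice of accuracy as the selection criterion are all assumptions made for illustration.

```r
# Hypothetical helper (not part of DriveML): pick the probability cut-off
# that maximises accuracy over a grid of candidate thresholds.
tune_threshold <- function(prob, truth, grid = seq(0.05, 0.95, by = 0.05)) {
  acc <- vapply(
    grid,
    function(t) mean(as.integer(prob >= t) == truth),
    numeric(1)
  )
  # which.max() returns the first maximum, so ties go to the lowest threshold
  list(threshold = grid[which.max(acc)], accuracy = max(acc))
}

# Toy predicted probabilities and true labels (1 = positive class)
prob  <- c(0.12, 0.22, 0.33, 0.41, 0.58, 0.72, 0.81, 0.93)
truth <- c(0,    0,    0,    1,    0,    1,    1,    1)

best <- tune_threshold(prob, truth)
best$threshold  # chosen cut-off
best$accuracy   # accuracy at that cut-off
```

In practice the criterion would be whichever performance metric is being optimised, evaluated on resampled predictions rather than on a single toy vector.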

Usage

autoMLmodel(
  train,
  test = NULL,
  score = NULL,
  target = NULL,
  testSplit = 0.2,
  tuneIters = 10,
  tuneType = "random",
  models = "all",
  perMetric = "auc",
  varImp = 10,
  liftGroup = 50,
  maxObs = 10000,
  uid = NULL,
  pdp = FALSE,
  positive = 1,
  htmlreport = FALSE,
  seed = 1991,
  verbose = FALSE
)

Arguments

train

[data.frame | Required] training set

test

[data.frame | Optional] optional testing set to validate models on. If none is provided, one will be created internally. Default of NULL

score

[data.frame | Optional] optional dataset to score with the best trained model, selected by AUC. If none is provided, scorelist will be NULL. Default of NULL

target

[integer | Required] target variable. If a target is provided, classification or regression models are trained; if left as NULL, unsupervised models are trained. Default of NULL

testSplit

[numeric | Optional] proportion of the data to allocate to the test set. Stratified sampling is used. Default of 0.2
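A stratified split holds out the same share of rows from each target class, so the class balance is preserved in the test set. The following base R sketch shows the idea; the helper `stratified_split` and its arguments are hypothetical and are not taken from the DriveML source.

```r
# Hypothetical helper (not part of DriveML): hold out `test_split` of the
# rows within each class of the target vector `y`.
stratified_split <- function(y, test_split = 0.2, seed = 1991) {
  set.seed(seed)
  by_class <- split(seq_along(y), y)
  test_idx <- unlist(lapply(by_class, function(idx) {
    # index into idx so that single-row classes are handled safely
    idx[sample.int(length(idx), size = round(length(idx) * test_split))]
  }), use.names = FALSE)
  list(train = setdiff(seq_along(y), test_idx), test = sort(test_idx))
}

# Toy target with 80 negatives and 20 positives
y   <- rep(c(0, 1), times = c(80, 20))
idx <- stratified_split(y, test_split = 0.2)

table(y[idx$test])  # 16 negatives and 4 positives: the 80/20 balance is kept
```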

tuneIters

[integer | Optional] number of tuning iterations used to search for optimal hyperparameters. Default of 10

tuneType

[character | Optional] tuning method to apply. Default is "random". The options are:

  • "random" - random search hyperparameter tuning

  • "frace" - iterated F-racing search for the best solution, using the irace package
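Random search draws a fixed number of hyperparameter configurations at random and keeps the best one. The base R sketch below shows that loop under stated assumptions: the toy `objective`, the parameter names `depth` and `eta` and their ranges are invented for illustration, and in a real run the objective would be a resampled performance estimate.

```r
# Hypothetical objective standing in for a resampled validation error.
# Lower is better; the minimum is at depth = 6, eta = 0.1.
objective <- function(p) (p$depth - 6)^2 + (log10(p$eta) + 1)^2

# Hypothetical helper (not part of DriveML or mlr): evaluate `iters` random
# configurations and return the one with the lowest objective value.
random_search <- function(objective, iters = 10, seed = 1991) {
  set.seed(seed)
  candidates <- lapply(seq_len(iters), function(i) {
    list(
      depth = sample(2:10, 1),      # integer parameter, uniform
      eta   = 10^runif(1, -3, 0)    # continuous parameter, log-uniform
    )
  })
  scores <- vapply(candidates, objective, numeric(1))
  c(candidates[[which.min(scores)]], score = min(scores))
}

best <- random_search(objective, iters = 10)
str(best)  # list with depth, eta and the best score found
```

The number of candidates plays the role of `tuneIters`: more iterations cover the search space better at a proportionally higher training cost.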

models

[character | Optional] which models to train. Default is "all". The available model names are:

  • randomForest - random forests using the randomForest package

  • ranger - random forests using the ranger package

  • xgboost - gradient boosting using xgboost

  • rpart - decision tree classification using rpart

  • glmnet - regularised regression from glmnet

  • logreg - logistic regression from stats

perMetric

[character | Optional] model validation metric. Default is "auc"

  • auc - area under the curve; mlr::auc

  • accuracy - accuracy; mlr::acc

  • balancedAccuracy - balanced accuracy; mlr::bac

  • brier - brier score; mlr::brier

  • f1 - F1 measure; mlr::f1

  • meanPrecRecall - geometric mean of precision and recall; mlr::gpr

  • logloss - logarithmic loss; mlr::logloss
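For reference, the metrics listed above can be computed by hand for a binary target. The base R sketch below uses the textbook definitions on a toy example; it does not call mlr, and mlr's own implementations may differ in details such as how extreme probabilities are handled. The 0.5 cut-off and the variable names are assumptions for illustration.

```r
# Toy data: true labels (1 = positive) and predicted positive-class probabilities
truth <- c(0, 0, 1, 1)
prob  <- c(0.1, 0.4, 0.35, 0.8)
pred  <- as.integer(prob >= 0.5)  # assumed cut-off of 0.5

# Confusion-matrix counts
tp <- sum(pred == 1 & truth == 1)
fp <- sum(pred == 1 & truth == 0)
fn <- sum(pred == 0 & truth == 1)
tn <- sum(pred == 0 & truth == 0)

precision   <- tp / (tp + fp)
recall      <- tp / (tp + fn)
specificity <- tn / (tn + fp)
n_pos <- sum(truth == 1)
n_neg <- sum(truth == 0)

metrics <- c(
  # rank-based (Mann-Whitney) form of the area under the ROC curve
  auc              = (sum(rank(prob)[truth == 1]) - n_pos * (n_pos + 1) / 2) /
                       (n_pos * n_neg),
  accuracy         = mean(pred == truth),
  balancedAccuracy = (recall + specificity) / 2,
  brier            = mean((prob - truth)^2),
  f1               = 2 * tp / (2 * tp + fp + fn),
  meanPrecRecall   = sqrt(precision * recall),
  logloss          = -mean(truth * log(prob) + (1 - truth) * log(1 - prob))
)
round(metrics, 3)
```

Note that auc, brier and logloss are computed from the probabilities, while the remaining metrics depend on the chosen cut-off.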

varImp

[integer | Optional] number of important features to plot

liftGroup

[integer | Optional] lift value to validate the test model performance

maxObs

[numeric | Optional] maximum number of observations in the experiment training dataset on which models are trained, tuned and resampled. Default of 10,000. If the training dataset has fewer observations than maxObs, all observations are used
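The capping behaviour described for maxObs can be pictured with the base R sketch below. The helper `cap_rows` is hypothetical, and simple random sampling is an assumption: this page does not state how DriveML selects the rows.

```r
# Hypothetical helper (not part of DriveML): return at most `max_obs` rows,
# sampled at random; smaller data frames are returned unchanged.
cap_rows <- function(df, max_obs = 10000, seed = 1991) {
  if (nrow(df) <= max_obs) {
    return(df)
  }
  set.seed(seed)
  df[sample.int(nrow(df), max_obs), , drop = FALSE]
}

big   <- data.frame(x = seq_len(50))
small <- data.frame(x = seq_len(5))

nrow(cap_rows(big, max_obs = 20))    # capped at 20 rows
nrow(cap_rows(small, max_obs = 20))  # all 5 rows kept
```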

uid

[character | Optional] unique variables to keep in test output data

pdp

[logical | Optional] partial dependence plot for important variables

positive

[character | Optional] positive class for the target variable

htmlreport

[logical | Optional] to view the model outcome in html format

seed

[integer | Optional] random number seed for reproducible results

verbose

[logical | Optional] display execution steps on the console. Default is FALSE

Details

All models are trained using the mlr train function, so all of the functionality in the mlr package can be applied to the autoMLmodel outcome.

autoMLmodel returns information on each of the machine learning classification models it trains.

Value

A list containing the trained models and their results

See Also

mlr train, caret train, makeLearner, tuneParams

Examples

# Run only the logistic regression model
library(DriveML)
mymodel <- autoMLmodel(train = heart, test = NULL, target = "target_var",
  testSplit = 0.2, tuneIters = 10, tuneType = "random", models = "logreg",
  varImp = 10, liftGroup = 50, maxObs = 4000, uid = NULL, seed = 1991)

DriveML documentation built on June 14, 2021, 9:09 a.m.