ml_tune: Wrapper for auto-tuning many ML algorithms supported by caret.

Description

Auto-tunes an ML model with a choice of sampling method, metric, pre-processing methods, number of cores, and other settings, and returns one model.

Usage

ml_tune(data, target, sampling = NULL, metric = "Accuracy",
  search = "random", k = 10, tuneLength = 2, repeats = 1,
  method = "xgbLinear", preProcess = NULL,
  summaryFunction = twoClassSummary, nthread = 3)

Arguments

data

The training data, as a data frame.

target

A character, column name of the target variable.

sampling

A character, examples are up, down, rose, and smote, as supported by caret. The current version also supports ADAS, ANS, BLSMOTE, DBSMOTE, RSLS, and SLS. For details on these sampling methods, see the smotefamily package on CRAN: https://CRAN.R-project.org/package=smotefamily

metric

A character, examples are Accuracy, Kappa, ROC, Sens, and Spec, as natively supported in the caret package. F-measures are planned for version 0.1.1.

search

A character, either random or grid. User-defined hyper-parameter search is planned for version 0.1.1.

k

A numeric, the number of cross-validation folds.

tuneLength

A numeric, the number of hyper-parameter combinations to try. The number of models to train is tuneLength * k * repeats.
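
The count stated above can be checked with a few lines of base R. This is only an arithmetic sketch using the default values from the Usage section; the variable n_fits is a hypothetical name introduced for illustration and is not part of ml_tune.

```r
# Default values taken from the Usage section.
tuneLength <- 2   # hyper-parameter combinations to try
k <- 10           # cross-validation folds
repeats <- 1      # repeats of the cross-validation

# n_fits is a hypothetical name: models trained during resampling,
# following the formula tuneLength * k * repeats.
n_fits <- tuneLength * k * repeats
print(n_fits)
```

Raising any of the three arguments scales the training cost linearly, so doubling k or repeats roughly doubles the run time.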

repeats

A numeric, the number of repeats in cross-validation.

method

A character, the name of the machine learning algorithm.

preProcess

A character vector, the names of the pre-processing methods to apply.

summaryFunction

A function name. Use twoClassSummary for binary classification and multiClassSummary for multi-class classification.
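
As a rough illustration of the rule above, the following base R sketch picks the name of the summary function from the number of classes in the target. The helper choose_summary_name is hypothetical and not part of this package or of caret; it only returns the function name as a string, so that the sketch runs without caret attached.

```r
# Hypothetical helper (not part of ml_tune or caret): return the name of the
# summary function that matches the number of classes in the target.
choose_summary_name <- function(y) {
  n_classes <- nlevels(droplevels(factor(y)))
  if (n_classes < 2) {
    stop("the target needs at least two classes")
  }
  if (n_classes == 2) "twoClassSummary" else "multiClassSummary"
}

# iris$Species has three classes, so the multi-class summary applies.
print(choose_summary_name(iris$Species))
```

With caret attached, the returned name could be turned into the function itself with get() or match.fun() before being passed as summaryFunction.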

nthread

A numeric, the number of cores to use in model training. It is best to set it to the number of physical cores you have minus 1.
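
One way to follow this advice is to query the core count with the parallel package that ships with R. The sketch below is an assumption about how a caller might choose the value, not something ml_tune does internally; detectCores(logical = FALSE) can return NA on some platforms, so the code falls back to a single core.

```r
library(parallel)

# Physical (not logical) cores; may be NA where the platform cannot report it.
physical_cores <- detectCores(logical = FALSE)
if (is.na(physical_cores)) {
  physical_cores <- 1L
}

# Leave one core free for the rest of the system, but never go below 1.
nthread <- max(1L, physical_cores - 1L)
print(nthread)
```

The resulting value can then be passed as the nthread argument.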

Details

When using grid search, up to tuneLength^N_hyper_params hyper-parameter combinations are trained, because caret treats tuneLength as the number of candidate values per hyper-parameter. When using random search, tuneLength combinations are trained, and the eta is not set. Use random search whenever possible unless you want to fine-tune one machine learning algorithm.

Value

A list containing the model information, with the same structure as the object returned by the train function in the caret package.

Examples

# multi-class classification
iris_classification <- ml_tune(data = iris, target = "Species", metric = "Kappa",
                               search = "random", k = 5, tuneLength = 2, repeats = 1,
                               method = "rf", preProcess = c("center", "scale"),
                               summaryFunction = multiClassSummary, nthread = 3)

predict(iris_classification,iris)

## Not run: 
# binary classification
ml_tune(data = training, target = "target", sampling = "down", metric = "Accuracy",
        search = "random", k = 10, tuneLength = 2, repeats = 1,
        method = "xgbLinear", preProcess = NULL,
        summaryFunction = twoClassSummary, nthread = 3)

## End(Not run)

edwardcooper/automl documentation built on June 3, 2019, 1:05 a.m.