
This is an R package to tune hyperparameters for machine learning algorithms using Bayesian Optimization based on Gaussian Processes. The algorithms currently supported are support vector machines, random forests, and XGBoost.

This package has some features:

- Bayesian Optimization functions are very easy to write, and you can still customize your model easily.
- The label column may be of any class (character, integer, or factor).
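One common way to accept a label column of any class is to convert it to a factor first. The sketch below assumes that approach; the helper `encode_label()` is hypothetical and not part of the package's API. It produces the 0-based integer codes that xgboost's multi-class objectives such as `"multi:softmax"` expect, and keeps the levels so predictions can be mapped back.

```r
# Hypothetical helper: map a label of any class to 0-based integer codes
encode_label <- function(label) {
  f <- factor(label)               # character, integer and factor input all work
  list(code = as.integer(f) - 1L,  # 0, 1, ..., number of classes - 1
       levels = levels(f))         # keep the mapping to decode predictions later
}

encode_label(c("cat", "dog", "cat"))$code  # 0 1 0
encode_label(c(3L, 1L, 3L))$code           # 1 0 1
```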

In many machine learning methods, tuning hyperparameters is very important. Grid Search has often been used to find appropriate hyperparameters, but it takes too much time to compute.
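To see why the cost grows so quickly, count the model fits for a modest grid. This sketch assumes a hypothetical 5-point grid for each of the three hyperparameters tuned in the `rBayesianOptimization` example later in this document, and assumes a Bayesian Optimization run evaluates `init_points + n_iter` points (10 + 20 in that example).

```r
# Hypothetical 5-point grid per hyperparameter
grid <- expand.grid(max_depth = 2:6,
                    min_child_weight = seq(1, 10, length.out = 5),
                    subsample = seq(0.5, 0.8, length.out = 5))

n_combinations <- nrow(grid)       # 5 * 5 * 5 = 125 combinations
n_fits_grid <- n_combinations * 5  # with 5-fold CV: 625 model fits
n_fits_bayes <- (10 + 20) * 5      # 30 evaluations with 5-fold CV: 150 model fits
```

Each extra hyperparameter multiplies the grid size, while the Bayesian Optimization budget stays fixed by `init_points` and `n_iter`.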

To solve this problem, Bayesian Optimization is often used to tune hyperparameters faster. It is a sequential design strategy for global optimization of black-box functions.

While Grid Search simply searches exhaustively through a manually specified subset of the hyperparameter space, Bayesian Optimization constructs a posterior distribution over functions (a Gaussian process) that best describes the function you want to optimize, and uses it to choose the next points whose scores are likely to be better.
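To make the idea concrete, here is a minimal one-dimensional sketch in base R. It is not the implementation used by **MlBayesOpt** or `rBayesianOptimization`; it assumes a squared-exponential kernel with a fixed length scale, noise-free observations, a fixed grid of candidate points on [0, 1], and the upper confidence bound (UCB) acquisition function. All function names are hypothetical.

```r
# Squared-exponential covariance between the points in vectors a and b
sq_exp_kernel <- function(a, b, length_scale = 0.1) {
  exp(-outer(a, b, "-")^2 / (2 * length_scale^2))
}

# Gaussian process posterior mean and standard deviation at x_new
gp_posterior <- function(x_obs, y_obs, x_new, jitter = 1e-6) {
  K <- sq_exp_kernel(x_obs, x_obs) + diag(jitter, length(x_obs))
  K_s <- sq_exp_kernel(x_obs, x_new)
  K_inv <- solve(K)
  mu <- drop(t(K_s) %*% K_inv %*% y_obs)
  # The prior variance of this kernel is 1 at every point
  v <- 1 - colSums(K_s * (K_inv %*% K_s))
  list(mean = mu, sd = sqrt(pmax(v, 0)))
}

# Sequential loop: random initial design, then pick the UCB maximizer each round
bayes_opt_1d <- function(f, init_points = 3, n_iter = 10, kappa = 2.576) {
  candidates <- seq(0, 1, length.out = 201)
  x_obs <- runif(init_points)
  y_obs <- vapply(x_obs, f, numeric(1))
  for (i in seq_len(n_iter)) {
    post <- gp_posterior(x_obs, y_obs, candidates)
    acq <- post$mean + kappa * post$sd
    x_next <- candidates[which.max(acq)]
    x_obs <- c(x_obs, x_next)
    y_obs <- c(y_obs, f(x_next))
  }
  list(x = x_obs, y = y_obs,
       best_x = x_obs[which.max(y_obs)], best_y = max(y_obs))
}

set.seed(1)
toy <- function(x) -(x - 0.3)^2  # toy objective with its maximum at x = 0.3
res <- bayes_opt_1d(toy, init_points = 3, n_iter = 10)
res$best_x
```

`res$best_x` should land near the true maximum at 0.3. Practical implementations usually estimate the kernel parameters from the data instead of fixing them as this sketch does.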

In the past, we could execute Bayesian Optimization with the `rBayesianOptimization` package in three steps:

- Prepare the data
- Write the function to maximize
- Run the Bayesian Optimization

For example, to tune the hyperparameters of XGBoost with 5-fold cross-validation, you have to write the following:

```r
library(xgboost)
library(Matrix)
library(rBayesianOptimization)  # provides KFold() and BayesianOptimization()

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
cv_folds <- KFold(agaricus.train$label, nfolds = 5, stratified = TRUE, seed = 0)

# Function to maximize: returns the cross-validated AUC as Score
xgb_cv_bayes <- function(max_depth, min_child_weight, subsample) {
  cv <- xgb.cv(params = list(booster = "gbtree", eta = 0.01,
                             max_depth = max_depth,
                             min_child_weight = min_child_weight,
                             subsample = subsample, colsample_bytree = 0.3,
                             lambda = 1, alpha = 0,
                             objective = "binary:logistic",
                             eval_metric = "auc"),
               data = dtrain, nrounds = 100, folds = cv_folds,
               prediction = TRUE, showsd = TRUE,
               early_stopping_rounds = 5, maximize = TRUE, verbose = 0)
  list(Score = cv$evaluation_log$test_auc_mean[cv$best_iteration],
       Pred = cv$pred)
}

# The names in bounds must match the arguments of xgb_cv_bayes()
OPT_Res <- BayesianOptimization(xgb_cv_bayes,
                                bounds = list(max_depth = c(2L, 6L),
                                              min_child_weight = c(1L, 10L),
                                              subsample = c(0.5, 0.8)),
                                init_grid_dt = NULL, init_points = 10,
                                n_iter = 20, acq = "ucb", kappa = 2.576,
                                eps = 0.0, verbose = TRUE)
```

With the `MlBayesOpt` package, on the other hand, we can write the same tuning very easily.

```r
library(MlBayesOpt)

res0 <- xgb_cv_opt(data = agaricus.train$data,
                   label = agaricus.train$label,
                   objectfun = "binary:logistic",
                   evalmetric = "auc",
                   n_folds = 5,
                   acq = "ucb",
                   init_points = 10,
                   n_iter = 20)
```

When the data is a `data.frame`, you only have to write the column name to specify the label.

```r
# This takes a lot of time
# The fashion data is included in this package
res0 <- xgb_cv_opt(data = fashion,
                   label = y,
                   objectfun = "multi:softmax",
                   evalmetric = "merror",
                   n_folds = 15,
                   classes = 10)
```
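The bare column name `y` in the call above is not a variable in your workspace. As a rough sketch of how such an argument can work, assuming base-R non-standard evaluation (the helper `split_label()` is hypothetical and not part of **MlBayesOpt**), the unquoted name is captured, turned into a string, and used to separate the features from the label:

```r
# Hypothetical helper: split a data.frame into features and label,
# given the label column as a bare (unquoted) name
split_label <- function(data, label) {
  label_name <- deparse(substitute(label))  # bare name y -> string "y"
  list(x = data[, setdiff(names(data), label_name), drop = FALSE],
       y = data[[label_name]])
}

df <- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5), y = c("p", "q", "p"),
                 stringsAsFactors = FALSE)
parts <- split_label(df, y)
names(parts$x)  # "a" "b"
parts$y         # "p" "q" "p"
```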

You can install **MlBayesOpt** from CRAN:

```r
install.packages("MlBayesOpt")
```

You can install the latest development version of **MlBayesOpt** from GitHub with either of the following:

```r
# install.packages("githubinstall")
githubinstall::githubinstall("MlBayesOpt")

# install.packages("devtools")
devtools::install_github("ymattu/MlBayesOpt")
```

To use this package, load it with the `library()` function.

```
library(MlBayesOpt)
```

The source code for **MlBayesOpt** package is available on GitHub at

- https://github.com/ymattu/MlBayesOpt

First, the `*_opt()` functions use hold-out validation, and the `*_cv_opt()` functions use cross-validation. In hold-out functions, you need to specify at least the training data, the test data, and the label column of each. In cross-validation functions, you only have to specify the data, the name of the label column, and the number of folds in the `n_folds` option.
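To illustrate the difference between the two schemes, here is a minimal base-R sketch of how row indices might be divided. The helpers `holdout_split()` and `kfold_split()` are hypothetical and do not reproduce **MlBayesOpt**'s internal splitting; the 30% test fraction is an arbitrary assumption.

```r
# Hold-out: one random train/test partition of rows 1..n
holdout_split <- function(n, test_frac = 0.3) {
  test_idx <- sample(n, size = round(n * test_frac))
  list(train = setdiff(seq_len(n), test_idx), test = sort(test_idx))
}

# k-fold: every row is assigned to exactly one of n_folds folds,
# and each fold serves once as the validation set
kfold_split <- function(n, n_folds = 5) {
  fold_id <- sample(rep(seq_len(n_folds), length.out = n))
  split(seq_len(n), fold_id)
}

set.seed(42)
ho <- holdout_split(100)
folds <- kfold_split(100, n_folds = 5)
lengths(folds)  # each of the 5 folds holds 20 rows
```

Hold-out fits one model per hyperparameter setting, whereas k-fold cross-validation fits `n_folds` models and averages their scores, which is slower but less sensitive to one particular split.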

Second, you can specify the options of Bayesian Optimization, such as `init_points` or `n_iter`. For details of these options, see the help of the `rBayesianOptimization::BayesianOptimization()` function.
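The examples in this document also pass `acq = "ucb"` and `kappa`. As a small sketch of what that acquisition function computes, assuming the standard upper confidence bound (posterior mean plus `kappa` posterior standard deviations; the helper `ucb()` is hypothetical), a larger `kappa` favors points whose predicted score is uncertain:

```r
# Upper confidence bound: predicted mean plus kappa standard deviations
ucb <- function(mean, sd, kappa = 2.576) mean + kappa * sd

mu    <- c(0.80, 0.70)  # GP posterior means at two candidate points
sigma <- c(0.01, 0.10)  # GP posterior standard deviations

which.max(ucb(mu, sigma))             # 2: explore the uncertain point
which.max(ucb(mu, sigma, kappa = 0))  # 1: exploit the best predicted mean
```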

Except for the `xgb_cv_opt()` function, the returned value is the test accuracy. In the `xgb_cv_opt()` function, the returned value is the value of the metric you specified in the `evalmetric` option.

For SVM, this package supports hold-out tuning (`svm_opt()`) and cross-validation tuning (`svm_cv_opt()`).

In the SVM functions, you can specify the kernel to use (the default is "radial") from the following options:

- **linear:** $u'v$
- **polynomial:** $(\gamma u'v + coef0)^{degree}$
- **radial basis:** $\exp(-\gamma |u-v|^2)$
- **sigmoid:** $\tanh(\gamma u'v + coef0)$
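As a quick check of these formulas, here is a direct base-R transcription for two numeric vectors `u` and `v`, where $|u-v|^2$ is the squared Euclidean distance. The function names are hypothetical; the package itself relies on `e1071::svm()` for the actual computation.

```r
# Kernel functions transcribed from the formulas above
k_linear     <- function(u, v) sum(u * v)
k_polynomial <- function(u, v, gamma, coef0, degree) (gamma * sum(u * v) + coef0)^degree
k_radial     <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))
k_sigmoid    <- function(u, v, gamma, coef0) tanh(gamma * sum(u * v) + coef0)

u <- c(1, 2)
v <- c(3, 4)
k_linear(u, v)                                        # 11
k_polynomial(u, v, gamma = 1, coef0 = 1, degree = 2)  # (11 + 1)^2 = 144
k_radial(u, v, gamma = 0.5)                           # exp(-4), about 0.0183
k_sigmoid(u, v, gamma = 0.1, coef0 = 0)               # tanh(1.1)
```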

You can also adjust the search range of each parameter (for example, `degree_range` for the polynomial kernel, as in the example below).

For details of these SVM options, see the help of the `e1071::svm()` function.

```r
res0 <- svm_cv_opt(data = fashion_train,
                   label = y,
                   svm_kernel = "polynomial",
                   degree_range = c(2L, 4L),
                   n_folds = 3,
                   kappa = 5,
                   init_points = 4,
                   n_iter = 5)
```

For Random Forest, this package currently supports only hold-out tuning (`rf_opt()`).

In `rf_opt()`, you can specify the number of trees (`num_tree`), the range of `mtry` values to search (`mtry_range`), and the range of minimum node sizes to be tested (`min_node_size_range`).

For details of these Random Forest options, see the help of the `ranger::ranger()` function.

```r
res0 <- rf_opt(train_data = fashion_train,
               train_label = y,
               test_data = fashion_test,
               test_label = y,
               mtry_range = c(1L, ncol(fashion_train) - 1),
               num_tree = 10L,
               init_points = 4,
               n_iter = 5)
```

For XGBoost, this package supports hold-out tuning (`xgb_opt()`) and cross-validation tuning (`xgb_cv_opt()`).

In the XGBoost functions, you must specify the objective function (e.g. `objectfun = "multi:softmax"`) and the evaluation metric (e.g. `evalmetric = "merror"`). You can also specify the search ranges of XGBoost parameters such as eta (`eta_range`) and subsample (`subsample_range`).

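Only in the `xgb_cv_opt()` function is the returned value the metric you specified in the `evalmetric` option. Because Bayesian Optimization maximizes its score, when you specify an evaluation metric that should be minimized, such as "merror" or "(m)logloss", the returned value is negated (negative).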
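The negation is what lets a maximizer minimize an error metric. A minimal sketch with made-up error rates (the values are illustrative, not results from the package):

```r
# Hypothetical cross-validated error rates (merror) for three candidate settings
merror <- c(0.12, 0.08, 0.15)
score <- -merror          # negate so that larger is better
best <- which.max(score)  # 2, the setting with the lowest error
-score[best]              # 0.08, the error rate recovered from the score
```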

For details of these XGBoost options, see the help of the `xgboost::xgb.train()` or `xgboost::xgb.cv()` function.

```r
res0 <- xgb_cv_opt(data = fashion_train,
                   label = y,
                   objectfun = "multi:softmax",
                   evalmetric = "merror",
                   n_folds = 3,
                   classes = 10,
                   init_points = 4,
                   n_iter = 5)
```
