In nguyenngocbinh/mlr3_book_vi: R package to build the mlr3 book via bookdown

Model Optimization {#model-optim}

Các thuật toán ML tự động đặt giá trị mặc định cho các hyperparameters của chúng. Những hyperparameters cần phải được thay đổi bởi người dùng để đạt được hiệu suất tối ưu trên tập dữ liệu đã cho. Việc lựa chọn các giá trị này một cách thủ công không được khuyến khích, cách tiếp cận này hiếm khi đạt được hiệu suất tối ưu. Để chứng minh tính hợp lệ của các tham số đã được chọn (= tuning), nên tối ưu hóa dữ liệu. Để điều chỉnh (tune) một thuật toán ML, bạn cần phải chỉ rõ (1) the search space, (2) the optimization algorithm (aka tuning method) and (3) an evaluation method, i.e., a resampling strategy and a performance measure.

Nói tóm lại, chương tuning minh họa cách:

Kinh nghiệm thực hành hyperparameter selection
Lựa chọn optimizing algorithm
trigger the tuning
automate tuning

Ssub-chapter cần phải có package mlr3-tuning, là một package mở rộng nhằm hỗ trợ việc điều chỉnh hyperparameter.

Feature Selection

Phần thứ 2 của chương này giải thích về feature selection. Mục đích của feature selection là để chọn các biến độc lập phù hợp nhất cho một mô hình. Feature selection có thể làm tăng tính giải thích của mô hình, tăng tốc độ fitting và cải thiện learner performance bằng việc giảm nhiễu trong dữ liệu.

Có nhiều cách tiếp cận khác nhau để xác định các features có liên quan. Trong chương feature selection này có 3 cách tiếp cận được nhấn mạnh:

Feature selection sử dụng thuật toán filter
Feature selection thông qua variable importance filters
Feature selection bằng cái gọi là wrapper methods

Cách tiếp cận thứ 4, feature selection thông qua ensemble filters, sẽ được giới thiệu trong phần tiếp. Việc thực hiện cả 4 cách tiếp cận trong mlr3 được thể hiện bằng cách sử dụng package mở rộng mlr3filters.

Nested Resampling

Để có được một ước lượng tốt của hiếu suất tổng quát và tránh được việc rò rỉ dữ liệu, cả hai quá trình outer (performance) và inner (tuning/feature selection) resampling là rất cần thiết. Các tính năng sẽ được thảo luận trong chương này là:

Các kịch bản Inner and outer resampling trong nested resampling
The execution of nested resampling
The evaluation of executed resampling iterations

Phần phụ lấy mẫu lồng nhau nested resampling sẽ giới thiệu cách thực hiện nested resampling, có tính để cả inner and outer resampling trong mlr3.

Hyperparameter Tuning {#tuning}

Điều chỉnh hyperparameter được hỗ trợ bởi package r mlr_pkg("mlr3tuning"). Trung tâm của r mlr_pkg("mlr3tuning") là R6 classes:

r ref("TuningInstance"): class này mô tả bài toán điều chỉnh và lưu các kết quả
r ref("Tuner"): class này là class cơ sở cho việc thực hiện các thuật toán điều chỉnh.

The `TuningInstance` Class {#tuning-optimization}

Phần phụ sau đây kiểm tra việc tối ưu hóa một classification tree đơn giản dựa trên tập dữ liệu r ref("mlr_tasks_pima", text = "Pima Indian Diabetes").

task = tsk("pima")
print(task)

Chúng tôi sử dụng classification tree từ r cran_pkg("rpart") và chọn một tập con của hyperparameters chúng tôi muốn điều chỉnh. Cái này gọi là không gian điều chỉnh.

learner = lrn("classif.rpart")
learner$param_set

Ở đây chúng tôi chọn 2 tham số tên là complexity cp và tiêu chí dừng lại minsplit. Vì không gian điều chỉnh cần phải giới hạn, chúng ra sẽ đặt giới hạn trên và dưới:

library(paradox)
tune_ps = ParamSet$new(list(
  ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  ParamInt$new("minsplit", lower = 1, upper = 10)
))
tune_ps

Tiếp theo, chúng ta có thể xác định cách để đánh giá hiệu suất. Để xác định cách đánh giá chúng ta phải chọn một r ref("Resampling", text = "resampling strategy") và một r ref("Measure", text = "performance measure").

hout = rsmp("holdout")
measure = msr("classif.ce")

Cuối cùng, chúng ta phải chọn khả năng có sẵn (budget available) để giải quyết trường hợp điều chỉnh này. Cái này được thực hiện bằng lựa chọn một trong những r ref("Terminator", text = "Terminators"):

Chấm dứt sau một thời gian nhất định (r ref("TerminatorClockTime"))
Chấm dứt sau số lượng vòng lặp nhất định (r ref("TerminatorEvals"))
Chấm dứt sau khi đạt hiệu suất cho trước (r ref("TerminatorPerfReached"))
Chấm dứt khi điều chỉnh không cải thiện (r ref("TerminatorStagnation"))
Kết hợp các cái trên trong theo kiểu ALL hoặc ANY sử dụng r ref("TerminatorCombo")

Đối với phần giới thiệu ngắn này, chúng tôi cấp ngân sách 20 đánh giá và sau đó kết hợp mọi thứ vào thành một r ref("TuningInstance"):

library(mlr3tuning)

evals20 = term("evals", n_evals = 20)

instance = TuningInstance$new(
  task = task,
  learner = learner,
  resampling = hout,
  measures = measure,
  param_set = tune_ps,
  terminator = evals20
)
print(instance)

Để điều chỉnh, chúng ta vẫn cần phải chọn cách tối ưu hóa sẽ diễn ra. Hay nói cách khác, chúng ta cần chọn optimization algorithm qua r ref("Tuner") class.

The `Tuner` Class

Các thuật toán đang được triển khai trong r mlr_pkg("mlr3tuning"):

Tìm kiếm lưới (r ref("TunerGridSearch"))
Tìm kiếm ngẫu nhiên (r ref("TunerRandomSearch")) [@bergstra2012]
Generalized Simulated Annealing (r ref("TunerGenSA"))

Trong ví dụ này, chúng ta sẽ sử dụng tìm kiếm lưới đơn giản với lưới resolution 5:

tuner = tnr("grid_search", resolution = 5)

Vì chúng ta chỉ có tham số dạng numeric, r ref("TunerGridSearch") sẽ tạo một lưới các bước có kích cỡ bằng nhau giữa giới hạn trên và dưới tương ứng. Vì vậy, chúng ta có 2 hyperparameters với một resolution của 5, lưới 2 chiều gồm các cấu hình $5^2 = 25$. Mỗi cấu hình đóng vai trò cài đặt hyperparameters cho classification tree và kích hoạt (triggers) 3-fold cross validation trên task. Tất cả cấu hình sẽ được kiểm tra bởi tuner (theo thứ tự ngẫu nhiên), cho tới khi tất cả các cấu hình được đánh giá hoặc r ref("Terminator") báo đã hết ngân sách.

Triggering the Tuning {#tuning-triggering}

Để bắt đầu điều chỉnh, chúng ta chỉ cần chuyển r ref("TuningInstance") vào phương thức $tune() của r ref("Tuner") đã được khởi tạo. Tuner tiến hành như sau:

r ref("Tuner") đề xuất ít nhất một cấu hình hyperparameter r ref("Tuner") và có thể đề xuất nhiều điểm để tăng khả năng thực hiện song song, cái này có thể được kiểm soát qua cài đặt batch_size).
Với mỗi cấu hình, một r ref("Learner") được fitted dựa trên r ref("Task") sử dụng r ref("Resampling") đã cho. Các kết quả được kết hợp với các kết quả khác từ vòng lặp trước đến một r ref("BenchmarkResult").
r ref("Terminator") được truy vấn nếu ngân sách đã cạn kiệt. Nếu ngân sách vẫn còn, khởi động lại bước 1 cho tới khi cạn kiệt.
Xác định cấu hình với hiệu suất quan sát tốt nhất.
Trả về một list được đặt tên với cài đặt hyperparameter ("values") và hiệu suất đo lường tương ứng ("performance").

result = tuner$tune(instance)
print(result)

Chúng ta có thể kiểm tra tất cả các resamplings đã được thực hiện bằng phương thức $archive() của r ref("TuningInstance"). Ở đây chúng tối chỉ xuất các giá trị của hiệu suất và hyperparameters:

instance$archive(unnest = "params")[, c("cp", "minsplit", "classif.ce")]

Tóm lại, tìm kiếm lưới đã đánh giá 20/25 cấu hình khác nhau của lưới theo thứ tự ngẫu nhiên trước khi r ref("Terminator") dừng điều chỉnh.

Bây giờ, các hyperparameters đã được tối ưu có thể dùng với r ref("Learner") đã được tạo từ trước, đặt hyperparameters đã được chọn và train nó với tập dữ liệu đầy đủ.

learner$param_set$values = instance$result$params
learner$train(task)

The trained model could now be used to make a prediction on external data. Note that predicting on observations present in the task, is statistically bias and should be avoided. The model has already seen these observations during the tuning process. Hence, the resulting performance measure would be over-optimistic. Instead, to get unbiased performance estimates for the current task, nested resampling is required.

Automating the Tuning {#autotuner}

The r ref("AutoTuner") wraps a learner and augments it with an automatic tuning for a given set of hyperparameters. Because the r ref("AutoTuner") itself inherits from the r ref("Learner") base class, it can be used like any other learner. Analogously to the previous subsection, a new classification tree learner is created. This classification tree learner automatically tunes the parameters cp and minsplit using an inner resampling (holdout). We create a terminator which allows 10 evaluations, and use a simple random search as tuning algorithm:

library(paradox)
library(mlr3tuning)

learner = lrn("classif.rpart")
resampling = rsmp("holdout")
measures = msr("classif.ce")
tune_ps = ParamSet$new(list(
  ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  ParamInt$new("minsplit", lower = 1, upper = 10)
))
terminator = term("evals", n_evals = 10)
tuner = tnr("random_search")

at = AutoTuner$new(
  learner = learner,
  resampling = resampling,
  measures = measures,
  tune_ps = tune_ps,
  terminator = terminator,
  tuner = tuner
)
at

We can now use the learner like any other learner, calling the $train() and $predict() method. This time however, we pass it to r ref("benchmark()") to compare the tuner to a classification tree without tuning. This way, the r ref("AutoTuner") will do its resampling for tuning on the training set of the respective split of the outer resampling. The learner then predicts using the test set of the outer resampling. This yields unbiased performance measures, as the observations in the test set have not been used during tuning or fitting of the respective learner. This is called nested resampling.

To compare the tuned learner with the learner using its default, we can use r ref("benchmark()"):

grid = benchmark_grid(
  task = tsk("pima"),
  learner = list(at, lrn("classif.rpart")),
  resampling = rsmp("cv", folds = 3)
)
bmr = benchmark(grid)
bmr$aggregate(measures)

Note that we do not expect any differences compared to the non-tuned approach for multiple reasons:

the task is too easy
the task is rather small, and thus prone to overfitting
the tuning budget (10 evaluations) is small
r cran_pkg("rpart") does not benefit that much from tuning

Feature Selection / Filtering {#fs}

Often, data sets include a large number of features. The technique of extracting a subset of relevant features is called "feature selection". The objective of feature selection is to fit the sparse dependent of a model on a subset of available data features in the most suitable manner. Feature selection can enhance the interpretability of the model, speed up the learning process and improve the learner performance. Different approaches exist to identify the relevant features. In the literature two distinct approaches are emphasized: One is called Filtering and the other approach is often referred to as feature subset selection or wrapper methods.

What are the differences [@chandrashekar2014]?

Filtering: An external algorithm computes a rank of the variables (e.g. based on the correlation to the response). Then, features are subsetted by a certain criteria, e.g. an absolute number or a percentage of the number of variables. The selected features will then be used to fit a model (with optional hyperparameters selected by tuning). This calculation is usually cheaper than “feature subset selection” in terms of computation time.
Wrapper Methods: Here, no ranking of features is done. Features are selected by a (random) subset of the data. Then, we fit a model and subseqeuently assess the performance. This is done for a lot of feature combinations in a cross-validation (CV) setting and the best combination is reported. This method is very computational intense as a lot of models are fitted. Also, strictly speaking all these models would need to be tuned before the performance is estimated. This would require an additional nested level in a CV setting. After undertaken all of these steps, the selected subset of features is again fitted (with optional hyperparameters selected by tuning).

There is also a third approach which can be attributed to the "filter" family: The embedded feature-selection methods of some r ref("Learner"). Read more about how to use these in section embedded feature-selection methods.

Ensemble filters built upon the idea of stacking single filter methods. These are not yet implemented.

All functionality that is related to feature selection is implemented via the extension package r gh_pkg("mlr-org/mlr3filters").

Filters {#fs-filter}

Filter methods assign an importance value to each feature. Based on these values the features can be ranked. Thereafter, we are able to select a feature subset. There is a list of all implemented filter methods in the Appendix.

Calculating filter values {#fs-calc}

Currently, only classification and regression tasks are supported.

The first step it to create a new R object using the class of the desired filter method. Each object of class Filter has a .$calculate() method which calculates the filter values and ranks them in a descending order.

library(mlr3filters)
filter = FilterJMIM$new()

task = tsk("iris")
filter$calculate(task)

as.data.table(filter)

Some filters support changing specific hyperparameters. This is done similar to setting hyperparameters of a r ref("Learner") using .$param_set$values:

filter_cor = FilterCorrelation$new()
filter_cor$param_set

# change parameter 'method'
filter_cor$param_set$values = list(method = "spearman")
filter_cor$param_set

Rather than taking the "long" R6 way to create a filter, there is also a built-in shorthand notation for filter creation:

filter = flt("cmim")
filter

Variable Importance Filters {#fs-var-imp-filters}

All r ref("Learner") with the property "importance" come with integrated feature selection methods.

You can find a list of all learners with this property in the Appendix.

For some learners the desired filter method needs to be set during learner creation. For example, learner classif.ranger (in the package r mlr_pkg("mlr3learners")) comes with multiple integrated methods. See the help page of r ref("ranger::ranger"). To use method "impurity", you need to set the filter method during construction.

library(mlr3learners)
lrn = lrn("classif.ranger", importance = "impurity")

Now you can use the r ref("mlr3filters::FilterImportance") class for algorithm-embedded methods to filter a r ref("Task").

library(mlr3learners)

task = tsk("iris")
filter = flt("importance", learner = lrn)
filter$calculate(task)
head(as.data.table(filter), 3)

Ensemble Methods {#fs-ensemble}

```{block, type='warning'} Work in progress :)

### Wrapper Methods {#fs-wrapper}

```{block, type='warning'}
Work in progress :) - via package _mlr3fswrap_

Nested Resampling {#nested-resampling}

In order to obtain unbiased performance estimates for learners, all parts of the model building (preprocessing and model selection steps) should be included in the resampling, i.e., repeated for every pair of training/test data. For steps that themselves require resampling like hyperparameter tuning or feature-selection (via the wrapper approach) this results in two nested resampling loops.

knitr::include_graphics("images/nested_resampling.png")

The graphic above illustrates nested resampling for parameter tuning with 3-fold cross-validation in the outer and 4-fold cross-validation in the inner loop.

In the outer resampling loop, we have three pairs of training/test sets. On each of these outer training sets parameter tuning is done, thereby executing the inner resampling loop. This way, we get one set of selected hyperparameters for each outer training set. Then the learner is fitted on each outer training set using the corresponding selected hyperparameters. Following, we can evaulate the performance of the learner on the outer test sets.

In r gh_pkg("mlr-org/mlr3"), you can get nested resampling for free without programming any looping by using the r ref("mlr3tuning::AutoTuner") class. This works as follows:

Generate a wrapped Learner via class r ref("mlr3tuning::AutoTuner") or mlr3filters::AutoSelect (not yet implemented).
Specify all required settings - see section "Automating the Tuning" for help.
Call function r ref("resample()") or r ref("benchmark()") with the created r ref("Learner").

You can freely combine different inner and outer resampling strategies.

A common setup is prediction and performance evaluation on a fixed outer test set. This can be achieved by passing the r ref("Resampling") strategy (rsmp("holdout")) as the outer resampling instance to either r ref("resample()") or r ref("benchmark()").

The inner resampling strategy could be a cross-validation one (rsmp("cv")) as the sizes of the outer training sets might differ. Per default, the inner resample description is instantiated once for every outer training set.

Note that nested resampling is computationally expensive. For this reason we use relatively small search spaces and a low number of resampling iterations in the examples shown below. In practice, you normally have to increase both. As this is computationally intensive you might want to have a look at the section on Parallelization.

Execution {#nested-resamp-exec}

To optimize hyperparameters or conduct feature selection in a nested resampling you need to create learners using either:

the r ref("AutoTuner") class, or
the mlr3filters::AutoSelect class (not yet implemented)

We use the example from section "Automating the Tuning" and pipe the resulting learner into a r ref("resample()") call.

library(mlr3tuning)
task = tsk("iris")
learner = lrn("classif.rpart")
resampling = rsmp("holdout")
measures = msr("classif.ce")
param_set = paradox::ParamSet$new(
  params = list(paradox::ParamDbl$new("cp", lower = 0.001, upper = 0.1)))
terminator = term("evals", n_evals = 5)
tuner = tnr("grid_search", resolution = 10)

at = AutoTuner$new(learner, resampling, measures = measures,
  param_set, terminator, tuner = tuner)

Now construct the r ref("resample()") call:

resampling_outer = rsmp("cv", folds = 3)
rr = resample(task = task, learner = at, resampling = resampling_outer)

Evaluation {#nested-resamp-eval}

With the created r ref("ResampleResult") we can now inspect the executed resampling iterations more closely. See also section Resampling for more detailed information about r ref("ResampleResult") objects.

For example, we can query the aggregated performance result:

rr$aggregate()

Check for any errors in the folds during execution (if there is not output, warnings or errors recorded, this is an empty data.table():

rr$errors

Or take a look at the confusion matrix of the joined predictions:

rr$prediction()$confusion

nguyenngocbinh/mlr3_book_vi documentation built on Jan. 23, 2020, 12:28 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

nguyenngocbinh/mlr3_book_vi
R package to build the mlr3 book via bookdown

In nguyenngocbinh/mlr3_book_vi: R package to build the mlr3 book via bookdown

Model Optimization {#model-optim}

Hyperparameter Tuning {#tuning}

The `TuningInstance` Class {#tuning-optimization}

The `Tuner` Class

Triggering the Tuning {#tuning-triggering}

Automating the Tuning {#autotuner}

Feature Selection / Filtering {#fs}

Filters {#fs-filter}

Calculating filter values {#fs-calc}

Variable Importance Filters {#fs-var-imp-filters}

Ensemble Methods {#fs-ensemble}

Nested Resampling {#nested-resampling}

Execution {#nested-resamp-exec}

Evaluation {#nested-resamp-eval}

R Package Documentation

Browse R Packages

We want your feedback!

nguyenngocbinh/mlr3_book_vi R package to build the mlr3 book via bookdown

In nguyenngocbinh/mlr3_book_vi: R package to build the mlr3 book via bookdown

Model Optimization {#model-optim}

Hyperparameter Tuning {#tuning}

The TuningInstance Class {#tuning-optimization}

The Tuner Class

Triggering the Tuning {#tuning-triggering}

Automating the Tuning {#autotuner}

Feature Selection / Filtering {#fs}

Filters {#fs-filter}

Calculating filter values {#fs-calc}

Variable Importance Filters {#fs-var-imp-filters}

Ensemble Methods {#fs-ensemble}

Nested Resampling {#nested-resampling}

Execution {#nested-resamp-exec}

Evaluation {#nested-resamp-eval}

R Package Documentation

Browse R Packages

We want your feedback!

nguyenngocbinh/mlr3_book_vi
R package to build the mlr3 book via bookdown

The `TuningInstance` Class {#tuning-optimization}

The `Tuner` Class