lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
set.seed(0)
options(
  datatable.print.nrows = 10,
  datatable.print.class = FALSE,
  datatable.print.keys = FALSE,
  datatable.print.trunc.cols = TRUE,
  width = 100
)
# mute load messages
library("mlr3fselect")
Package website: release | dev
mlr3fselect is the feature selection package of the mlr3 ecosystem. It searches for the best-performing feature set for any mlr3 learner. The package supports several optimization algorithms, e.g., Random Search, Recursive Feature Elimination, and Genetic Search. It can also optimize learners automatically and estimate the performance of optimized feature sets with nested resampling. The package is built on the optimization framework bbotk.
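To make the nested resampling idea concrete, here is a minimal sketch. It assumes a recent mlr3fselect release in which `auto_fselector()` accepts the argument names shown. The `classif.rpart` learner, the holdout inner resampling, and the budget of 10 evaluations are illustrative choices made for speed, not recommendations from this README.

```r
library("mlr3verse")

# Reduce log output while the example runs
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
set.seed(1)

# An AutoFSelector behaves like a learner that first runs feature
# selection on its training data (inner resampling) and then fits
# the final model on the selected features.
afs = auto_fselector(
  fselector = fs("random_search", batch_size = 5),
  learner = lrn("classif.rpart"),   # assumption: fast learner for the sketch
  resampling = rsmp("holdout"),     # inner resampling
  measure = msr("classif.ce"),
  term_evals = 10
)

# The outer resampling estimates the performance of the whole procedure
rr = resample(tsk("spam"), afs, rsmp("cv", folds = 3))
rr$aggregate(msr("classif.ce"))
```

Because feature selection is repeated inside every outer fold, the aggregated error is not biased by having selected features on the same data used for evaluation.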
There are several sections about feature selection in the mlr3book.
The gallery features a collection of case studies and demos about optimization.
The cheatsheet summarizes the most important functions of mlr3fselect.
Install the latest release from CRAN:
install.packages("mlr3fselect")
Install the development version from GitHub:
remotes::install_github("mlr-org/mlr3fselect")
We run a feature selection for a support vector machine on the Spam data set.
library("mlr3verse")
tsk("spam")
We construct an instance with the fsi() function. The instance describes the optimization problem.
instance = fsi(
  task = tsk("spam"),
  learner = lrn("classif.svm", type = "C-classification"),
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 20)
)
instance
We select a simple random search as the optimization algorithm.
fselector = fs("random_search", batch_size = 5)
fselector
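Random search is only one of the available algorithms. As a small sketch, the keys that `fs()` accepts can be listed from the `mlr_fselectors` dictionary. The exact set of keys depends on the installed package version.

```r
library("mlr3fselect")

# mlr_fselectors is the dictionary in which fs() looks up its keys
fselectors = as.data.table(mlr_fselectors)
keys = fselectors$key
print(keys)
```

Any key from this list can be passed to `fs()` in place of `"random_search"`. Some algorithms need additional packages or learner properties, so check the help page of the chosen fselector first.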
To start the feature selection, we simply pass the instance to the fselector.
fselector$optimize(instance)
The fselector writes the best feature set to the instance.
instance$result_feature_set
The instance also stores the corresponding measured performance.
instance$result_y
The archive contains all evaluated feature sets.
as.data.table(instance$archive)
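The archive table has one logical column per feature plus the measured performance, so it can be ranked and summarized like any data.table. The sketch below is self-contained and therefore rebuilds a small instance. The `classif.rpart` learner, holdout resampling, and the budget of 10 evaluations are assumptions chosen to keep it fast, and `n_features` is a helper column introduced only for illustration.

```r
library("mlr3verse")

lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
set.seed(1)

# Small stand-in for the instance above (assumption: rpart and holdout for speed)
instance = fsi(
  task = tsk("spam"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 10)
)
fs("random_search", batch_size = 5)$optimize(instance)

archive = as.data.table(instance$archive)

# Count the selected features in each evaluated feature set
feature_cols = instance$archive$cols_x
archive[, n_features := rowSums(.SD), .SDcols = feature_cols]

# Show the five feature sets with the lowest classification error
head(archive[order(classif.ce), .(classif.ce, n_features, batch_nr)], 5)
```

Comparing `classif.ce` against `n_features` is a quick way to see whether smaller feature sets perform about as well as larger ones.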
We fit a final model with the optimized feature set to make predictions on new data.
task = tsk("spam")
learner = lrn("classif.svm", type = "C-classification")
task$select(instance$result_feature_set)
learner$train(task)
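The prediction step itself could look like the following sketch. No separate new data set is available here, so the example holds out 20% of the spam rows and treats them as new data. The `selected` vector is a hypothetical stand-in for `instance$result_feature_set`, used only so the example runs on its own.

```r
library("mlr3verse")
set.seed(1)

task = tsk("spam")
learner = lrn("classif.svm", type = "C-classification")

# Hypothetical stand-in for instance$result_feature_set
selected = head(task$feature_names, 10)
task$select(selected)

# Hold out 20% of the rows to play the role of new data
split = partition(task, ratio = 0.8)
learner$train(task, row_ids = split$train)

# New data only needs the selected feature columns
new_data = task$data(rows = split$test, cols = task$feature_names)
prediction = learner$predict_newdata(new_data)
head(as.data.table(prediction))
```

New data must contain every selected feature under the same column names that were used for training.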