View source: R/ensemble_fselect.R
ensemble_fselect | R Documentation |
Ensemble feature selection using multiple learners. The ensemble feature selection method is designed to identify the most predictive features from a given dataset by leveraging multiple machine learning models and resampling techniques. Returns an EnsembleFSResult.
ensemble_fselect(
  fselector,
  task,
  learners,
  init_resampling,
  inner_resampling,
  inner_measure,
  measure,
  terminator,
  callbacks = NULL,
  store_benchmark_result = TRUE,
  store_models = FALSE
)
fselector | (FSelector)
task | (mlr3::Task)
learners | (list of mlr3::Learner)
init_resampling | (mlr3::Resampling)
inner_resampling | (mlr3::Resampling)
inner_measure | (mlr3::Measure)
measure | (mlr3::Measure)
terminator | (bbotk::Terminator)
callbacks | (Named list of lists of CallbackBatchFSelect)
store_benchmark_result | (
store_models | (
The method begins by applying the user-specified initial resampling to create multiple subsamples (train/test splits) of the original dataset. Running feature selection on these diverse subsets makes the overall result more robust.
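The subsampling step can be sketched with mlr3 alone. This is a minimal illustration, not the internal code of ensemble_fselect; the ratio = 0.8 setting and the splits variable are assumptions made for the example.

```r
library(mlr3)

task = tsk("sonar")

# Subsampling with 2 repeats: each iteration holds out a random test set.
# ratio = 0.8 is an illustrative choice, not a value taken from this page.
init_resampling = rsmp("subsampling", repeats = 2, ratio = 0.8)
set.seed(1)
init_resampling$instantiate(task)

# Collect the row ids of the train and test set of every subsample
splits = lapply(seq_len(init_resampling$iters), function(i) {
  list(
    train = init_resampling$train_set(i),
    test = init_resampling$test_set(i)
  )
})

# Number of train and test rows in the first subsample
lengths(splits[[1]])
```

Each element of splits holds disjoint train and test row ids, which is the unit that the feature selection step operates on.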
For each subsample (train set) generated in the previous step, the method performs wrapper-based feature selection (auto_fselector) with each provided learner, using the given inner resampling method, inner performance measure and optimization algorithm. For each combination of subsample and learner, this produces 1) the best feature subset and 2) a final model trained on these best features. The final models are then scored on the corresponding test sets.
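What happens for a single combination of subsample and learner can be approximated as follows. This is a sketch under assumptions: it uses partition() to stand in for one subsample, and the variable names (split, afs, best_features, test_score) are chosen for illustration only.

```r
library(mlr3)
library(mlr3fselect)

task = tsk("sonar")
set.seed(1)

# One train/test split standing in for a single subsample
split = partition(task, ratio = 0.8)

# Wrapper-based feature selection around one learner
afs = auto_fselector(
  fselector = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 10)
)

# Runs the feature selection on the train set, then fits the final
# model on the best feature subset
afs$train(task, row_ids = split$train)

# 1) best feature subset found on this subsample
best_features = afs$fselect_result$features[[1]]

# 2) score of the final model on the held-out test set
test_score = afs$predict(task, row_ids = split$test)$score(msr("classif.acc"))
```

ensemble_fselect repeats this kind of procedure for every learner and every subsample, and collects the selected features and scores in one result object.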
Results are stored in an EnsembleFSResult.
The result object also includes the performance scores calculated during the inner resampling of the training sets, using models with the best feature subsets.
These scores are stored in a column named {measure_id}_inner.
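To locate the inner scores, the column name can be built from the id of the inner measure. The sketch below assumes that the result table is available as efsr$result; the variables res and inner_col are illustrative.

```r
library(mlr3)
library(mlr3fselect)

set.seed(1)
efsr = ensemble_fselect(
  fselector = fs("random_search"),
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 2),
  inner_resampling = rsmp("cv", folds = 3),
  inner_measure = msr("classif.ce"),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 5)
)

# Assumption: the result table is exposed as efsr$result
res = efsr$result

# Column name follows the {measure_id}_inner pattern
inner_col = paste0(msr("classif.ce")$id, "_inner")

# Inner resampling scores, one per combination of subsample and learner
res[[inner_col]]
```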
An EnsembleFSResult object.
The active measure of performance is the one applied to the test sets.
This is preferred, as inner resampling scores on the training sets are likely to be overestimated when using the final models.
Users can change the active measure by using the set_active_measure() method of the EnsembleFSResult.
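Switching the active measure might look like the sketch below. The argument value which = "inner" and the active_measure field are assumptions based on the EnsembleFSResult class and should be checked against its help page for the installed version.

```r
library(mlr3)
library(mlr3fselect)

set.seed(1)
efsr = ensemble_fselect(
  fselector = fs("random_search"),
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 2),
  inner_resampling = rsmp("cv", folds = 3),
  inner_measure = msr("classif.ce"),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 5)
)

# Assumed field: reports which measure is currently active
# (the measure applied to the test sets by default)
efsr$active_measure

# Assumed argument: switch to the inner resampling scores
efsr$set_active_measure(which = "inner")
efsr$active_measure
```

Methods of the result object that rely on performance scores would then use the inner measure until the active measure is switched back.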
Saeys, Yvan, Abeel, Thomas, Van De Peer, Yves (2008). “Robust feature selection using ensemble feature selection techniques.” Machine Learning and Knowledge Discovery in Databases, 5212 LNAI, 313–325. doi:10.1007/978-3-540-87481-2_21.
Abeel, Thomas, Helleputte, Thibault, Van de Peer, Yves, Dupont, Pierre, Saeys, Yvan (2010). “Robust biomarker identification for cancer diagnosis with ensemble feature selection methods.” Bioinformatics, 26, 392–398. ISSN 1367-4803, doi:10.1093/BIOINFORMATICS/BTP630.
Pes, Barbara (2020). “Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains.” Neural Computing and Applications, 32(10), 5951–5973. ISSN 14333058, doi:10.1007/s00521-019-04082-3.
# Packages providing the helper functions used below
library(mlr3)
library(mlr3fselect)

efsr = ensemble_fselect(
  fselector = fs("random_search"),
  task = tsk("sonar"),
  learners = lrns(c("classif.rpart", "classif.featureless")),
  init_resampling = rsmp("subsampling", repeats = 2),
  inner_resampling = rsmp("cv", folds = 3),
  inner_measure = msr("classif.ce"),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 10)
)

# Print the EnsembleFSResult
efsr