Feature filtering using a mlr3filters::Filter object; see the mlr3filters package.
If a Filter can only operate on a subset of columns based on column type, then only these features are considered and filtered. nfeat and frac count only the features of the type that the Filter can operate on; this means, e.g., that setting nfeat to 0 will only remove features of the type that the Filter can work with.
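The type restriction can be illustrated without the package. The following base R sketch is not the mlr3pipelines implementation: it assumes a toy scoring rule (the variance of each numeric column) and a hypothetical helper `filter_numeric_only()`. It only shows that `nfeat` counts the columns the filter can score, while all other columns pass through untouched.

```r
# Hypothetical helper, for illustration only: score numeric columns by
# variance, keep the `nfeat` best of them, and always keep all other columns.
filter_numeric_only = function(data, nfeat) {
  supported = names(data)[vapply(data, is.numeric, logical(1))]
  unsupported = setdiff(names(data), supported)
  scores = vapply(data[supported], var, numeric(1))
  scores = sort(scores, decreasing = TRUE)
  kept = head(names(scores), nfeat)
  # preserve the original column order
  data[, names(data) %in% c(kept, unsupported), drop = FALSE]
}

toy = data.frame(
  a = c(1, 2, 3, 4),      # low variance
  b = c(10, 20, 30, 40),  # high variance
  f = factor(c("x", "y", "x", "y"))
)

names(filter_numeric_only(toy, nfeat = 1))  # "b" "f"
names(filter_numeric_only(toy, nfeat = 0))  # "f"
```

With `nfeat = 0` the factor column `f` survives, because the toy filter cannot score it.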
R6Class object inheriting from PipeOpTaskPreproc.
PipeOpFilter$new(filter, id = filter$id, param_vals = list())
filter :: Filter
Filter used for feature filtering. This argument is always cloned; to access the Filter inside PipeOpFilter by-reference, use $filter.
id :: character(1)
Identifier of the resulting object, defaulting to the id of the Filter being used.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
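A minimal construction sketch, assuming the mlr3, mlr3filters and mlr3pipelines packages are installed and that `flt("variance")` is available as a filter. It uses the constructor signature given above and shows the default id and the cloning behaviour; the variable names `flt_var` and `pof` are illustrative.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

flt_var = flt("variance")

# Construct the PipeOp; the id defaults to the id of the Filter.
pof = PipeOpFilter$new(flt_var, param_vals = list(filter.nfeat = 2))
pof$id

# The Filter is cloned on construction, so the PipeOp holds its own copy
# and this comparison is expected to be FALSE.
identical(pof$filter, flt_var)

# The Filter inside the PipeOp is reached through the `filter` field.
class(pof$filter)
```

Because of the cloning, later changes to `flt_var` should not affect the PipeOp; changes meant for the PipeOp go through `pof$filter` or `pof$param_set`.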
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with the filtered-out features removed.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
scores :: named numeric
Scores calculated for all features of the training Task, which are used as the cutoff for feature filtering. If frac or nfeat is given, the underlying Filter may choose not to calculate scores for all features that are given. This only includes features on which the Filter can operate; e.g. if the Filter can only operate on numeric features, then scores for factor features will not be given.
features :: character
Names of features that are being kept. Features of types that the Filter cannot operate on are always kept.
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as the parameters of the Filter used by this object. In addition, the following parameters are introduced:
filter.nfeat :: numeric(1)
Number of features to select. Mutually exclusive with frac, cutoff, and permuted.
filter.frac :: numeric(1)
Fraction of features to keep. Mutually exclusive with nfeat, cutoff, and permuted.
filter.cutoff :: numeric(1)
Minimum value of filter heuristic for which to keep features. Mutually exclusive with nfeat, frac, and permuted.
filter.permuted :: integer(1)
If this parameter is set, a random permutation of each feature is added to the task before applying the filter. All features selected before the permuted-th permuted feature is selected are kept. This is similar to the approach in Wu (2007) and Thomas (2017). Mutually exclusive with nfeat, frac, and cutoff.
Note that at least one of filter.nfeat, filter.frac, filter.cutoff, and filter.permuted must be given.
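The selection criteria above can be sketched in base R. This is an illustration under stated assumptions, not the package's code: `select_features()` and `select_before_permuted()` are hypothetical helpers, higher scores are taken to mean more important features, and the rounding used to turn a fraction into a count is an assumption.

```r
# Hypothetical helpers, for illustration only. `scores` is a named numeric
# vector in which a higher score means a more important feature.
select_features = function(scores, nfeat = NULL, frac = NULL, cutoff = NULL) {
  n_given = sum(!is.null(nfeat), !is.null(frac), !is.null(cutoff))
  if (n_given != 1) {
    stop("exactly one of nfeat, frac and cutoff must be given")
  }
  ranked = sort(scores, decreasing = TRUE)
  if (!is.null(cutoff)) {
    return(names(ranked)[ranked >= cutoff])
  }
  if (!is.null(frac)) {
    nfeat = round(length(ranked) * frac)  # rounding rule is an assumption
  }
  head(names(ranked), nfeat)
}

# Keep the real features ranked ahead of the `permuted`-th permuted feature.
# `is_permuted` flags which entries of `scores` belong to shuffled copies.
select_before_permuted = function(scores, is_permuted, permuted) {
  ord = order(scores, decreasing = TRUE)
  ranked_names = names(scores)[ord]
  ranked_perm = is_permuted[ord]
  # cumsum counts how many permuted features have been seen so far
  keep = cumsum(ranked_perm) < permuted & !ranked_perm
  ranked_names[keep]
}

scores = c(x1 = 0.9, x2 = 0.4, x3 = 0.7, x4 = 0.1)
select_features(scores, nfeat = 2)     # "x1" "x3"
select_features(scores, frac = 0.5)    # "x1" "x3"
select_features(scores, cutoff = 0.4)  # "x1" "x3" "x2"

with_perm = c(scores, x1.perm = 0.5, x2.perm = 0.2)
flags = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
select_before_permuted(with_perm, flags, permuted = 1)  # "x1" "x3"
select_before_permuted(with_perm, flags, permuted = 2)  # "x1" "x3" "x2"
```

In the permuted case the shuffled copies act as a noise baseline: a real feature is kept only while fewer than `permuted` noise features have outranked it.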
This does not use the $.select_cols feature of PipeOpTaskPreproc to select only features compatible with the Filter; instead the whole Task is used by private$.get_state() and subset internally.
Fields inherited from PipeOpTaskPreproc, as well as:
filter :: Filter
Filter that is being used for feature filtering. Do not use this slot to get to the feature filtering scores after training; instead, use $state$scores.
Methods inherited from PipeOpTaskPreproc.
Wu Y, Boos DD, Stefanski LA (2007). “Controlling Variable Selection by the Addition of Pseudovariables.” Journal of the American Statistical Association, 102(477), 235–243. doi: 10.1198/016214506000000843.
Thomas J, Hepp T, Mayr A, Bischl B (2017). “Probing for Sparse and Fast Variable Selection with Model-Based Boosting.” Computational and Mathematical Methods in Medicine, 2017, 1–8. doi: 10.1155/2017/1421409.
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

# set up PipeOpFilter to keep the 5 most important
# features of the spam task w.r.t. their AUC
task = tsk("spam")
filter = flt("auc")
po = po("filter", filter = filter)
po$param_set
po$param_set$values$filter.nfeat = 5

# filter the task
filtered_task = po$train(list(task))[[1]]

# filtered task + extracted AUC scores
filtered_task$feature_names
head(po$state$scores, 10)

# feature selection embedded in a holdout resampling
# keep 50% of features based on their AUC score
task = tsk("spam")
gr = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
  po("learner", lrn("classif.rpart"))
learner = GraphLearner$new(gr)
rr = resample(task, learner, rsmp("holdout"), store_models = TRUE)
rr$learners[[1]]$model$auc$scores