mlr_pipeops_filter: Feature Filtering
In mlr-org/mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3'

mlr_pipeops_filter

R Documentation

Feature Filtering

Description

Feature filtering using a mlr3filters::Filter object, see the mlr3filters package.

If a Filter can operate only on columns of certain types, then only features of those types are scored and filtered. nfeat and frac count only the features of types that the Filter can operate on; for example, setting nfeat to 0 removes only the features of types that the Filter can work with and keeps all others.
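The type restriction can be illustrated with a small sketch. It assumes that flt("variance") accepts only integer and numeric features, and it adds a made-up factor column (size) to iris so that the task has mixed feature types; both the column and the variable names are illustrative only.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

# iris plus one hypothetical factor feature, so the task has mixed feature types
data = iris
data$size = factor(ifelse(data$Sepal.Length > 5.8, "large", "small"))
task = as_task_classif(data, target = "Species")

# assumption: the variance filter supports only integer and numeric features
pof = po("filter", filter = flt("variance"), filter.nfeat = 1)
filtered = pof$train(list(task))[[1]]

# nfeat counts numeric features only; the factor feature is kept regardless
filtered$feature_names

# scores are available only for features the Filter can operate on
names(pof$state$scores)
```

Here nfeat = 1 applies to the four numeric columns, while the factor column passes through untouched.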

Format

R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

Construction

PipeOpFilter$new(filter, id = filter$id, param_vals = list())

filter :: Filter
Filter used for feature filtering. This argument is always cloned; to access the Filter inside PipeOpFilter by-reference, use $filter.
id :: character(1)
Identifier of the resulting object, defaulting to the id of the Filter being used.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
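As a rough sketch of the constructor arguments described above, the following builds a PipeOpFilter directly and through the po() shorthand. The id "var_filter" and the choice of flt("variance") are arbitrary and only serve the illustration.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

filter = flt("variance")

# direct construction; "var_filter" is an arbitrary illustrative id
pof = PipeOpFilter$new(filter, id = "var_filter",
  param_vals = list(filter.nfeat = 2))

pof$id
pof$param_set$values$filter.nfeat

# the stored Filter is a clone, not the object that was passed in
identical(pof$filter, filter)

# the po() shorthand; the id then defaults to the id of the Filter
pof2 = po("filter", filter = flt("variance"), filter.nfeat = 2)
pof2$id
```

Because the Filter is cloned, later changes to the original filter object are not expected to affect the PipeOp; changes should go through pof$filter instead.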

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output is the input Task with features removed that were filtered out.

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:

scores :: named numeric
Scores calculated for the features of the training Task, which are used as the criterion for feature filtering. If frac or nfeat is given, the underlying Filter may choose not to calculate scores for all features. Scores are given only for features on which the Filter can operate; e.g. if the Filter can only operate on numeric features, then no scores are given for factor features.
features :: character
Names of features that are kept. Features of types that the Filter cannot operate on are always kept.
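A minimal sketch of how the state is used: the feature selection is determined once during training and then applied unchanged at prediction time. The row split below is arbitrary, and flt("variance") is chosen only because it needs no additional packages.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

task = tsk("iris")
train_task = task$clone()$filter(1:100)   # arbitrary split for illustration
test_task = task$clone()$filter(101:150)

pof = po("filter", filter = flt("variance"), filter.nfeat = 2)
train_out = pof$train(list(train_task))[[1]]

# scores and kept features are computed on the training data only
pof$state$scores
pof$state$features

# prediction reuses the stored selection; no scores are recomputed
test_out = pof$predict(list(test_task))[[1]]
test_out$feature_names
```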

Parameters

The parameters are those inherited from PipeOpTaskPreproc and those of the Filter used by this object, as well as:

filter.nfeat :: numeric(1)
Number of features to select. Mutually exclusive with frac, cutoff, and permuted.
filter.frac :: numeric(1)
Fraction of features to keep. Mutually exclusive with nfeat, cutoff, and permuted.
filter.cutoff :: numeric(1)
Minimum value of filter heuristic for which to keep features. Mutually exclusive with nfeat, frac, and permuted.
filter.permuted :: integer(1)
If this parameter is set, a random permutation of each feature is added to the task before applying the filter. All features selected before the permuted-th permuted feature is selected are kept. This is similar to the approach in Wu (2007) and Thomas (2017). Mutually exclusive with nfeat, frac, and cutoff.

Note that exactly one of filter.nfeat, filter.frac, filter.cutoff, and filter.permuted must be given.
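The deterministic selection criteria can be compared side by side. This is only a sketch: the helper kept() is a hypothetical convenience function, flt("variance") stands in for any Filter, and filter.permuted is left out because its result depends on random permutations.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

task = tsk("iris")

# hypothetical helper: train a variance filter with the given settings
# and return the names of the features that were kept
kept = function(...) {
  pof = po("filter", filter = flt("variance"), ...)
  pof$train(list(task))[[1]]$feature_names
}

by_nfeat = kept(filter.nfeat = 1)      # the single highest-scoring feature
by_frac = kept(filter.frac = 0.5)      # half of the four features
by_cutoff = kept(filter.cutoff = 0.5)  # features with a score of at least 0.5

# training without any selection criterion is expected to fail
no_criterion = try(kept(), silent = TRUE)
inherits(no_criterion, "try-error")
```

Which criterion fits best depends on the Filter: nfeat and frac depend only on the ranking, whereas cutoff depends on the scale of the scores.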

Internals

This does not use the $.select_cols feature of PipeOpTaskPreproc to select only features compatible with the Filter; instead, the whole Task is used by private$.get_state() and subset internally.

Fields

Fields inherited from PipeOp, as well as:

filter :: Filter
Filter that is being used for feature filtering. Do not use this slot to get to the feature filtering scores after training; instead, use $state$scores. Read-only.

Methods

Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

References

Wu Y, Boos DD, Stefanski LA (2007). “Controlling Variable Selection by the Addition of Pseudovariables.” Journal of the American Statistical Association, 102(477), 235–243. doi:10.1198/016214506000000843.

Thomas J, Hepp T, Mayr A, Bischl B (2017). “Probing for Sparse and Fast Variable Selection with Model-Based Boosting.” Computational and Mathematical Methods in Medicine, 2017, 1–8. doi:10.1155/2017/1421409.

See Also

Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson

Examples


library("mlr3")
library("mlr3filters")
library("mlr3pipelines")


# set up PipeOpFilter to keep the 5 most important
# features of the spam task w.r.t. their AUC
task = tsk("spam")
filter = flt("auc")
# named po_filter so that the po() function is not shadowed
po_filter = po("filter", filter = filter)
po_filter$param_set
po_filter$param_set$values$filter.nfeat = 5

# filter the task
filtered_task = po_filter$train(list(task))[[1]]

# filtered task + extracted AUC scores
filtered_task$feature_names
head(po_filter$state$scores, 10)

# feature selection embedded in a holdout resampling
# keep 50% of features based on their AUC score
task = tsk("spam")
gr = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
  po("learner", lrn("classif.rpart"))
learner = GraphLearner$new(gr)
rr = resample(task, learner, rsmp("holdout"), store_models = TRUE)
rr$learners[[1]]$model$auc$scores

mlr-org/mlr3pipelines documentation built on April 5, 2025, 2:56 p.m.
