mlr_pipeops_filter | R Documentation |
Feature filtering using a mlr3filters::Filter object, see the mlr3filters package.
If a Filter can only operate on a subset of columns based on column type, then only these features are considered and filtered. nfeat and frac count only the features of the types that the Filter can operate on; this means, for example, that setting nfeat to 0 will only remove features of the types that the Filter can work with.
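The type-based behavior described above can be illustrated with a minimal sketch. It assumes that flt("variance") from mlr3filters operates only on numeric and integer features; the toy data and its column names (num_small, num_large, fct) are hypothetical and chosen only for illustration.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

# Hypothetical toy data: two numeric features and one factor feature
set.seed(1)
dat = data.frame(
  y = rnorm(20),
  num_small = rnorm(20, sd = 1),
  num_large = rnorm(20, sd = 100),
  fct = factor(rep(c("a", "b"), 10))
)
task = as_task_regr(dat, target = "y")

# Assumption: the variance filter only handles numeric/integer columns,
# so filter.nfeat = 1 counts among the two numeric features only
pof = po("filter", filter = flt("variance"), filter.nfeat = 1)
out = pof$train(list(task))[[1]]

# The factor column is kept regardless of nfeat; of the numeric
# columns, only the one with the larger variance survives
out$feature_names
```

Under these assumptions the factor feature passes through untouched, because the Filter never scores it.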
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
PipeOpFilter$new(filter, id = filter$id, param_vals = list())
filter :: Filter
Filter used for feature filtering. This argument is always cloned; to access the Filter inside PipeOpFilter by-reference, use $filter.
id :: character(1)
Identifier of the resulting object, defaulting to the id of the Filter being used.
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
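The constructor arguments above can be tried out as follows. This is a sketch rather than a reference: the id "auc_top5" is a hypothetical name, and the cloning check relies on the statement above that the filter argument is always cloned.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

flt_auc = flt("auc")

# Default construction: the id is taken from the Filter
pof = PipeOpFilter$new(flt_auc)
pof$id

# The PipeOp holds a clone, so this is a different object
# than the Filter that was passed in
identical(pof$filter, flt_auc)

# Custom id (hypothetical name) and a hyperparameter set at construction
pof_named = PipeOpFilter$new(flt_auc, id = "auc_top5",
  param_vals = list(filter.nfeat = 5))
pof_named$param_set$values$filter.nfeat
```

Because of the cloning, later changes to flt_auc do not affect the Filter inside the PipeOp; use pof$filter to reach the Filter that is actually used.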
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with features removed that were filtered out.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
scores :: named numeric
Scores calculated for all features of the training Task, which are used as the cutoff criterion for feature filtering. If frac or nfeat is given, the underlying Filter may choose not to calculate scores for all features that are given. This only includes features on which the Filter can operate; e.g. if the Filter can only operate on numeric features, then scores for factor features will not be given.
features :: character
Names of features that are being kept. Features of types that the Filter cannot operate on are always kept.
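As a short sketch of how the two state elements above can be inspected after training, the following uses the sonar task shipped with mlr3 and the AUC filter; both choices are only for illustration.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

task = tsk("sonar")  # binary classification task with numeric features
pof = po("filter", filter = flt("auc"), filter.nfeat = 3)

# Training populates $state
out = pof$train(list(task))[[1]]

# Named numeric vector of filter scores
head(pof$state$scores)

# Names of the features that were kept; these match the
# feature names of the returned Task
pof$state$features
```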
The parameters are the parameters inherited from the PipeOpTaskPreproc, as well as the parameters of the Filter used by this object. Besides, parameters introduced are:
filter.nfeat :: numeric(1)
Number of features to select. Mutually exclusive with frac, cutoff, and permuted.
filter.frac :: numeric(1)
Fraction of features to keep. Mutually exclusive with nfeat, cutoff, and permuted.
filter.cutoff :: numeric(1)
Minimum value of filter heuristic for which to keep features. Mutually exclusive with nfeat, frac, and permuted.
filter.permuted :: integer(1)
If this parameter is set, a random permutation of each feature is added to the task before applying the filter. All features selected before the permuted-th permuted feature is selected are kept. This is similar to the approach in Wu (2007) and Thomas (2017). Mutually exclusive with nfeat, frac, and cutoff.
Because these parameters are mutually exclusive, exactly one of filter.nfeat, filter.frac, filter.cutoff, and filter.permuted must be given.
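Two of the selection criteria above can be compared in a brief sketch. The sonar task (60 features) and the values 0.25 and 0.15 are arbitrary choices for illustration, not recommendations.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

task = tsk("sonar")

# filter.frac: keep a fixed fraction of the 60 features
po_frac = po("filter", filter = flt("auc"), filter.frac = 0.25)
task_frac = po_frac$train(list(task))[[1]]
length(task_frac$feature_names)

# filter.cutoff: keep every feature whose score reaches the threshold,
# however many that turns out to be
po_cut = po("filter", filter = flt("auc"), filter.cutoff = 0.15)
task_cut = po_cut$train(list(task))[[1]]

# Scores of the features that survived the cutoff
kept_scores = po_cut$state$scores[task_cut$feature_names]
kept_scores
```

With filter.frac the number of kept features is known in advance, whereas with filter.cutoff it depends on the scores of the training data.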
This does not use the $.select_cols feature of PipeOpTaskPreproc to select only features compatible with the Filter; instead the whole Task is used by private$.get_state() and subset internally.
Fields inherited from PipeOpTaskPreproc, as well as:
filter :: Filter
Filter that is being used for feature filtering. Do not use this slot to get to the feature filtering scores after training; instead, use $state$scores. Read-only.
Methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Wu Y, Boos DD, Stefanski LA (2007). “Controlling Variable Selection by the Addition of Pseudovariables.” Journal of the American Statistical Association, 102(477), 235–243. doi:10.1198/016214506000000843.
Thomas J, Hepp T, Mayr A, Bischl B (2017). “Probing for Sparse and Fast Variable Selection with Model-Based Boosting.” Computational and Mathematical Methods in Medicine, 2017, 1–8. doi:10.1155/2017/1421409.
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

# set up PipeOpFilter to keep the 5 most important
# features of the spam task w.r.t. their AUC
task = tsk("spam")
filter = flt("auc")
po = po("filter", filter = filter)
po$param_set
po$param_set$values$filter.nfeat = 5

# filter the task
filtered_task = po$train(list(task))[[1]]

# filtered task + extracted AUC scores
filtered_task$feature_names
head(po$state$scores, 10)

# feature selection embedded in a holdout resampling:
# keep 50% of features based on their AUC score
task = tsk("spam")
gr = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
  po("learner", lrn("classif.rpart"))
learner = GraphLearner$new(gr)
rr = resample(task, learner, rsmp("holdout"), store_models = TRUE)
rr$learners[[1]]$model$auc$scores