PipeOpTaskPreproc | R Documentation |
Base class for handling most "preprocessing" operations. These are operations that have exactly one Task input and one Task output, and expect the column layout of these Tasks during input and output to be the same.
Prediction behavior of preprocessing operations should always be independent for each row in the input Task. This means that the prediction operation of preprocessing PipeOps should commute with rbind(): running prediction on an n-row Task should give the same result as rbind()-ing the prediction results from n 1-row Tasks with the same content. In the large majority of cases, the number and order of rows should also not be changed during prediction.
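The row-independence requirement can be checked empirically. The following is a minimal sketch, assuming the mlr3 and mlr3pipelines packages are installed and using the built-in po("scale") and the iris task purely as an illustration; the variable names are not part of the documented API.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
pop = po("scale")
pop$train(list(task))

# Predict on the first five rows in one go.
all_rows = pop$predict(list(task$clone(deep = TRUE)$filter(1:5)))[[1]]$data()

# Predict on each of the five rows separately, then bind the results.
single_rows = data.table::rbindlist(lapply(1:5, function(i) {
  pop$predict(list(task$clone(deep = TRUE)$filter(i)))[[1]]$data()
}))

# Both routes should give the same table if prediction commutes with rbind().
all.equal(all_rows, single_rows, check.attributes = FALSE)
```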
Users must implement private$.train_task() and private$.predict_task(), which have a Task input and should return that Task. The Task should, if possible, be manipulated in-place, and should not be cloned.
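As a rough illustration of the Task-level interface, the sketch below defines a hypothetical PipeOpDropConstant class (the class name, its id and the keep state member are made up for this example) that removes features with only one distinct value. It assumes that Task$select() modifies the Task in place and returns it, as in current mlr3 versions.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical PipeOp, for illustration only: drops constant features.
PipeOpDropConstant = R6::R6Class("PipeOpDropConstant",
  inherit = PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "drop_constant", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals)
    }
  ),
  private = list(
    .train_task = function(task) {
      dt = task$data(cols = task$feature_names)
      is_varying = vapply(dt, function(x) length(unique(x)) > 1, logical(1))
      keep = names(dt)[is_varying]
      self$state = list(keep = keep)  # the state must be a list
      task$select(keep)               # in-place modification, returns the Task
    },
    .predict_task = function(task) {
      # Apply the selection found during training.
      task$select(self$state$keep)
    }
  )
)

# Usage: add a constant column to iris and let the PipeOp remove it.
df = iris
df$const = 1
task = as_task_classif(df, target = "Species")

pop = PipeOpDropConstant$new()
out = pop$train(list(task))[[1]]
pred = pop$predict(list(task))[[1]]
out$feature_names
```

Because only columns are removed, working on the Task directly avoids copying the data into a data.table and back.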
Alternatively, the private$.train_dt() and private$.predict_dt() functions can be implemented, which operate on data.table objects instead. This should generally only be done if all data is in some way altered (e.g. PCA changing all columns to principal components) and not if only a few columns are added or removed (e.g. feature selection), because the latter should be done at the Task level with private$.train_task(). The private$.select_cols() function can be overloaded for private$.train_dt() and private$.predict_dt() to operate only on subsets of the Task's data, e.g. only on numerical columns.
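The data.table-level interface could look roughly as follows. This is a sketch under stated assumptions, not the implementation of any shipped PipeOp: PipeOpScaleSketch, its id and its center / scale state members are hypothetical names chosen for the example.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical PipeOp, for illustration only: centers and scales numeric features.
PipeOpScaleSketch = R6::R6Class("PipeOpScaleSketch",
  inherit = PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "scale_sketch", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals)
    }
  ),
  private = list(
    # Restrict .train_dt() / .predict_dt() to numeric features.
    .select_cols = function(task) {
      ft = task$feature_types
      ft$id[ft$type == "numeric"]
    },
    .train_dt = function(dt, levels, target) {
      sc = scale(as.matrix(dt))
      # Store what is needed to repeat the transformation during prediction.
      self$state = list(
        center = attr(sc, "scaled:center"),
        scale = attr(sc, "scaled:scale")
      )
      sc  # a matrix is fine, it is converted with as.data.table()
    },
    .predict_dt = function(dt, levels) {
      t((t(as.matrix(dt)) - self$state$center) / self$state$scale)
    }
  )
)

task = tsk("iris")
pop = PipeOpScaleSketch$new()
trained = pop$train(list(task))[[1]]
predicted = pop$predict(list(task))[[1]]
```

Predicting on the training data should reproduce the training output here, since the same centering and scaling values are applied in both phases.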
If the can_subset_cols argument of the constructor is TRUE (the default), then the hyperparameter affect_columns is added, which can limit the columns of the Task that are modified by the PipeOpTaskPreproc using a Selector function. Note that this functionality is entirely independent of the private$.select_cols() functionality.
PipeOpTaskPreproc is useful for operations that behave differently during training and prediction. For operations that perform essentially the same operation and only need to perform extra work to build a $state during training, the PipeOpTaskPreprocSimple class can be used instead.
Abstract R6Class inheriting from PipeOp.
PipeOpTaskPreproc$new(id, param_set = ps(), param_vals = list(), can_subset_cols = TRUE, packages = character(0), task_type = "Task", tags = NULL, feature_types = mlr_reflections$task_feature_types)
id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_set :: ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize().
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
can_subset_cols :: logical(1)
Whether the affect_columns parameter should be added, which lets the user limit the columns that are modified by the PipeOpTaskPreproc. This should generally be FALSE if the operation adds or removes rows from the Task, and TRUE otherwise. Default is TRUE.
packages :: character
Set of all required packages for the PipeOp's private$.train() and private$.predict() methods. See $packages slot. Default is character(0).
task_type :: character(1)
The class of Task that should be accepted as input and will be returned as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task".
tags :: character | NULL
Tags of the resulting PipeOp. This is added to the tag "data transform". Default NULL.
feature_types :: character
Feature types affected by the PipeOp. See private$.select_cols() for more information. Defaults to all available feature types.
PipeOpTaskPreproc has one input channel named "input", taking a Task, or a subclass of Task if the task_type construction argument is given as such; both during training and prediction.
PipeOpTaskPreproc has one output channel named "output", producing a Task, or a subclass; the Task type is the same as for input; both during training and prediction.
The output Task is the input Task, modified according to the overloaded private$.train_task() / private$.predict_task() or private$.train_dt() / private$.predict_dt() functions.
The $state is a named list; besides members added by inheriting classes, the members are:
affect_cols :: character
Names of features being selected by the affect_columns parameter, if present; names of all present features otherwise.
intasklayout :: data.table
Copy of the training Task's $feature_types slot. This is used during prediction to ensure that the prediction Task has the same features, feature layout, and feature types as during training.
outtasklayout :: data.table
Copy of the trained Task's $feature_types slot. This is used during prediction to ensure that the Task resulting from the prediction operation has the same features, feature layout, and feature types as after training.
dt_columns :: character
Names of features selected by the private$.select_cols() call during training. This is only present if the private$.train_dt() functionality is used, and not present if the private$.train_task() function is overloaded instead.
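To see these members on a trained object, one can inspect the $state of an existing PipeOp. The sketch below uses po("pca") on the iris task as an example, on the assumption that it is implemented through private$.train_dt() so that dt_columns is present; the remaining state members are specific to that PipeOp.

```r
library(mlr3)
library(mlr3pipelines)

pop = po("pca")
pop$train(list(tsk("iris")))

names(pop$state)         # PipeOp-specific members plus the ones described above
pop$state$affect_cols    # features selected by affect_columns (here: all)
pop$state$intasklayout   # feature names and types of the training Task
pop$state$outtasklayout  # feature names and types after the transformation
pop$state$dt_columns     # columns passed on to private$.train_dt()
```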
feature_types :: character
Feature types affected by the PipeOp. See private$.select_cols() for more information.
affect_columns :: function | Selector | NULL
What columns the PipeOpTaskPreproc should operate on. This parameter is only present if the constructor is called with the can_subset_cols argument set to TRUE (the default).
The parameter must be a Selector function, which takes a Task as argument and returns a character vector of features to use.
See Selector for example functions. Defaults to NULL, which selects all features.
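A short usage sketch of affect_columns, assuming the selector_grep() helper from mlr3pipelines and the built-in po("scale"); the choice of the iris task and of the "^Sepal" pattern is only for illustration.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")

# Only scale features whose names start with "Sepal".
pop = po("scale", affect_columns = selector_grep("^Sepal"))
out = pop$train(list(task))[[1]]

pop$state$affect_cols                    # the features that were selected
summary(out$data(cols = "Petal.Length")) # not selected, so left as it was
```

The same setting can also be changed after construction through pop$param_set$values$affect_columns.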
PipeOpTaskPreproc is an abstract class inheriting from PipeOp. It implements the private$.train() and private$.predict() functions. These functions perform checks and go on to call private$.train_task() and private$.predict_task().
A subclass of PipeOpTaskPreproc may implement these functions, or implement private$.train_dt() and private$.predict_dt() instead.
This works by having the default implementations of private$.train_task() and private$.predict_task() call private$.train_dt() and private$.predict_dt(), respectively.
The affect_columns functionality works by unsetting columns, i.e. removing their col_role, before processing, and adding them back afterwards by setting their col_role to "feature".
Fields inherited from PipeOp.
Methods inherited from PipeOp, as well as:
.train_task
(Task) -> Task
Called by the PipeOpTaskPreproc's implementation of private$.train(). Takes a single Task as input and modifies it (ideally in-place without cloning) while storing information in the $state slot. Note that unlike private$.train(), the argument is not a list but a single Task, and the return object is also not a list but a single Task. Also, contrary to private$.train(), the $state being generated must be a list, which the PipeOpTaskPreproc will add additional slots to (see Section State). Care should be taken to avoid name collisions between $state elements added by private$.train_task() and PipeOpTaskPreproc.
By default this function calls the private$.train_dt() function, but it can be overloaded to perform operations on the Task directly.
.predict_task
(Task) -> Task
Called by the PipeOpTaskPreproc's implementation of private$.predict(). Takes a single Task as input and modifies it (ideally in-place without cloning) while using information in the $state slot. Works analogously to private$.train_task(). private$.predict_task() should only be overloaded if private$.train_task() is also overloaded (i.e. private$.train_dt() is not used).
.train_dt(dt, levels, target)
(data.table, named list, any) -> data.table | data.frame | matrix
Train PipeOpTaskPreproc on dt, transform it and store a state in $state. A transformed object must be returned that can be converted to a data.table using as.data.table. dt does not need to be copied deliberately; it is possible and encouraged to change it in-place.
The levels argument is a named list of factor levels for factorial or character features. If the input Task inherits from TaskSupervised, the target argument contains the $truth() information of the training Task; its type depends on the Task type being trained on.
This method can be overloaded when inheriting from PipeOpTaskPreproc, together with private$.predict_dt() and optionally private$.select_cols(); alternatively, private$.train_task() and private$.predict_task() can be overloaded.
.predict_dt(dt, levels)
(data.table, named list) -> data.table | data.frame | matrix
Predict on new data in dt, possibly using the stored $state. A transformed object must be returned that can be converted to a data.table using as.data.table. dt does not need to be copied deliberately; it is possible and encouraged to change it in-place.
The levels argument is a named list of factor levels for factorial or character features.
This method can be overloaded when inheriting from PipeOpTaskPreproc, together with private$.train_dt() and optionally private$.select_cols(); alternatively, private$.train_task() and private$.predict_task() can be overloaded.
.select_cols(task)
(Task) -> character
Selects which columns the PipeOp operates on, if private$.train_dt() and private$.predict_dt() are overloaded. This function is not called if private$.train_task() and private$.predict_task() are overloaded. In contrast to the affect_columns parameter, private$.select_cols() is for the inheriting class to determine which columns the operator should function on, e.g. based on feature type, while affect_columns is a way for the user to limit the columns that a PipeOpTaskPreproc should operate on.
This method can optionally be overloaded when inheriting from PipeOpTaskPreproc, together with private$.train_dt() and private$.predict_dt(); alternatively, private$.train_task() and private$.predict_task() can be overloaded.
If this method is not overloaded, it defaults to selecting the features of the types indicated by the feature_types construction argument.
https://mlr-org.com/pipeops.html
Other mlr3pipelines backend related: Graph, PipeOp, PipeOpTargetTrafo, PipeOpTaskPreprocSimple, mlr_graphs, mlr_pipeops, mlr_pipeops_updatetarget
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson