PipeOpTaskPreprocSimple: Simple Task Preprocessing Base Class
In mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3'

Description

Base class for handling many "preprocessing" operations that perform essentially the same operation during training and prediction. Instead of implementing private$.train_task() and private$.predict_task(), only private$.get_state() and private$.transform() need to be defined, each of which takes a single argument: a Task.
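A minimal sketch of the Task-based approach, assuming the mlr3, mlr3pipelines and R6 packages are installed. The class PipeOpDropConstSketch and its id are hypothetical and not part of mlr3pipelines; the class drops features that were constant in the training data. private$.get_state() only reads the Task and returns the state, while private$.transform() modifies the Task in place via task$select() and returns it.

```r
library(mlr3)
library(mlr3pipelines)
library(R6)

# Hypothetical example class (not part of mlr3pipelines):
# removes features that were constant during training.
PipeOpDropConstSketch = R6Class("PipeOpDropConstSketch",
  inherit = PipeOpTaskPreprocSimple,
  public = list(
    initialize = function(id = "dropconst_sketch", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals)
    }
  ),
  private = list(
    # Training only: read the Task (without modifying it) and return the state.
    .get_state = function(task) {
      dt = task$data(cols = task$feature_names)
      varies = vapply(dt, function(x) length(unique(x)) > 1L, logical(1))
      list(keep = names(dt)[varies])
    },
    # Training and prediction: modify the Task in place and return it.
    .transform = function(task) {
      task$select(self$state$keep)
      task
    }
  )
)

task = as_task_classif(
  data.frame(
    x1 = as.numeric(1:10),
    x2 = rep(5, 10),                      # constant feature
    y = factor(rep(c("a", "b"), 5))
  ),
  target = "y", id = "toy"
)

op = PipeOpDropConstSketch$new()
out = op$train(list(task))[[1]]
out$feature_names
op$state$keep
```

Because private$.transform() only reads self$state, calling op$predict() on new data keeps exactly the columns that were kept during training.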

Alternatively, analogous to the private$.train_dt()/private$.predict_dt() approach offered by PipeOpTaskPreproc, the functions private$.get_state_dt() and private$.transform_dt() may be implemented instead; these take a data.table rather than a Task.
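The data.table-based alternative can be sketched as follows, assuming mlr3, mlr3pipelines, R6 and data.table are installed. PipeOpCenterSketch is a hypothetical class that centers numeric features on their training means. The overloaded private$.select_cols() limits which columns are handed to the *_dt() methods; private$.transform_dt() returns a new data.table, which is acceptable because the return value only has to be convertible with as.data.table().

```r
library(mlr3)
library(mlr3pipelines)
library(R6)
library(data.table)

# Hypothetical example class (not part of mlr3pipelines):
# centers numeric features on the means seen during training.
PipeOpCenterSketch = R6Class("PipeOpCenterSketch",
  inherit = PipeOpTaskPreprocSimple,
  public = list(
    initialize = function(id = "center_sketch", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals)
    }
  ),
  private = list(
    # Only numeric/integer feature columns are passed to the *_dt() methods.
    .select_cols = function(task) {
      ft = task$feature_types
      ft$id[ft$type %in% c("numeric", "integer")]
    },
    # Training only: compute per-column means from the selected columns.
    .get_state_dt = function(dt) {
      list(means = vapply(dt, mean, numeric(1)))
    },
    # Training and prediction: subtract the stored training means.
    .transform_dt = function(dt) {
      means = self$state$means[names(dt)]
      as.data.table(Map(function(x, m) x - m, dt, means))
    }
  )
)

op = PipeOpCenterSketch$new()
out = op$train(list(tsk("iris")))[[1]]
round(colMeans(out$data(cols = out$feature_names)), 10)
```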

private$.get_state() must not modify its input in place and must return the object that will be written into $state (this must not be NULL). private$.transform() should modify its argument in place; it is called during both training and prediction.

This inherits from PipeOpTaskPreproc and behaves essentially the same.

Format

Abstract R6Class inheriting from PipeOpTaskPreproc/PipeOp.

Construction

PipeOpTaskPreprocSimple$new(id, param_set = ps(), param_vals = list(), can_subset_cols = TRUE,
  packages = character(0), task_type = "Task", tags = NULL, feature_types = mlr_reflections$task_feature_types)

(Construction is identical to PipeOpTaskPreproc.)

id :: character(1)
Identifier of resulting object. See $id slot of PipeOp.
param_set :: ParamSet
Parameter space description. This should be created by the subclass and given to super$initialize().
param_vals :: named list
List of hyperparameter settings, overwriting the hyperparameter settings given in param_set. The subclass should have its own param_vals parameter and pass it on to super$initialize(). Default list().
can_subset_cols :: logical(1)
Whether the affect_columns parameter should be added which lets the user limit the columns that are modified by the PipeOpTaskPreprocSimple. This should generally be FALSE if the operation adds or removes rows from the Task, and TRUE otherwise. Default is TRUE.
packages :: character
Set of all required packages for the PipeOp's private$.train() and private$.predict() methods. See $packages slot. Default is character(0).
task_type :: character(1)
The class of Task that should be accepted as input and will be returned as output. This should generally be a character(1) identifying a type of Task, e.g. "Task", "TaskClassif" or "TaskRegr" (or another subclass introduced by other packages). Default is "Task".
tags :: character | NULL
Tags of the resulting PipeOp. The tag "data transform" is always added to these. Default NULL.
feature_types :: character
Feature types affected by the PipeOp. See private$.select_cols() for more information. Defaults to all available feature types.
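A sketch of how a subclass might pass these construction arguments on to super$initialize(), assuming mlr3, mlr3pipelines, paradox, R6 and data.table are installed. PipeOpOffsetSketch and its offset hyperparameter are hypothetical. The subclass builds its own ParamSet with ps(), merges a default into param_vals before forwarding it, and restricts feature_types; because can_subset_cols keeps its default TRUE, the affect_columns parameter should also appear in the ParamSet.

```r
library(mlr3)
library(mlr3pipelines)
library(paradox)
library(R6)
library(data.table)

# Hypothetical example class (not part of mlr3pipelines):
# adds the value of the `offset` hyperparameter to every affected feature.
PipeOpOffsetSketch = R6Class("PipeOpOffsetSketch",
  inherit = PipeOpTaskPreprocSimple,
  public = list(
    initialize = function(id = "offset_sketch", param_vals = list()) {
      # The subclass creates its own ParamSet and hands it to super$initialize().
      param_set = ps(offset = p_dbl(tags = c("train", "predict")))
      # Use offset = 0 unless the caller overrides it through param_vals.
      param_vals = modifyList(list(offset = 0), param_vals)
      super$initialize(id = id, param_set = param_set, param_vals = param_vals,
        feature_types = c("numeric", "integer"))
    }
  ),
  private = list(
    .transform_dt = function(dt) {
      offset = self$param_set$values$offset
      as.data.table(lapply(dt, function(x) x + offset))
    }
  )
)

op = PipeOpOffsetSketch$new(param_vals = list(offset = 10))
op$param_set$ids()
op$tags

task = tsk("iris")
feats = task$feature_names
out = op$train(list(task))[[1]]
colMeans(out$data(cols = feats)) - colMeans(task$data(cols = feats))
```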

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreproc.

The output during training and prediction is the Task, modified by private$.transform() or private$.transform_dt().
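To illustrate the input and output behaviour with a built-in operator, the sketch below uses po("removeconstants"), which, as far as we are aware, is implemented as a subclass of PipeOpTaskPreprocSimple; the inherits() call checks this for the installed version. Both $train() and $predict() take a list containing a Task and return a list containing the modified Task. The toy data are made up for illustration.

```r
library(mlr3)
library(mlr3pipelines)

train_task = as_task_regr(
  data.frame(x1 = c(1, 2, 3, 4), x2 = c(5, 5, 5, 5), y = c(1.1, 1.9, 3.2, 3.8)),
  target = "y", id = "train_toy"
)
# In the new data x2 is not constant, but the state learned during training is used.
new_task = as_task_regr(
  data.frame(x1 = c(5, 6), x2 = c(1, 2), y = c(5.1, 5.9)),
  target = "y", id = "new_toy"
)

op = po("removeconstants")
inherits(op, "PipeOpTaskPreprocSimple")

trained = op$train(list(train_task))[[1]]     # a Task
predicted = op$predict(list(new_task))[[1]]   # a Task
trained$feature_names
predicted$feature_names
```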

State

The $state is a named list containing the elements returned by private$.get_state() (or private$.get_state_dt()), together with the $state elements inherited from PipeOpTaskPreproc.

Parameters

The parameters are the parameters inherited from PipeOpTaskPreproc.
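The main inherited parameter is affect_columns, which is available when can_subset_cols is TRUE. A small sketch, again using the built-in po("removeconstants") on made-up data: only x1 and x2 are selected with selector_name(), so the constant column x3 passes through untouched while the constant column x2 is removed.

```r
library(mlr3)
library(mlr3pipelines)

task = as_task_regr(
  data.frame(
    x1 = c(1, 2, 3, 4),
    x2 = c(5, 5, 5, 5),   # constant and affected
    x3 = c(0, 0, 0, 0),   # constant but not affected
    y = c(1.1, 1.9, 3.2, 3.8)
  ),
  target = "y", id = "toy"
)

# Only x1 and x2 are handed to the PipeOp; x3 is passed through unchanged.
op = po("removeconstants", affect_columns = selector_name(c("x1", "x2")))
out = op$train(list(task))[[1]]
sort(out$feature_names)
```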

Internals

PipeOpTaskPreprocSimple is an abstract class inheriting from PipeOpTaskPreproc and implementing the private$.train_task() and private$.predict_task() functions. A subclass of PipeOpTaskPreprocSimple may implement the functions private$.get_state() and private$.transform(), or alternatively the functions private$.get_state_dt() and private$.transform_dt() (optionally together with private$.select_cols() in the latter case). This works because the default implementations of private$.get_state() and private$.transform() call private$.get_state_dt() and private$.transform_dt(), respectively.
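Because of this delegation, a stateless subclass can get away with overloading only private$.transform_dt(). The sketch below assumes mlr3, mlr3pipelines, R6 and data.table are installed; PipeOpLog1pSketch is hypothetical and applies log1p() to numeric features. Neither private$.get_state() nor private$.get_state_dt() is overloaded, so the state part contributed by the subclass is list(), and $state should only hold the elements added by PipeOpTaskPreproc.

```r
library(mlr3)
library(mlr3pipelines)
library(R6)
library(data.table)

# Hypothetical example class (not part of mlr3pipelines):
# applies log1p() to numeric features. Only private$.transform_dt() is
# overloaded; the inherited private$.get_state()/private$.transform()
# delegate to the *_dt() methods.
PipeOpLog1pSketch = R6Class("PipeOpLog1pSketch",
  inherit = PipeOpTaskPreprocSimple,
  public = list(
    initialize = function(id = "log1p_sketch", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals,
        feature_types = c("numeric", "integer"))
    }
  ),
  private = list(
    .transform_dt = function(dt) {
      as.data.table(lapply(dt, log1p))
    }
  )
)

task = tsk("iris")
feats = task$feature_names
op = PipeOpLog1pSketch$new()
out = op$train(list(task))[[1]]
all.equal(as.matrix(out$data(cols = feats)), log1p(as.matrix(task$data(cols = feats))))
```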

Fields

Fields inherited from PipeOp.

Methods

Methods inherited from PipeOpTaskPreproc, as well as:

.get_state(task)
(Task) -> named list
Create something that will be stored in $state during the training phase of PipeOpTaskPreprocSimple. The state can then influence the private$.transform() function. Note that private$.get_state() must return the state and should not store it in $state itself. It is not strictly necessary to implement either private$.get_state() or private$.get_state_dt(); if neither is implemented, the state will be stored as list().
This method can optionally be overloaded when inheriting from PipeOpTaskPreprocSimple, together with private$.transform(); alternatively, private$.get_state_dt() (optional) and private$.transform_dt() (and possibly private$.select_cols(), from PipeOpTaskPreproc) can be overloaded.
.transform(task)
(Task) -> Task
Transform the data in task, possibly using the stored $state. task should not be cloned; instead, it should be modified in place. This method is called during both the training and the prediction phase and should behave essentially the same regardless of phase. (If this is incompatible with the functionality to be implemented, the PipeOp should inherit from PipeOpTaskPreproc, not from PipeOpTaskPreprocSimple.)
This method can be overloaded when inheriting from PipeOpTaskPreprocSimple, optionally with private$.get_state(); alternatively, private$.get_state_dt() (optional) and private$.transform_dt() (and possibly private$.select_cols(), from PipeOpTaskPreproc) can be overloaded.
.get_state_dt(dt)
(data.table) -> named list
Create something that will be stored in $state during the training phase of PipeOpTaskPreprocSimple. The state can then influence the private$.transform_dt() function. Note that private$.get_state_dt() must return the state and should not store it in $state itself. If neither private$.get_state() nor private$.get_state_dt() is overloaded, the state will be stored as list().
This method can optionally be overloaded when inheriting from PipeOpTaskPreprocSimple, together with private$.transform_dt() (and optionally private$.select_cols(), from PipeOpTaskPreproc); alternatively, private$.get_state() (optional) and private$.transform() can be overloaded.
.transform_dt(dt)
(data.table) -> data.table | data.frame | matrix
Transform the data in dt, possibly using the stored $state. The returned object must be convertible to a data.table using as.data.table(). dt does not need to be copied; modifying it in place is possible and encouraged. This method is called during both the training and the prediction phase and should behave essentially the same regardless of phase. (If this is incompatible with the functionality to be implemented, the PipeOp should inherit from PipeOpTaskPreproc, not from PipeOpTaskPreprocSimple.)
This method can be overloaded when inheriting from PipeOpTaskPreprocSimple, optionally together with private$.get_state_dt() (and optionally private$.select_cols(), from PipeOpTaskPreproc); alternatively, private$.get_state() (optional) and private$.transform() can be overloaded.
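As a sketch of the Task-based methods when no state is needed, the hypothetical class PipeOpPrefixSketch below overloads only private$.transform(), which renames every feature with an "f_" prefix using the Task's $rename() method. It modifies the Task it receives in place and returns it, and it behaves identically during training and prediction. The class, its id and the prefix are illustrative assumptions, not part of mlr3pipelines.

```r
library(mlr3)
library(mlr3pipelines)
library(R6)

# Hypothetical example class (not part of mlr3pipelines):
# prefixes all feature names with "f_". No state is needed, so
# private$.get_state() is not overloaded.
PipeOpPrefixSketch = R6Class("PipeOpPrefixSketch",
  inherit = PipeOpTaskPreprocSimple,
  public = list(
    initialize = function(id = "prefix_sketch", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals)
    }
  ),
  private = list(
    # Modify the Task in place (no clone) and return it.
    .transform = function(task) {
      old = task$feature_names
      task$rename(old, paste0("f_", old))
      task
    }
  )
)

op = PipeOpPrefixSketch$new()
trained = op$train(list(tsk("iris")))[[1]]
predicted = op$predict(list(tsk("iris")))[[1]]
trained$feature_names
predicted$feature_names
```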

Other PipeOps: PipeOp, PipeOpEncodePL, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, mlr_pipeops, mlr_pipeops_adas, mlr_pipeops_blsmote, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_decode, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_encodeplquantiles, mlr_pipeops_encodepltree, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_info, mlr_pipeops_isomap, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_learner_pi_cvplus, mlr_pipeops_learner_quantiles, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nearmiss, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_rowapply, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_smotenc, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tomek, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson

Other mlr3pipelines backend related: Graph, PipeOp, PipeOpTargetTrafo, PipeOpTaskPreproc, mlr_graphs, mlr_pipeops, mlr_pipeops_updatetarget