R/makeCPOHelp.R

#' @title Create a Custom CPO Constructor
#'
#' @description
#' \code{makeCPO} creates a \emph{Feature Operation} \code{\link{CPOConstructor}}, i.e. a constructor for a \code{\link{CPO}} that will
#' operate on feature columns. \code{makeCPOTargetOp} creates a \emph{Target Operation} \code{\link{CPOConstructor}}, which
#' creates \code{\link{CPO}}s that operate on the target column. \code{makeCPORetrafoless} creates a \emph{Retrafoless} \code{\link{CPOConstructor}},
#' which creates \code{\link{CPO}}s that may operate on both feature and target columns, but have no retrafo operation. See \link{OperatingType} for further
#' details on the distinction of these. \code{makeCPOExtendedTrafo} creates a \emph{Feature Operation} \code{\link{CPOConstructor}} that
#' has slightly more flexibility in its data transformation behaviour than \code{makeCPO} (but is otherwise identical).
#' \code{makeCPOExtendedTargetOp} creates a \emph{Target Operation} \code{\link{CPOConstructor}} that has slightly more flexibility in its
#' data transformation behaviour than \code{makeCPOTargetOp} but is otherwise identical.
#'
#' See example section for some simple custom CPO.
#'
#' @section CPO Internals:
#' The mlrCPO package offers a powerful framework for handling the tasks necessary for preprocessing, so that the user, when creating custom CPOs,
#' can focus on the actual data transformations to perform. It is, however, useful to understand \emph{what} it is that the framework does, and how
#' the process can be influenced by the user during CPO definition or application. Aspects of preprocessing that the user needs to influence are:
#' \describe{
#'   \item{\strong{Operating Type}}{
#'     The core of preprocessing is the actual transformation being performed. In the most general sense, there are three points in a machine
#'     learning pipeline that preprocessing can influence.
#'     \enumerate{
#'       \item Transformation of training data \emph{before model fitting}, done in mlr using \code{\link[mlr]{train}}. In the CPO framework
#'         (\emph{when not using a \code{\link{CPOLearner}} which makes all of these steps transparent to the user}), this is
#'         done by a \code{\link{CPO}}.
#'       \item transformation of new validation or prediction data that is given to the fitted model for \emph{prediction}, done using
#'         \code{\link[stats]{predict}}. This is done by a \code{\link{CPORetrafo}} retrieved using \code{\link{retrafo}} from the result of step 1.
#'       \item transformation of the predictions made to invert the transformation of the target values done in step 1, which is done using
#'         the \code{\link{CPOInverter}} retrieved using \code{\link{inverter}} from the result of step 2.
#'     }
#'     The framework poses restrictions on primitive (i.e. not compound using \code{\link{composeCPO}}) \code{\link{CPO}}s to simplify internal
#'     operation: A \code{\link{CPO}} may be one of three \link{OperatingType}s (see there). The \emph{Feature Operation} \code{\link{CPO}} does not
#'     transform target columns and hence only needs to be involved in steps 1 and 2. The \emph{Target Operation} \code{\link{CPO}} only transforms
#'     target columns, and therefore mostly concerns itself with steps 1 and 3. A \emph{Retrafoless} \code{\link{CPO}} may change both feature and
#'     target columns, but may not perform a retrafo \emph{or} inverter operation (and is therefore only concerned with step 1). Note that this
#'     is effectively a restriction on what kind of transformation a Retrafoless CPO may perform: it must not be a transformation of the data
#'     or target \emph{space}, it may only act or subtract points within this space.
#'
#'     The Operating Type of a \code{\link{CPO}} is ultimately dependent on the function that was used to create the \code{\link{CPOConstructor}}:
#'     \code{makeCPO} / \code{makeCPOExtendedTrafo}, \code{makeCPOTargetOp} / \code{makeCPOExtendedTargetOp}, or \code{makeCPORetrafoless}.}
#'   \item{\strong{Data Transformation}}{
#'     At the core of a CPO is the modification of data it performs. For Feature Operation CPOs, the transformation of each row,
#'     during training \emph{and} prediction, should
#'     happen in the same way, and it may only depend on the entirety of the \emph{training} data--i.e. the value of a data row in a prediction
#'     data set may not influence the transformation of a different prediction data row. Furthermore, if a data row occurs in both training and prediction
#'     data, its transformation result should ideally be the same.
#'
#'     This property is ensured by \code{makeCPO} by splitting the transformation
#'     into two functions: One function that collects all relevant information from the training data (called \code{cpo.train}), and one that transforms
#'     given data, using this collected information and (\emph{potentially new, unseen}) data to be transformed (called \code{cpo.retrafo}). The \code{cpo.retrafo}
#'     function should handle all data as if it were prediction data and unrelated to the data given to \code{cpo.train}.
#'
#'     Internally, when a \code{\link{CPO}} gets applied to a data set using \code{\link{applyCPO}}, the \code{cpo.train} function is called, and the
#'     resulting control object is used for a subsequent \code{cpo.retrafo} call which transforms the data. Before the result is given back from the
#'     \code{\link{applyCPO}} call, the control object is used to create a \code{\link{CPORetrafo}} object,
#'     which is attached to the result as attribute. Target Operating CPOs additionally create and add a \code{\link{CPOInverter}} object.
#'
#'     When a \code{\link{CPORetrafo}} is then applied to new prediction data, the control object previously returned by \code{cpo.train} is given,
#'     combined with this \emph{new} data, to another \code{cpo.retrafo} call that performs the new transformation.
#'
#'     \code{makeCPOExtendedTrafo} gives more flexibility by having calling only the \code{cpo.trafo} in the training step, which both creates a control
#'     object \emph{and} modifies the data. This can increase performance if the underlying operation creates a control object and the transformed data in one step,
#'     as for example \emph{PCA} does. Note that the requirement that the same row in training and prediction data should result in the same transformation
#'     result still stands. The \code{cpo.trafo} function returns the transformed data and creates a local variable with the control information, which the
#'     CPO framework will access.}
#'   \item{\strong{Inversion}}{
#'     If a \code{\link{CPO}} performs transformations of the \emph{target} column, the predictions made by a following machine learning process should
#'     ideally have this transformation undone, so that if the process makes a prediction that coincides with a target value \emph{after} the
#'     transformation, the whole pipeline should return a prediction that equals to the target value \emph{before} this transformation.
#'
#'     This is done by the \code{cpo.invert} function given to \code{makeCPOTargetOp}. It has access to information from both the preceding training and prediction
#'     steps. During the training step, \code{cpo.train} createas a \code{control} object that is not only given to \code{cpo.retrafo}, but also
#'     to \code{cpo.train.invert}. This latter function is called before the prediction step, whenever new data is fed to the machine learning process.
#'     It takes the new data and the old \code{control} object and transforms it to a new \code{control.invert} object to include information about the prediction
#'     data. This object is then given to \code{cpo.invert}.
#'
#'     It is possible to have Target Operation CPOs that do not require information from the retrafo step. This is specified by setting
#'     \code{constant.invert} to \code{TRUE}. It has the advantage that the same \code{\link{CPOInverter}}
#'     can be used for inversion of predictions made with any new data. Otherwise, a new \code{\link{CPOInverter}} object must be obtained for each
#'     new data set after the retrafo step (using the \code{\link{inverter}} function on the retrafo result). Having \code{constant.invert} set to \code{TRUE}
#'     results in \emph{hybrid} retrafo / inverter objects: The \code{\link{CPORetrafo}} object can then also be used for \code{inversions}.
#'     When defining a \code{constant.invert} Target Operating CPO, no \code{cpo.train.invert} function is given, and the same \code{control}
#'     object is given to both \code{cpo.retrafo} and \code{cpo.invert}.
#'
#'     \code{makeCPOExtendedTargetOp} gives more flexibility and allows more efficient implementation of Target Operating CPOs at cost of more complexity.
#'     With this method, a \code{cpo.trafo} function is given that is executed during the first training step; It must return the transformed target column,
#'     as well as a \code{control} and \code{control.invert} object. The \code{cpo.retrafo} function not only transforms the target, but must also
#'     create a new \code{control.invert} object (unless \code{constant.invert} is \code{TRUE}). The semantics of \code{cpo.invert} is identical with the
#'     basic \code{makeCPOTargetOp}.}
#'   \item{\strong{\code{cpo.train}-\code{cpo.retrafo} information transfer}}{
#'     One possibility to transfer information from \code{cpo.train} to \code{cpo.retrafo} is to have \code{cpo.train} return a
#'     control object (a \code{\link[base]{list}})
#'     that is then given to \code{cpo.retrafo}. The CPO is then called an \emph{object based} CPO.
#'
#'     Another possibility is to not give the \code{cpo.retrafo}
#'     argument (set it to \code{NULL} in the \code{makeCPO} call) and have \code{cpo.train} instead return a \emph{function} instead. This function is then
#'     used as the \code{cpo.retrafo} function, and should have access to all relevant information about the training data as a closure. This is called
#'     \emph{functional} CPO. To save memory, the actual data (including target) given to \code{cpo.train} is removed from the environment of its
#'     return value in this case
#'     (i.e. the environment of the \code{cpo.retrafo} function). This means the \code{cpo.retrafo} function may not reference a \dQuote{\code{data}} variable.
#'
#'     There are similar possibilities of functional information transfer for other types of CPOs: \code{cpo.trafo} in \code{makeCPOExtendedTargetOp} may
#'     create a \code{cpo.retrafo} function instead of a \code{control} object. \code{cpo.train} in \code{makeCPOTargetOp} has the option of creating
#'     a \code{cpo.retrafo} and \code{cpo.train.invert} (\code{cpo.invert} if \code{constant.invert} is \code{TRUE}) function (and returning \code{NULL})
#'     instead of returning a \code{control} object. Similarly, \code{cpo.train.invert} may return a \code{cpo.invert} function instead of a \code{control.invert}
#'     object. In \code{makeCPOExtendedTargetOp}, \code{cpo.trafo} may create a \code{cpo.retrafo} or a \code{cpo.invert} function, each optionally instead
#'     of a \code{control} or \code{control.invert} object (one \emph{or} both may be functional). \code{cpo.retrafo} similarly may create a \code{cpo.invert}
#'     function instead of giving a \code{control.invert} object. Functional information transfer may be more parsimonious and elegant than control
#'     object information transfer.}
#'   \item{\strong{Hyperparameters}}{
#'     The action performed by a CPO may be influenced using \emph{hyperparameters}, during its construction as well as afterwards (then using
#'     \code{\link[mlr]{setHyperPars}}). Hyperparameters must be specified as a \code{\link[ParamHelpers:makeParamSet]{ParamSet}} and given as argument \code{par.set}.
#'     Default values for each parameter may be specified in this \code{\link[ParamHelpers:makeParamSet]{ParamSet}} or optionally as another argument \code{par.vals}.
#'
#'     Hyperparameters given are made part of the \code{\link{CPOConstructor}} function and can thus be given during construction.
#'     Parameter default values function as the default values for the \code{\link{CPOConstructor}} function parameters (which are thus made optional function
#'     parameters of the \code{\link{CPOConstructor}} function). The CPO framework handles storage and changing of hyperparameter values.
#'     When the \code{cpo.train} and \code{cpo.retrafo} functions are called to transform data, the hyperparameter values are given to them as arguments, so
#'     \code{cpo.train} and \code{cpo.retrafo} functions must be able to accept these parameters, either directly, or with a \code{...} argument.
#'
#'     Note that with \emph{functional} \code{\link{CPO}}s, the \code{cpo.retrafo} function does not take hyperparameter arguments (and instead can usually
#'     refer to them by its environment).
#'
#'     Hyperparameters may be \emph{exported} (or not), thus making them available for \code{\link[mlr]{setHyperPars}}. Not exporting a parameter
#'     has advantage that it does not clutter the \code{\link[ParamHelpers:makeParamSet]{ParamSet}} of a big \code{\link{CPO}} or \code{\link{CPOLearner}} pipeline with
#'     many hyperparameters. Which hyperparameters are exported is chosen during the constructing call of a \code{\link{CPOConstructor}}, but the default
#'     exported hyperparameters can be chosen with the \code{export.params} parameter.}
#'   \item{\strong{Properties}}{
#'     Similarly to \code{\link[mlr:makeLearner]{Learner}}s, \code{\link{CPO}}s may specify what kind of data they are and are not able to handle. This is done by
#'     specifying \code{.properties.*} arguments. The names of possible properties are the same as possible \code{\link[mlr]{LearnerProperties}}, but since
#'     \code{\link{CPO}}s mostly concern themselves with data, only the properties indicating column and task types are relevant.
#'
#'     For each \code{\link{CPO}} one must specify
#'     \enumerate{
#'       \item which kind of data does the \code{\link{CPO}} handle,
#'       \item which kind of data must the \code{\link{CPO}} or \code{\link[mlr:makeLearner]{Learner}} be able to handle that comes \emph{after}
#'         the given \code{\link{CPO}}, and
#'       \item which kind of data handling capability does the given \code{\link{CPO}} \emph{add} to a following
#'         \code{\link{CPO}} or \code{\link[mlr:makeLearner]{Learner}} if coming before it in a pipeline.
#'     }
#'     The specification of (1) is done with \code{properties.data} and \code{properties.target}, (2) is specified using \code{properties.needed}, and
#'     (3) is specified using \code{properties.adding}. Internally, \code{properties.data} and \code{properties.target} are concatenated and treated as
#'     one vector, they are specified separately in \code{makeCPO} etc. for convenience reasons. See \code{\link{CPOProperties}} for details.
#'
#'     The CPO framework checks the \code{cpo.retrafo} etc. functions for adherence to these properties, so it e.g. throws an error if a \code{cpo.retrafo}
#'     function adds missing values to some data but didn't declare \dQuote{missings} in \code{properties.needed}. It may be desirable to have this
#'     internal checking happen to a laxer standard than the property checking when composing CPOs (e.g. when a CPO adds missings only with certain
#'     hyperparameters, one may still want to compose this CPO to another one that can't handle missings). Therefore it is possible to postfix
#'     listed properties with \dQuote{.sometimes}. The internal CPO checking will ignore these when listed in \code{properties.adding}
#'     (it uses the \sQuote{minimal} set of adding properties, \code{adding.min}), and it will not declare them externally when listed in
#'     \code{properties.needed} (but keeps them internally in the \sQuote{maximal} set of needed properties, \code{needed.max}). The \code{adding.min}
#'     and \code{needed.max} can be retrieved using \code{\link{getCPOProperties}} with \code{get.internal = TRUE}.}
#'   \item{\strong{Data Format}}{
#'     Different CPOs may want to change different aspects of the data, e.g. they may only care about numeric columns, they may or may not care about
#'     the target column values, sometimes they might need the actual task used as input. The CPO framework offers to present the data in a specified
#'     formats to the \code{cpo.train}, \code{cpo.retrafo} and other functions, to reduce the need for boilerplate data subsetting on the user's part. The format is
#'     requested using the \code{dataformat} and \code{dataformat.factor.with.ordered} parameter. A \code{cpo.retrafo} function is expected to return
#'     data in the same format as it requested, so if it requested a \code{\link[mlr]{Task}}, it must return one, while if it only
#'     requested the feature \code{data.frame}, a \code{data.frame} must be returned.}
#'   \item{\strong{Task Conversion}}{
#'     Target Operation CPOs can be used for conversion between \code{\link[mlr]{Task}}s. For this, the \code{type.out} value must be given. Task conversion
#'     works with all values of \code{dataformat} and is handled by the CPO framework. The \code{cpo.trafo} function must take care to return the target data
#'     in a proper format (see above). Note that for conversion, not only does the \code{\link[mlr]{Task}} type need to be changed during \code{cpo.trafo}, but
#'     also the \emph{prediction} format (see above) needs to change.}
#'   \item{\strong{Fix Factors}}{
#'     Some preprocessing for factorial columns needs the factor levels to be the same during training and prediction. This is usually not guarranteed
#'     by mlr, so the framework offers to do this if the \code{fix.factors} flag is set.}
#'   \item{\strong{ID}}{
#'     To prevent parameter name clashes when \code{\link{CPO}}s are concatenated, the parameters are prefixed with the \code{\link{CPO}}s
#'     \emph{\link[=getCPOId]{id}}.
#'     The ID can be set during \code{\link{CPO}} construction, but will default to the \code{\link{CPO}}s \emph{name} if not given. The name is set
#'     using the \code{cpo.name} parameter.}
#'   \item{\strong{Packages}}{
#'     Whenever a \code{\link{CPO}} needs certain packages to be installed to work, it can specify these in the \code{packages} parameter. The framework
#'     will check for the availability of the packages and throw an error if not found \emph{during construction}. This means that loading a \code{\link{CPO}}
#'     from a savefile will omit this check, but in most cases it is a sufficient measure to make the user aware of missing packages in time.}
#'   \item{\strong{Target Column Format}}{
#'     Different \code{\link[mlr]{Task}} types have the target in a different formats. They are listed here for reference. Target data is in this format
#'     when given to the \code{target} argument of some functions, and must be returned in this format by \code{cpo.trafo}
#'     in Target Operation CPOs. Target values are always in the format of a \code{\link[base]{data.frame}}, even when only one column.
#'     \tabular{ll}{
#'       \bold{Task type}    \tab \bold{target format}                          \cr
#'       \dQuote{classif}    \tab one column of \code{\link[base]{factor}}      \cr
#'       \dQuote{cluster}    \tab \code{data.frame} with zero columns.          \cr
#'       \dQuote{multilabel} \tab several columns of \code{\link[base]{logical}}\cr
#'       \dQuote{regr}       \tab one column of \code{\link[base]{numeric}}     \cr
#'       \dQuote{surv}       \tab two columns of \code{\link[base]{numeric}}
#'     }
#'
#'     When inverting, the format of the \code{target} argument, as well as the return value of, the \code{cpo.invert} function depends on the
#'     \code{\link[mlr]{Task}} type as well as the \code{predict.type}. The requested return value \code{predict.type} is given to the \code{cpo.invert} function
#'     as a parameter, the \code{predict.type} of the \code{target} parameter depends on this and the \code{predict.type.map} (see \link{PredictType}).
#'     The format of the prediction, depending on the task type and \code{predict.type}, is:
#'     \tabular{lll}{
#'       \bold{Task type}    \tab \bold{\code{predict.type}} \tab \bold{target format}                          \cr
#'       \dQuote{classif}    \tab \dQuote{response}          \tab \code{\link[base]{factor}}                    \cr
#'       \dQuote{classif}    \tab \dQuote{prob}              \tab \code{\link[base]{matrix}} with nclass cols   \cr
#'       \dQuote{cluster}    \tab \dQuote{response}          \tab \code{\link[base]{integer}} cluster index     \cr
#'       \dQuote{cluster}    \tab \dQuote{prob}              \tab \code{\link[base]{matrix}} with nclustr cols  \cr
#'       \dQuote{multilabel} \tab \dQuote{response}          \tab \code{\link[base]{logical}} \code{\link[base]{matrix}} \cr
#'       \dQuote{multilabel} \tab \dQuote{prob}              \tab \code{\link[base]{matrix}} with nclass cols   \cr
#'       \dQuote{regr}       \tab \dQuote{response}          \tab \code{\link[base]{numeric}}                   \cr
#'       \dQuote{regr}       \tab \dQuote{se}                \tab 2-col \code{\link[base]{matrix}}              \cr
#'       \dQuote{surv}       \tab \dQuote{response}          \tab \code{\link[base]{numeric}}                   \cr
#'       \dQuote{surv}       \tab \dQuote{prob}              \tab [NOT YET SUPPORTED]
#'     }
#'     All \code{\link[base]{matrix}} formats are \code{\link[base]{numeric}}, unless otherwise stated.}
#' }
#'
#' @section Headless function definitions:
#' In the place of all \code{cpo.*} arguments, it is possible to make a \emph{headless} function definition, consisting only of the function body.
#' This function body must always begin with a \sQuote{\code{\{}}. For example, instead of
#' \code{cpo.retrafo = function(data, control) data[-1]}, it is possible to use
#' \code{cpo.retrafo = function(data, control) \{ data[-1] \}}. The necessary function head is then added automatically by the CPO framework.
#' This will always contain the necessary parameters (e.g. \dQuote{\code{data}}, \dQuote{\code{target}}, hyperparameters as defined in \code{par.set})
#' in the names as required. This can declutter the definition of a \code{\link{CPOConstructor}} and is recommended if the CPO consists of
#' few lines.
#'
#' Note that if this is used when writing an R package, inside a function, this may lead to the automatic R correctness checker to print warnings.
#'
#'
#' @param cpo.name [\code{character(1)}]\cr
#'   The name of the resulting \code{\link{CPOConstructor}} / \code{\link{CPO}}. This is used for identification in output,
#'   and as the default \code{\link[=getCPOId]{id}}.
#' @param par.set [\code{\link[ParamHelpers:makeParamSet]{ParamSet}}]\cr
#'   Optional parameter set, for configuration of CPOs during construction or by hyperparameters.
#'   Default is an empty \code{\link[ParamHelpers:makeParamSet]{ParamSet}}.
#'   It is recommended to use \code{\link{pSS}} to construct this, as it greatly reduces the verbosity of
#'   creating a \code{\link[ParamHelpers:makeParamSet]{ParamSet}} and makes it more readable.
#' @param par.vals [\code{list} | \code{NULL}]\cr
#'   Named list of default parameter values for the CPO. These are used \emph{instead of} the
#'   parameter default values in \code{par.set}, if not \code{NULL}. It is preferred to use
#'   \code{\link[ParamHelpers:makeParamSet]{ParamSet}} default values,
#'   and not \code{par.vals}. Default is \code{NULL}.
#' @param dataformat [\code{character(1)}]\cr
#'   Indicate what format the data should be as seen by the \code{cpo.train} and \code{cpo.retrafo} function.
#'   The following table shows what values of \code{dataformat} lead to what is given to \code{cpo.train} and \code{cpo.retrafo}
#'   as \code{data} and \code{target} parameter value. (Note that for Feature Operating CPOs, \code{cpo.retrafo} has no \code{target} argument.) Possibilities are:
#'   \tabular{lll}{
#'     \bold{dataformat}    \tab \bold{data}                            \tab \bold{target}                \cr
#'     \dQuote{df.all}       \tab \code{data.frame} with target cols     \tab target colnames              \cr
#'     \dQuote{df.features}  \tab \code{data.frame} without target       \tab \code{data.frame} of target  \cr
#'     \dQuote{task}         \tab full \code{\link[mlr]{Task}}           \tab target colnames              \cr
#'     \dQuote{split}        \tab list of \code{data.frames} by type     \tab \code{data.frame} of target  \cr
#'     [type]                \tab \code{data.frame} of [type] feats only \tab \code{data.frame} of target
#'   }
#'   [type] can be any one of \dQuote{factor}, \dQuote{numeric}, \dQuote{ordered}; if these are given, only a subset of the total
#'   data present is seen by the \code{\link{CPO}}.
#'
#'   Note that \code{makeCPORetrafoless} accepts only \dQuote{task} and \dQuote{df.all}.
#'
#'   For \code{dataformat == "split"}, \code{cpo.train} and \code{cpo.retrafo} get a list with entries \dQuote{factor}, \dQuote{numeric},
#'   \dQuote{other}, and, if \code{dataformat.factor.with.ordered} is \code{FALSE}, \dQuote{ordered}.
#'
#'   If the CPO is a Feature Operation CPO, then the return value of the \code{cpo.retrafo} function must be in the same format as the one requested.
#'   E.g. if \code{dataformat} is \dQuote{split}, the return value must be a named list with entries \code{$numeric},
#'   \code{$factor}, and \code{$other}. The types of the returned data may be arbitrary: In the given example,
#'   the \code{$factor} slot of the returned list may contain numeric data. (Note however that if data is returned
#'   that has a type not already present in the data, \code{properties.needed} must specify this.)
#'
#'   For Feature Operating CPOs, if \code{dataformat} is either \dQuote{df.all} or \dQuote{task}, the
#'   target column(s) in the returned value of the retrafo function must be identical with the target column(s) given as input.
#'
#'   If \code{dataformat} is \dQuote{split}, the \code{$numeric} slot of the value returned by the \code{cpo.retrafo} function
#'   may also be a \code{\link[base]{matrix}}. If \code{dataformat} is \dQuote{numeric}, the returned object may also be a
#'   matrix.
#'
#'   Default is \dQuote{df.features} for all functions except \code{makeCPORetrafoless}, for which it is \dQuote{df.all}.
#' @param dataformat.factor.with.ordered [\code{logical(1)}]\cr
#'   Whether to treat \code{ordered} typed features as \code{factor} typed features. This affects how \code{dataformat} is handled, for which it only
#'   has an effect if \code{dataformat} is \dQuote{split} or \dQuote{factor}. If \code{dataformat} is \dQuote{ordered}, this must be \code{FALSE}.
#'   It also affects how strictly data fed to a \code{\link{CPORetrafo}} object
#'   is checked for adherence to the data format of data given to the generating \code{\link{CPO}}. Default is \code{TRUE}.
#' @param export.params [\code{logical(1)} | \code{character}]\cr
#'   Indicates which CPO parameters are exported by default. Exported parameters can be changed after construction using \code{\link[mlr]{setHyperPars}},
#'   but exporting too many parameters may lead to messy parameter sets if many CPOs are combined using \code{\link{composeCPO}} or \code{\link{\%>>\%}}.
#'   The exported parameters can be set during construction, but \code{export.params} determines the \emph{default} exported parameters.
#'   If this is a \code{logical(1)}, \code{TRUE} exports all parameters, \code{FALSE} to exports no parameters. It may also be a \code{character},
#'   indicating the names of parameters to be exported. Default is \code{TRUE}.
#' @param fix.factors [\code{logical(1)}]\cr
#'   Whether to constrain factor levels of new data to the levels of training data, for each factorial or ordered column. If new data contains
#'   factors that were not present in training data, the values are set to \code{NA}. Default is \code{FALSE}.
#' @param properties.data [\code{character}]\cr
#'   The kind if data that the CPO will be able to handle. This can be one or more of: \dQuote{numerics},
#'   \dQuote{factors}, \dQuote{ordered}, \dQuote{missings}.
#'   There should be a bias towards including properties. If a property is absent, the preproc
#'   operator will reject the data. If an operation e.g. only works on numeric columns that have no
#'   missings (like PCA), it is recommended to give all properties, ignore the columns that
#'   are not numeric (using \code{dataformat = "numeric"}), and giving an error when
#'   there are missings in the numeric columns (since missings in factorial features are not a problem).
#'   Defaults to the maximal set.
#' @param properties.target [\code{character}]\cr
#'   For Feature Operation CPOs, this can be one or more of \dQuote{cluster}, \dQuote{classif}, \dQuote{multilabel}, \dQuote{regr}, \dQuote{surv},
#'   \dQuote{oneclass}, \dQuote{twoclass}, \dQuote{multiclass}. Just as \code{properties.data}, it
#'   indicates what kind of data a CPO can work with. To handle data given as \code{data.frame}, the \dQuote{cluster} property is needed. Default is the maximal set.
#'
#'   For Target Operation CPOs, this \emph{must} contain exactly one of \dQuote{cluster}, \dQuote{classif}, \dQuote{multilabel}, \dQuote{regr}, \dQuote{surv}.
#'   This indicates the type of \code{\link[mlr]{Task}} the
#'   \code{\link{CPO}} can work on. If the input is a \code{data.frame}, it is treated as a \dQuote{cluster} type \code{\link[mlr]{Task}}.
#'   If the \code{properties.target} contains \dQuote{classif}, the value must then also contain one or more of \dQuote{oneclass},
#'   \dQuote{twoclass}, or \dQuote{multiclass}. Default is \dQuote{cluster}.
#' @param properties.adding [\code{character}]\cr
#'   Can be one or many of the same values as \code{properties.data} for Feature Operation CPOs, and one or many of the same values as \code{properties.target}
#'   for Target Operation CPOs. These properties \emph{get added} to a \code{\link[mlr:makeLearner]{Learner}} (or \code{\link{CPO}}) coming after / behind this CPO.
#'   When a CPO imputes missing values, for example, this should be \dQuote{missings}. This must be a subset of \dQuote{properties.data} or
#'   \dQuote{properties.target}.
#'
#'   Note that this may \emph{not} contain a \code{\link[mlr]{Task}}-type property, even if the \code{\link{CPO}} is a Target Operation CPO that performs
#'   conversion.
#'
#'   Property names may be postfixed with \dQuote{.sometimes}, to indicate that adherence should not be checked internally. This distinction is made by
#'   not putting them in the \code{$adding.min} slot of the \code{\link{getCPOProperties}} return value when \code{get.internal = TRUE}.
#'
#'   Default is \code{character(0)}.
#' @param properties.needed [\code{character}]\cr
#'   Can be one or many of the same values as \code{properties.data} for Feature Operation CPOs,
#'   and one or many of the same values as \code{properties.target}. These properties are \emph{required}
#'   from a \code{\link[mlr:makeLearner]{Learner}} (or \code{\link{CPO}}) coming after / behind this CPO. E.g., when a CPO converts factors to
#'   numerics, this should be \dQuote{numerics} (and \code{properties.adding} should be \dQuote{factors}).
#'
#'   Note that this may \emph{not} contain a \code{\link[mlr]{Task}}-type property, even if the \code{\link{CPO}} is a Target Operation CPO that performs
#'   conversion.
#'
#'   Property names may be postfixed with \dQuote{.sometimes}, to indicate that adherence should not be checked internally. This distinction is made by
#'   not putting them in the \code{$needed} slot of properties. They can still be found in the \code{$needed.max} slot of the
#'   \code{\link{getCPOProperties}} return value when \code{get.internal = TRUE}.
#'
#'   Default is \code{character(0)}.
#' @param packages [\code{character}]\cr
#'   Package(s) that should be loaded when the CPO is constructed. This gives the user an error if
#'   a package required for the CPO is not available on his system, or can not be loaded. Default is \code{character(0)}.
#' @param constant.invert [\code{logical(1)}]\cr
#'   Whether the \code{cpo.invert} step should not have information from the previous \code{cpo.retrafo} or \code{cpo.train.invert} step in
#'   Target Operation CPOs (\code{makeCPOTargetOp} or \code{makeCPOExtendedTargetOp}).
#'
#'   For \code{makeCPOTargetOp}, if this is \code{TRUE}, the
#'   \code{cpo.train.invert} argument must be \code{NULL}. If \code{cpo.retrafo} and \code{cpo.invert} are given, the same \code{control}
#'   object is given to both of them. Otherwise, if \code{cpo.retrafo} and \code{cpo.invert} are \code{NULL}, the \code{cpo.train} function
#'   must return \code{NULL} and define a \code{cpo.retrafo} and \code{cpo.invert} function in its namespace (see \code{cpo.train} documentation
#'   for more details). If \code{constant.invert} is \code{FALSE}, \code{cpo.train} may either return a \code{control} object that will then be
#'   given to \code{cpo.train.invert}, or define a \code{cpo.retrafo} and \code{cpo.train.invert} function in its namespace.
#'
#'   For \code{makeCPOExtendedTargetOp}, if this is \code{TRUE}, \code{cpo.retrafo} does not need to generate a \code{control.invert} object.
#'   The \code{control.invert} object created in \code{cpo.trafo} will then always be given to \code{cpo.invert} for all data sets.
#'
#'   Default is \code{FALSE}.
#' @param predict.type.map [\code{character} | \code{list}]\cr
#'   This becomes the \code{\link{CPO}}'s \code{predict.type}, explained in detail in \link{PredictType}.
#'
#'   In short, the \code{predict.type.map} is a character vector, or a \code{list} of \code{character(1)},
#'   with \emph{names} according to the predict types \code{predict} can request
#'   in its \code{predict.type} argument when the created \code{\link{CPO}} was used as part of a \code{\link{CPOLearner}} to create the
#'   model under consideration. The \emph{values} of \code{predict.type.map} are the \code{predict.type} that will be requested from the
#'   underlying \code{\link[mlr:makeLearner]{Learner}} for prediction.
#'
#'   \code{predict.type.map} thus determines the format that the \code{target} parameter of \code{cpo.invert} can take: It is
#'   the format according to \code{predict.type.map[predict.type]}, where \code{predict.type} is the respective \code{cpo.invert} parameter.
#' @param task.type.out [\code{character(1)} | \code{NULL}]\cr
#'   If \code{\link[mlr]{Task}} conversion is to take place, this is the output task that the data should be converted to. Note that the
#'   CPO framework takes care of the conversion if \code{dataformat} is not \dQuote{task}, but the target column needs to have the
#'   proper format for that.
#'
#'   If this is \code{NULL}, \code{\link[mlr]{Task}}s will not be converted. Default is \code{NULL}.
#' @param cpo.train [\code{function} | \code{NULL}]\cr
#'   This is a function which must have the parameters \code{data} and \code{target},
#'   as well as the parameters specified in \code{par.set}. (Alternatively,
#'   the function may have only some of these arguments and a \code{\link[methods:dotsMethods]{dotdotdot}} argument).
#'   It is called whenever a \code{\link{CPO}} is applied to
#'   a data set to prepare for transformation of the training \emph{and} prediction data.
#'   Note that this function is only used in Feature Operating CPOs created with \code{makeCPO}, and in Target Operating CPOs
#'   created with \code{makeCPOExtendedTargetOp}.
#'
#'   The behaviour of this function differs slightly in Feature Operation and Target Operation CPOs.
#'
#'   For \bold{Feature Operation CPOs}, if \code{cpo.retrafo} is \code{NULL}, this is a constructor function which must return a \dQuote{retrafo} function which
#'   will then modify (possibly new unseen) data. This retrafo function must have exactly one argument--the (new) data--and return the modified data. The format
#'   of the argument, and of the return value of the retrafo function, depends on the value of the \code{dataformat} parameter, see documentation there.
#'
#'   If \code{cpo.retrafo} is not \code{NULL}, this is a function which must return a control object.
#'   This control object returned by \code{cpo.train} will then be given as the \code{control} argument of the \code{cpo.retrafo} function, along with
#'   (possibly new unseen) data to manipulate.
#'
#'   For \bold{Target Operation CPOs}, if \code{cpo.retrafo} is \code{NULL}, \code{cpo.train.invert}
#'   (or \code{cpo.invert} if \code{constant.invert} is \code{TRUE}) must likewise be \code{NULL}.
#'   In that case \code{cpo.train}'s return value is ignored and it must define, within its namespace, two
#'   functions \code{cpo.retrafo} and \code{cpo.train.invert} (or \code{cpo.invert} if \code{constant.invert}
#'   is \code{TRUE}) which will take the place of the respective functions. \code{cpo.retrafo} must take the
#'   parameters \code{data} and \code{target}, and return the modified target \code{target} (or \code{data},
#'   depending on \code{dataformat}) data. \code{cpo.train.invert} must take a \code{data} and \code{control}
#'   argument and return either a modified control object, or a \code{cpo.invert} function.
#'   \code{cpo.invert} must have a \code{target} and \code{predict.type} argument and return the modified
#'   target data.
#'
#'   If \code{cpo.retrafo} is not \code{NULL}, \code{cpo.train.invert}
#'   (or \code{cpo.invert} if \code{constant.invert} is \code{TRUE}) must likewise be non-\code{NULL}.
#'   In that case, \code{cpo.train} must return a control object. This control object will then be
#'   given as the \code{control} argument of both \code{cpo.retrafo} and \code{cpo.train.invert}
#'   (or the \code{control.invert} argument of \code{cpo.invert} if \code{constant.invert} is \code{TRUE}).
#'
#'   This parameter may be \code{NULL}, resulting in a so-called \emph{stateless} CPO. For Target Operation CPOs created with \code{makeCPOTargetOp},
#'   \code{constant.invert} must be \code{TRUE} in this case.
#'   A stateless CPO does the same transformation for initial CPO
#'   application and subsequent prediction data transformation (e.g. taking the logarithm of numerical columns). Note that \code{cpo.retrafo}
#'   and \code{cpo.invert} should not
#'   have a \code{control} argument in a stateless CPO.
#' @param cpo.trafo [\code{function}]\cr
#'   This is a function which must have the parameters \code{data} and \code{target},
#'   as well as the parameters specified in \code{par.set}. (Alternatively,
#'   the function may have only some of these arguments and a \code{\link[methods:dotsMethods]{dotdotdot}} argument).
#'   It is called whenever a \code{\link{CPO}} is applied to
#'   a data set to transform the training data, and (except for Retrafoless CPOs) to collect a control object used by other transformation functions.
#'   Note that this function is not used in \code{makeCPO}.
#'
#'   This functions primary task is to transform the given data when the \code{\link{CPO}} gets applied to training data. For Target Operating CPOs
#'   (created with \code{makeCPOExtendedTargetOp}(!)),
#'   it must return the complete transformed target column(s), unless \code{dataformat} is \dQuote{df.all} (in which case the complete, modified,
#'   \code{data.frame} must be returned) or \dQuote{task} (in which case the complete, modified, \code{Task} must be returned). It must furthermore
#'   create the control objects for \code{cpo.retrafo} and \code{cpo.invert}, or create these functins themselves, and save them in its function
#'   environment (see below). For Retrafoless CPOs
#'   (created with \code{makeCPORetrafoless}) and Feature Operation CPOs (created with \code{makeCPOExtendedTrafo}(!)), it must return the
#'   data in the same format as received it in its \code{data} argument (depending on \code{dataformat}). If \code{dataformat} is a
#'   \code{df.all} or \code{task}, this means the target column(s) contained in the \code{data.frame} or \code{Task} returned must not be modified.
#'
#'   For CPOs that are not Retrafoless, a unit of information to be carried over to the retrafo step needs to be created inside the \code{cpo.trafo}
#'   function. This unit of information is a variable that must be defined inside the environment of the \code{cpo.trafo} function and will be
#'   retrieved by the CPO framework.
#'
#'   If \code{cpo.retrafo} is not \code{NULL}
#'   the unit is an object named \dQuote{\code{control}} that will be passed on as the \code{control} argument to the
#'   \code{cpo.retrafo} function. If \code{cpo.retrafo} is \code{NULL}, the unit is a \emph{function}, called \dQuote{\code{cpo.retrafo}},
#'   that will be used
#'   \emph{instead} of the \code{cpo.retrafo}
#'   function passed over to \code{makeCPOExtendedTargetOp} / \code{makeCPOExtendedTrafo}. It must behave
#'   the same as the function it replaces, but has only the \code{data} (and \code{target}, for Target Operation CPOs) argument.
#'
#'   For Target Operation CPOs created with \code{makeCPOExtendedTargetOp}, another unit of information to be used by \code{cpo.invert}
#'   must be used. The options here are similar to \code{cpo.retrafo}: Either a control object, named \code{control.invert}, is created,
#'   or the \code{cpo.invert} function itself is given (and \code{cpo.invert} in the \code{makeCPOExtendedTargetOp} call is set to \code{NULL}),
#'   with the \code{target} and \code{predict.type} arguments.
#' @param cpo.retrafo [\code{function} | \code{NULL}]\cr
#'   This is a function which must have the parameters \code{data}, \code{target} (Target Operation CPOs only) and \code{control},
#'   as well as the parameters specified in \code{par.set}. (Alternatively,
#'   the function may have only some of these arguments and a \code{\link[methods:dotsMethods]{dotdotdot}} argument).
#'   In Feature Operation CPOs created with \code{makeCPO}, if \code{cpo.train} is \code{NULL}, the \code{control} argument must be absent.
#'
#'   This function gets called during the \dQuote{retransformation} step where prediction data is given to the \code{\link{CPORetrafo}} object before it
#'   is given to a fitted machine learning model for prediction. In \code{makeCPO} Featore Operation CPOs and \code{makeCPOTargetOp} Target Operation CPOs,
#'   this is \emph{also} called during the
#'   first trafo step, where the \code{\link{CPO}} object is applied to training data.
#'
#'   In Feature Operation CPOs, this function receives the data to be
#'   transformed and must return the transformed data in the same format as it received them.
#'   The format of \code{data} is the same as the format in \code{cpo.train} and \code{cpo.trafo}, with the exception that if \code{dataformat} is
#'   \dQuote{task} or \dQuote{df.all}, the behaviour here is as if \dQuote{df.split} had been given.
#'
#'   In Target Operation CPOs created with \code{makeCPOTargetOp}, this function receives the data and target to be transformed
#'   and must return the transformed target. The input format of these parameters depends on \code{dataformat}.
#'   If \code{dataformat} is \dQuote{task} or \dQuote{df.all}, the returned value must be the modified \code{\link[mlr]{Task}} / \code{data.frame}
#'   with the feature columns not modified. Otherwise, the target values to be modified are in the \code{target} parameter, and the return
#'   value must be a \code{data.frame} of the modified target values only.
#'
#'   In Target Operation CPOs created with \code{makeCPOExtendedTargetOp}, this function is called during the retrafo step, and it must
#'   create a \code{control.invert} object in its environment to be used in the inversion step, as well as return the modified target
#'   data.The format of the data given to \code{cpo.retrafo} in Target Operation CPOs created with \code{makeCPOExtendedTargetOp} is the same
#'   as in other functions, with the exception that, if \code{dataformat} is \dQuote{df.all} or \dQuote{task}, the full \code{data.frame}
#'   or \code{\link[mlr]{Task}} will be given as the \code{target} parameter, while the \code{data} parameter will behave as if
#'   \code{dataformat} \dQuote{df.split}. Depending on what object the \code{\link{CPORetrafo}} object was applied to,
#'   the \code{target} argument \emph{may be \code{NULL}}; in that case \code{NULL} must also be returned by the function.
#'
#'   If \code{cpo.invert} is \code{NULL}, \code{cpo.retrafo} should create a \code{cpo.invert} function in its environment instead of
#'   creating the control object; this function should then take the \code{target} and \code{predict.type} arguments. If \code{constant.invert}
#'   is \code{TRUE}, this function does not need to define the \code{control.invert} or \code{cpo.invert} variables, they are instead
#'   taken from \code{cpo.trafo}.
#' @param cpo.train.invert
#'   This is a function which must have the parameters \code{data}, and \code{control},
#'   as well as the parameters specified in \code{par.set}. (Alternatively,
#'   the function may have only some of these arguments and a \code{\link[methods:dotsMethods]{dotdotdot}} argument).
#'
#'   This function receives the feature columns given for prediction, and must return a
#'   control object that will be passed on to the \code{cpo.invert} function, \emph{or} it must return a \emph{function} that will be treated
#'   as the \code{cpo.invert} function if the \code{cpo.invert} argument is \code{NULL}. In the latter case, the returned function takes
#'   exactly two arguments (the prediction column to be inverted, and \code{predict.type}), and otherwise behaves identically to \code{cpo.invert}.
#'
#'   If \code{constant.invert} is \code{TRUE}, this must be \code{NULL}.
#'
#' @param cpo.invert [\code{function} | \code{NULL}]\cr
#'   This is a function which must have the parameters \code{target} (a \code{data.frame} containing the columns of a prediction made), \code{control.invert},
#'   and \code{predict.type}, as well as the parameters specified in \code{par.set}. (Alternatively,
#'   the function may have only some of these arguments and a \code{\link[methods:dotsMethods]{dotdotdot}} argument).
#'
#'   The \code{predict.type} \emph{requested} by the \code{\link[stats]{predict}} or \code{\link{invert}} call is given as a \code{character(1)} in
#'   the \code{predict.type} argument. Note that this is not necessarily the \code{predict.type} of the prediction made and given as \code{target} argument,
#'   depending on the value of \code{predict.type.map} (see there).
#'
#'   This function performs the inversion for a Target Operation CPO. It takes a control object, which summarizes information from the training and
#'   retrafo step, and the prediction as returned by a machine learning model, and undoes the operation done to the target column in the \code{cpo.trafo}
#'   function.
#'
#'   For example, if the trafo step consisted of taking the logarithm of a regression target, the \code{cpo.invert} function could return the exponentiated
#'   prediction values by taking the \code{exp} of the only column in the \code{target} \code{data.frame} and returning the result of that. This kind of
#'   operation does not need the \code{cpo.retrafo} step and should have \code{skip.retrafo} set to \code{TRUE}.
#'
#'   As a more elaborate example, a CPO could train a model on the training data and set the target values to the \emph{residues} of that trained model.
#'   The \code{cpo.retrafo} function would then make predictions with that model on the new prediction data and save the result to the \code{control} object.
#'   The \code{cpo.invert} function would then add these predictions to the predictions given to it in the \code{target} argument to \dQuote{invert} the
#'   antecedent subtraction of model predictions from target values when taking the residues.
#' @return [\code{\link{CPOConstructor}}]. A Constructor for \code{\link{CPO}}s.
#' @family CPOConstructor related
#' @family CPO lifecycle related
#' @family advanced topics
#'
#' @examples
#' # an example constant feature remover CPO
#' constFeatRem = makeCPO("constFeatRem",
#'  dataformat = "df.features",
#'  cpo.train = function(data, target) {
#'    names(Filter(function(x) {  # names of columns to keep
#'        length(unique(x)) > 1
#'      }, data))
#'    }, cpo.retrafo = function(data, control) {
#'    data[control]
#'  })
#' # alternatively:
#' constFeatRem = makeCPO("constFeatRem",
#'   dataformat = "df.features",
#'   cpo.train = function(data, target) {
#'     cols.keep = names(Filter(function(x) {
#'         length(unique(x)) > 1
#'       }, data))
#'     # the following function will do both the trafo and retrafo
#'     result = function(data) {
#'       data[cols.keep]
#'     }
#'     result
#'   }, cpo.retrafo = NULL)
#' @name makeCPO
NULL


###################
# The following is a rudiment, possibly some of this needs to be used.
# TODO: Delete if documentation is done and it turns out this is not needed.
###################
# @title Create a custom CPO constructor
#
# @description
# \code{makeCPOExtended} creates a Feature Operation CPO constructor, i.e. a constructor for a CPO that will
# operate on feature columns. \code{makeCPOTargetOp} creates a Target Operation CPO constructor, which
# creates CPOs that operate on the target column.
#
# \code{makeCPOExtended} is for advanced users and internal use; for a much simpler user-interface, use
# \code{\link{makeCPO}}.
#
# @inheritparams makeCPO
# @param ...
#   Parameters of the CPO, in the format of \code{\link[ParamHelpers]{pSS}}. These parameters are used in addition
#   to the \code{par.set} parameters.
# @param trafo.type [\code{character(1)}]\cr
#   Indicates what API is used for \code{cpo.trafo} and \code{cpo.retrafo}, and how state information is transferred
#   between them. Possibilities are:
#   \itemize{
#     \item{trafo.returns.data} \code{cpo.trafo} must be specified and is called with the training data and the CPO parameters.
#       It must return the modified data, and within its namespace must either specify a \dQuote{control} variable ("Object-Based CPO"),
#       if \code{cpo.retrafo} is given, or a \dQuote{cpo.retrafo} variable, if (the makeCPOExtended parameter) \code{cpo.retrafo}
#       is \code{NULL} ("Functional CPO"). For Object-Based CPO, \code{cpo.retrafo} is called with the \code{control} object
#       created in \code{cpo.trafo}, additionally with the new data, and the CPO parameters. For Functional CPO, \code{cpo.retrafo} is
#       constructed inside the \code{cpo.trafo} call and is used for transformation of new data. It must take a single argument and
#       return the transformed data.
#     \item{trafo.returns.control} \code{cpo.trafo} must be specified and is called with the training data and the CPO parameters. It must return
#       a \code{cpo.retrafo} function that takes the data to be transformed as a single argument, and returns the transformed data.
#       If \code{trafo.type} is \dQuote{trafo.returns.control}, \code{pco.retrafo} must be \code{NULL}.
#     \item{stateless} Specification of \code{cpo.trafo} is optional and may be \code{NULL}. If it is not given, \code{cpo.retrafo} is used on both
#       training and new data; otherwise, \code{cpo.trafo} is applied to training data, \code{cpo.retrafo} is used on predict data. There
#       is no transfer of information from trafo to retrafo. If \code{cpo.trafo} is not given, \code{dataformat} must not be \dQuote{task} or \dQuote{df.all}.
#   }
# @param .type [\code{character(1)}]\cr
#   For Target Operation CPOs, the type of task that it operates on. Must be one of \dQuote{cluster}, \dQuote{classif}, \dQuote{multilabel}, \dQuote{regr},
#   or \dQuote{surv}. If input data is a data.frame, it will be treated as a cluster task. Default is \dQuote{cluster}.
# @param .type.out [\code{character(1)}]\cr
#   For Target Operation CPOs, the type of task that will be generated by this CPO. If this is the same as \code{.type}, no conversion takes place.
#   Possible values are the same as for \code{.type}. Default is \code{.type}.
# @param predict.type [\code{character} | \code{list}]\cr
#   Must be a named \code{character}, or named \code{list} of \code{character(1)}, indicating
#   what \code{predict.type} (see \link{Prediction}) a prediction must have if the output prediction
#   is to be of some type. E.g. if a CPO converts a \dQuote{regr} \code{Task} into a
#   \dQuote{classif} \code{Task}, and if for \dQuote{se} prediction it needs a classification
#   learner to give \dQuote{prob} type predictions, while for \dQuote{response} prediction it
#   also needs \dQuote{response} predictions, this would be \code{c(response = "response",
#   se = "prob")}. The names are the prediction types that are requested from this CPO, the
#   values are types that this CPO will request from an underlying learner. If a name is not
#   present, the \code{predict.type} is assumed not supported. Default is \code{c(response = "response")}.
# @param data.dependent [\code{logical(1)}]\cr
#   Whether to make a data-dependent inverter CPO. If this is \code{FALSE}, the \code{cpo.trafo} function does not have
#   a \code{data} parameter.
# @param cpo.trafo [\code{language} | \code{function} | \code{NULL}]\cr
#   This can either be a function, or just the function body wrapped in curly braces.
#   If this is a function, it must have the parameters \dQuote{data} and \dQuote{target},
#   as well as the parameters specified in \dQuote{...} or \dQuote{par.set}. (Alternatively,
#   the function may have a dotdotdot argument). Depending on the values of \code{trafo.type} and
#   \code{dataformat} -- see there --, it must return a \dQuote{data.frame}, a \dQuote{task},
#   a dQuote{matrix}, \dQuote{list} of \dQuote{data.frame} and \dQuote{matrix} objects, or a retrafo function.
#
#   If \dQuote{cpo.retrafo} is given and \code{trafo.type} is \dQuote{trafo.returns.data}, it must create a \dQuote{control}
#   variable in its namespace, which will be passed on to \dQuote{cpo.retrafo}. If \dQuote{cpo.retrafo} is
#   not given and \code{trafo.type} is \dQuote{trafo.returns.data}, it must create a \dQuote{cpo.retrafo} function within its namespace, which will be called
#   for re-transformation.
#
#   If \code{trafo.type} is \dQuote{trafo.returns.control}, this function must return a \dQuote{cpo.retrafo} function.
#
#   If \code{trafo.type} is
#   \dQuote{stateless}, this argument may be \code{NULL}, or a function which just returns the transformed data.
#
#   If \dQuote{cpo.trafo} is a list of expressions (preferred), it is turned into a function by mlr, with the correct function arguments.
# @param cpo.retrafo [\code{language} | \code{function}]\cr
#   Similarly to \dQuote{cpo.trafo}, this is either a function, the function body in curly braces (preferred), or \code{NULL}.
#   If this is not \code{NULL}, this function must have the same arguments as \code{cpo.trafo}, with the exception that
#   the \dQuote{target} argument is replaced by a \dQuote{control} argument, which will be
#   the value created in the \dQuote{cpo.trafo} run. It gets its input data in the same format as
#   \dQuote{cpo.trafo}, with the exception that if \dQuote{dataformat} is \dQuote{task}, it gets a
#   \dQuote{data.frame} as if \dQuote{dataformat} were \dQuote{df.all}. This function must similarly return an
#   object in the same format as it received as input.
#
# @family CPO
# @export
#
# @examples
# # an example 'pca' CPO
# # demonstrates the (object based) "trafo.returns.data" CPO API
# pca = makeCPOExtended("pca",  # name
#   center = TRUE: logical,  # one logical parameter 'center'
#   dataformat= "numeric",  # only handle numeric columns
#   trafo.type = "trafo.returns.data",  # default, can be omitted
#   # cpo.trafo is given as a function body. The function head is added
#   # automatically, containing 'data', 'target', and 'center'
#   # (since a 'center' parameter was defined)
#   cpo.trafo = {
#     pcr = prcomp(as.matrix(data), center = center)
#     # The following line creates a 'control' object, which will be given
#     # to retrafo.
#     control = list(rotation = pcr$rotation, center = pcr$center)
#     pcr$x  # returning a matrix is ok
#   # Just like cpo.trafo, cpo.retrafo is a function body, with implicit
#   # arguments 'data', 'control', and 'center'.
#   }, cpo.retrafo = {
#     scale(as.matrix(data), center = control$center, scale = FALSE) %*%
#       control$rotation
#   })
#
# # an example 'scale' CPO
# # demonstrates the (functional) "trafo.returns.data" CPO API
# scaleCPO = makeCPOExtended("scale",
#   dataformat = "numeric",
#   # trafo.type = "trafo.returns.data" is implicit
#   cpo.trafo = function(data, target) {
#     result = scale(as.matrix(data), center = center, scale = scale)
#     cpo.retrafo = function(data) {
#       # here we can use the 'result' object generated in cpo.trafo
#       scale(as.matrix(data), attr(result, "scaled:center"),
#         attr(result, "scaled:scale"))
#     }
#     result
#   }, cpo.retrafo = NULL)  # don't forget to set it cpo.retrafo to NULL
#
# # an example constant feature remover CPO
# # demonstrates the "trafo.returns.control" CPO API
# constFeatRem = makeCPOExtended("constFeatRem",
#   dataformat = "df.features",
#   trafo.type = "trafo.returns.control",
#   cpo.trafo = function(data, target) {
#     cols.keep = names(Filter(function(x) {
#         length(unique(x)) > 1
#       }, data))
#     # the following function will do both the trafo and retrafo
#     result = function(data) {
#       data[cols.keep]
#     }
#     result
#   }, cpo.retrafo = NULL)
#
# # an example 'square' CPO
# # demonstrates the "stateless" CPO API
# square = makeCPOExtended("scale",
#   dataformat = "numeric",
#   trafo.type = "stateless",
#   cpo.trafo = function(data) {
#     as.matrix(data) * 2
#   }, cpo.retrafo = NULL) # optional, we don't need it since trafo & retrafo same
#
mlr-org/mlrCPO documentation built on Nov. 18, 2022, 11:25 p.m.