cpoApplyFunRegrTarget: Transform a Regression Target Variable
In mlrCPO: Composable Preprocessing Operators and Pipelines for Machine Learning

Description

This is a CPOConstructor to be used to create a CPO. It is called like any R function and returns the created CPO.

Apply a given function to the target column of a regression Task.

Usage

cpoApplyFunRegrTarget(
  trafo,
  invert.response = NULL,
  invert.se = NULL,
  param = NULL,
  vectorize = TRUE,
  gauss.points = 23,
  id,
  export = "export.default",
  affect.type = NULL,
  affect.index = integer(0),
  affect.names = character(0),
  affect.pattern = NULL,
  affect.invert = FALSE,
  affect.pattern.ignore.case = FALSE,
  affect.pattern.perl = FALSE,
  affect.pattern.fixed = FALSE
)

Arguments

`trafo`	[`function`] A function transforming the target column. If `vectorize` is `TRUE`, the argument is the whole column as a vector; `trafo` must vectorize over it and return a vector of the same length. Otherwise, the function is called once for every data item, and both its argument and its return value must have length 1. The function must take one or two arguments. If it takes two arguments, the second argument will be `param`.
`invert.response`	[`function`] If a model is trained on data that was transformed by `trafo`, this function should invert a prediction made by this model back to the space of the original data. In most cases, this will be the inverse of `trafo`, so that `invert.response(trafo(x)) == x`. Similarly to `trafo`, this function takes / produces single elements or the whole column, depending on `vectorize`. The return value should be a `numeric` in both cases. This can also be `NULL`, in which case using this `CPO` for `invert` with `predict.type = "response"` is not possible. Default is `NULL`.
`invert.se`	[`function`] Similarly to `invert.response`, this is a function that inverts a `"se"` prediction made after training on `trafo`'d data. This function should take at least two arguments, `mean` and `se`, and return a numeric vector of length 2 if `vectorize` is `FALSE`, or a `data.frame` or `matrix` with two numeric columns if `vectorize` is `TRUE`. The function may also take a third argument, which will be set to `param`. `invert.se` may also be `NULL`, in which case “se” inversion is done by numeric integration using Gauss-Hermite quadrature. Default is `NULL`.
`param`	[any] Optional argument to be given to `trafo`, `invert.response` and / or `invert.se`. It is ignored if `trafo` and `invert.response` take only one argument each and `invert.se` takes only two. Default is `NULL`.
`vectorize`	[`logical(1)`] Whether to call `trafo`, `invert.response` and `invert.se` once with the whole data column (or response and se column if `predict.type == "se"`), or once for each element. If the functions vectorize, it is recommended to have this set to `TRUE` for better performance. Default is `TRUE`.
`gauss.points`	[`numeric(1)`] Number of points at which to evaluate `invert.response` for Gauss-Hermite quadrature integration. Only used if `invert.se` is `NULL`. Default is `23`.
`id`	[`character(1)`] id to use as prefix for the CPO's hyperparameters. This must be used to avoid name clashes when composing two CPOs of the same type, or when composing with learners or other CPOs whose hyperparameter names clash.
`export`	[`character`] Either a character vector indicating the parameters to export as hyperparameters, or one of the special values “export.all” (export all parameters), “export.default” (export all parameters that are exported by default), “export.set” (export all parameters that were set during construction), “export.default.set” (export the intersection of the “default” and “set” parameters), “export.unset” (export all parameters that were not set during construction) or “export.default.unset” (export the intersection of the “default” and “unset” parameters). Default is “export.default”.
`affect.type`	[`character` \| `NULL`] Type of columns to affect. A subset of “numeric”, “factor”, “ordered”, “other”, or `NULL` to not match by column type. Default is `NULL`.
`affect.index`	[`numeric`] Indices of feature columns to affect. The order of indices given is respected. Target column indices are not counted (since target columns are always included). Default is `integer(0)`.
`affect.names`	[`character`] Feature names of feature columns to affect. The order of names given is respected. Default is `character(0)`.
`affect.pattern`	[`character(1)` \| `NULL`] `grep` pattern to match feature names by. Default is `NULL` (no pattern matching).
`affect.invert`	[`logical(1)`] Whether to affect all features not matched by other `affect.*` parameters. Default is `FALSE`.
`affect.pattern.ignore.case`	[`logical(1)`] Ignore case when matching features with `affect.pattern`; see `grep`. Default is `FALSE`.
`affect.pattern.perl`	[`logical(1)`] Use Perl-style regular expressions for `affect.pattern`; see `grep`. Default is `FALSE`.
`affect.pattern.fixed`	[`logical(1)`] Use fixed matching instead of regular expressions for `affect.pattern`; see `grep`. Default is `FALSE`.
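
The contract for `trafo`, `invert.response` and `invert.se` described above can be sketched in plain R. This is an illustration under stated assumptions, not code taken from mlrCPO: the names `bc_trafo`, `bc_invert` and `bc_invert_se` are made up for this sketch, a Box-Cox-style power transform stands in for `trafo`, `param` is assumed to carry a non-zero exponent, and the se inversion uses the delta method, which is only a first-order approximation.

```r
# Box-Cox-style transform; `lambda` plays the role of `param` (assumed non-zero).
bc_trafo <- function(x, lambda) (x^lambda - 1) / lambda

# Inverse of bc_trafo: maps values on the transformed scale back to the original scale.
bc_invert <- function(z, lambda) (lambda * z + 1)^(1 / lambda)

# Candidate for `invert.se` with vectorize = TRUE: takes the `mean` and `se`
# columns (plus `param` as third argument) and returns a two-column matrix.
# Delta method: se on the original scale ~ |d bc_invert / d mean| * se.
bc_invert_se <- function(mean, se, lambda) {
  slope <- (lambda * mean + 1)^(1 / lambda - 1)  # derivative of bc_invert
  cbind(mean = bc_invert(mean, lambda), se = abs(slope) * se)
}

y <- c(5, 10.4, 21.7, 50)   # made-up target values
lambda <- 0.5

# Round trip: invert.response(trafo(x)) should recover x.
z <- bc_trafo(y, lambda)
stopifnot(isTRUE(all.equal(bc_invert(z, lambda), y)))

# One row per prediction, columns `mean` and `se` on the original scale.
inv <- bc_invert_se(mean = z, se = rep(0.1, length(z)), lambda = lambda)
print(inv)
```

Functions of this shape would be passed as `trafo = bc_trafo`, `invert.response = bc_invert`, `invert.se = bc_invert_se` and `param = 0.5`.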

Value

[CPO].

Details

When both mean and se predictions are available, the mean can sometimes be inverted more accurately than with the response predict.type, using integrals or approximations such as the delta method. In such cases it may be advisable to prepend the cpoResponseFromSE CPO to this CPO.
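
To illustrate why the se can improve the inverted mean, the following sketch compares three inversions for a log-transformed target: the plain `exp(mean)` used with response prediction, a Gauss-Hermite quadrature estimate, and the closed-form lognormal moments. It assumes the prediction on the transformed scale is normally distributed with the predicted mean and se. `gauss_hermite` and `invert_by_quadrature` are helper names made up here, and the quadrature that mlrCPO performs internally may differ in detail.

```r
# Gauss-Hermite nodes and weights (weight function exp(-x^2)) via the
# Golub-Welsch eigenvalue method, using base R only.
gauss_hermite <- function(n) {
  k <- seq_len(n - 1)
  J <- matrix(0, n, n)
  J[cbind(k, k + 1)] <- sqrt(k / 2)
  J[cbind(k + 1, k)] <- sqrt(k / 2)
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

# Mean and standard deviation of f(Z) for Z ~ Normal(mean, se^2).
invert_by_quadrature <- function(f, mean, se, n = 23) {
  gh <- gauss_hermite(n)
  z <- mean + sqrt(2) * se * gh$nodes   # change of variables
  w <- gh$weights / sqrt(pi)            # weights now sum to 1
  m1 <- sum(w * f(z))
  m2 <- sum(w * f(z)^2)
  c(mean = m1, se = sqrt(m2 - m1^2))
}

m <- 1     # made-up predicted mean on the log scale
s <- 0.5   # made-up predicted se on the log scale

naive  <- exp(m)                                   # response-type inversion
quad   <- invert_by_quadrature(exp, m, s)          # numeric integration
closed <- c(mean = exp(m + s^2 / 2),               # lognormal closed form
            se = sqrt((exp(s^2) - 1) * exp(2 * m + s^2)))

print(rbind(quadrature = quad, closed.form = closed))
print(naive)
```

Under the normality assumption the quadrature result agrees with the closed form, while `exp(m)` underestimates the mean; the gap grows with the se.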

Note that when trafo or invert.response takes more than one argument, the second argument will be set to the value of param. This may lead to unexpected results when using functions with rarely used parameters, e.g. log. In these cases, it may be necessary to wrap the function: trafo = function(x) log(x).
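
The pitfall can be reproduced in base R without any CPO involved. In this small sketch the value `2` for `param` is hypothetical, as is the wrapper name `safe_log`.

```r
# log() has a second formal argument, `base`, which is where `param` would land.
print(names(formals(args(log))))

param <- 2                 # hypothetical value intended for some other function
as_base <- log(8, param)   # base-2 logarithm of 8, not the natural logarithm
natural <- log(8)

# Wrapping leaves a single formal argument, so nothing can bind to `base`.
safe_log <- function(x) log(x)

print(c(with.param = as_base, natural = natural, wrapped = safe_log(8)))
```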

General CPO info

This function creates a CPO object, which can be applied to Tasks, data.frames, Learners and other CPO objects using the %>>% operator.
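
A minimal sketch of both uses, assuming the mlr and mlrCPO packages are installed and that mlr's example task `bh.task` and the learner `"regr.lm"` are available. The block does nothing if mlrCPO is missing. `log` and `exp` are wrapped so that each function takes exactly one argument.

```r
if (requireNamespace("mlrCPO", quietly = TRUE)) {
  suppressPackageStartupMessages(library(mlrCPO))  # also attaches mlr

  # Log-transform the target and exponentiate predictions on inversion.
  cpo <- cpoApplyFunRegrTarget(
    trafo = function(x) log(x),
    invert.response = function(y) exp(y)
  )

  # Applied to a Task: the result is a Task with a transformed target column.
  trafd <- bh.task %>>% cpo
  print(summary(getTaskTargets(trafd)))

  # Attached to a Learner: the learner trains on the transformed target, and
  # its predictions are mapped back through invert.response.
  lrn <- cpo %>>% makeLearner("regr.lm")
  model <- train(lrn, bh.task)
  print(head(getPredictionResponse(predict(model, bh.task))))
}
```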

The parameters of this object can be changed after creation using the function setHyperPars. The other hyperparameter-manipulating functions, getHyperPars and getParamSet, work as one would expect.

If the “id” parameter is given, the hyperparameters will have this id as a prefix; this will, however, not change the parameters of the creator function.
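
The following sketch shows hyperparameter handling and the id prefix, using cpoScale because its `center` and `scale` parameters are simple logicals. It assumes mlr and mlrCPO are installed, and that the prefix is joined to the parameter name with a dot (so `id = "demo"` gives `demo.center`); treat that naming as an assumption to verify with getParamSet.

```r
if (requireNamespace("mlrCPO", quietly = TRUE)) {
  suppressPackageStartupMessages(library(mlrCPO))  # also attaches mlr

  cpo <- cpoScale(id = "demo")
  print(getParamSet(cpo))    # lists the exported hyperparameters
  print(getHyperPars(cpo))   # current values, as a named list

  # setHyperPars returns a modified copy; `cpo` itself is unchanged.
  changed <- setHyperPars(cpo, demo.center = FALSE)
  print(getHyperPars(changed))

  # Distinct ids keep hyperparameter names apart when composing two CPOs
  # of the same type.
  both <- cpoScale(id = "first") %>>% cpoScale(id = "second")
  print(names(getHyperPars(both)))
}
```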

Calling a `CPOConstructor`

CPO constructor functions are called with optional values of parameters, and additional “special” optional values. The special optional values are the id parameter, and the affect.* parameters. The affect.* parameters enable the user to control which subset of a given dataset is affected. If no affect.* parameters are given, all data features are affected by default.
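
Since the `affect.pattern*` arguments refer to `grep`, the matching they describe can be previewed in base R. The sketch below only mirrors the `grep` semantics on a made-up vector of feature names; it does not show how mlrCPO combines several `affect.*` arguments internally.

```r
feature_names <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Corresponds to affect.pattern = "^sepal" with affect.pattern.ignore.case = TRUE.
hit <- grep("^sepal", feature_names, ignore.case = TRUE, value = TRUE)

# Without ignore.case the lower-case pattern matches nothing.
strict <- grep("^sepal", feature_names, value = TRUE)

# Corresponds to affect.invert = TRUE: everything that was not matched.
rest <- setdiff(feature_names, hit)

print(list(matched = hit, case.sensitive = strict, inverted = rest))
```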

Other CPOs: cpoApplyFun(), cpoAsNumeric(), cpoCache(), cpoCbind(), cpoCollapseFact(), cpoDropConstants(), cpoDropMostlyConstants(), cpoDummyEncode(), cpoFilterAnova(), cpoFilterCarscore(), cpoFilterChiSquared(), cpoFilterFeatures(), cpoFilterGainRatio(), cpoFilterInformationGain(), cpoFilterKruskal(), cpoFilterLinearCorrelation(), cpoFilterMrmr(), cpoFilterOneR(), cpoFilterPermutationImportance(), cpoFilterRankCorrelation(), cpoFilterRelief(), cpoFilterRfCImportance(), cpoFilterRfImportance(), cpoFilterRfSRCImportance(), cpoFilterSymmetricalUncertainty(), cpoFilterUnivariate(), cpoFilterVariance(), cpoFixFactors(), cpoIca(), cpoImpactEncodeClassif(), cpoImpactEncodeRegr(), cpoImpute(), cpoImputeConstant(), cpoImputeHist(), cpoImputeLearner(), cpoImputeMax(), cpoImputeMean(), cpoImputeMedian(), cpoImputeMin(), cpoImputeMode(), cpoImputeNormal(), cpoImputeUniform(), cpoLogTrafoRegr(), cpoMakeCols(), cpoMissingIndicators(), cpoModelMatrix(), cpoOversample(), cpoPca(), cpoProbEncode(), cpoQuantileBinNumerics(), cpoRegrResiduals(), cpoResponseFromSE(), cpoSample(), cpoScale(), cpoScaleMaxAbs(), cpoScaleRange(), cpoSelect(), cpoSmote(), cpoSpatialSign(), cpoTransformParams(), cpoWrap(), makeCPOCase(), makeCPOMultiplex()