cpoRegrResiduals: Train a Model on a Task and Return the Residual Task
In mlr-org/mlrCPO: Composable Preprocessing Operators and Pipelines for Machine Learning

cpoRegrResiduals


Train a Model on a Task and Return the Residual Task

Description

This is a CPOConstructor to be used to create a CPO. It is called like any R function and returns the created CPO.

Given a regression learner, this CPO fits the learner to a regression Task and replaces the regression target with the model's residuals, i.e. the differences between the target values and the model's predictions.
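The training-time transformation for the default “plain” residuals can be sketched in base R. This is only an illustration. It assumes that `lm()` on the built-in `mtcars` data stands in for the regression `Learner` and the Task; it is not how mlrCPO implements the CPO internally.

```r
# Sketch of the "plain" residual transformation; lm() stands in for the
# regression Learner (an assumption made for illustration only).
task.data <- mtcars
base.fit <- lm(mpg ~ wt + hp, data = task.data)

# Replace the target with residuals: observed target minus model prediction.
residual.data <- task.data
residual.data$mpg <- task.data$mpg - unname(predict(base.fit, newdata = task.data))

head(residual.data$mpg)
```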

For inversion, the internal model's predictions for the prediction data are added to the predictions being inverted.
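A base-R sketch of this inversion step follows. It assumes that `lm()` models on `mtcars` stand in for both the CPO's internal `Learner` and the downstream learner trained on the residual target. The split into `train.data` and `new.data` is arbitrary and chosen only for illustration.

```r
# Sketch of inversion; both lm() fits are stand-ins for mlr Learners.
train.data <- mtcars[1:25, ]
new.data <- mtcars[26:32, ]

# Internal model, fitted on the original target.
base.fit <- lm(mpg ~ wt, data = train.data)
residual.train <- train.data
residual.train$mpg <- train.data$mpg - unname(predict(base.fit, newdata = train.data))

# The downstream model only ever sees the residual target.
residual.fit <- lm(mpg ~ hp + qsec, data = residual.train)

# Inversion: add the internal model's predictions on the new data
# to the downstream model's (residual-scale) predictions.
residual.pred <- unname(predict(residual.fit, newdata = new.data))
inverted.pred <- residual.pred + unname(predict(base.fit, newdata = new.data))
inverted.pred
```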

If predict.se is TRUE, property.type == "se" inversion can also be performed. In that case, the standard error (se) of the incoming prediction and the se of the internal model are assumed to be independent, and the resulting se is their Pythagorean sum, sqrt(se.incoming^2 + se.model^2).
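The "se" inversion can be sketched in base R as follows, under the independence assumption stated above. This example makes two further assumptions. First, `predict.lm(..., se.fit = TRUE)` stands in for a `Learner` with "se" predict type. Second, the `incoming` data.frame is a hypothetical downstream prediction. The `state` data.frame only mirrors the "response"/"se" layout described under "CPOTrained State".

```r
# lm() with se.fit = TRUE stands in for a Learner with "se" predict type.
fit <- lm(mpg ~ wt, data = mtcars)
p <- predict(fit, newdata = mtcars[1:3, ], se.fit = TRUE)

# Internal model's predictions with "response" and "se" columns.
state <- data.frame(response = unname(p$fit), se = unname(p$se.fit))

# Hypothetical incoming prediction of the downstream (residual) model.
incoming <- data.frame(response = c(0.5, -1.2, 0.3), se = c(1.0, 0.8, 1.1))

# Responses add; independent standard errors combine as a Pythagorean sum.
inverted <- data.frame(
  response = incoming$response + state$response,
  se = sqrt(incoming$se^2 + state$se^2)
)
inverted
```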

Usage

cpoRegrResiduals(
  learner,
  predict.se = FALSE,
  crr.train.residuals = "plain",
  crr.resampling = cv5,
  id,
  export = "export.default",
  affect.type = NULL,
  affect.index = integer(0),
  affect.names = character(0),
  affect.pattern = NULL,
  affect.invert = FALSE,
  affect.pattern.ignore.case = FALSE,
  affect.pattern.perl = FALSE,
  affect.pattern.fixed = FALSE
)

Arguments

`learner`	[`character(1)` \| `Learner`] A regression `Learner`, or a `character(1)` identifying a `Learner` to be constructed.
`predict.se`	[`logical(1)`] Whether to fit the model with “se” predict type. This enables the resulting `CPOInverter` to be used for `property.type == "se"` inversion. Default is `FALSE`.
`crr.train.residuals`	[`character(1)`] What residuals to use for training (i.e. for the initial transformation). One of “resample”, “oob”, “plain”. If “resample” is given, the out-of-resampling-fold predictions are used when resampling according to the `crr.resampling` parameter. If “oob” is used, the `Learner` must have the “oobpreds” property; the out-of-bag predictions are then used. If `crr.train.residuals` is “plain”, the simple in-sample regression residuals are used. “plain” may offer slightly worse performance than the alternatives, since in-sample residuals tend to be smaller than residuals on unseen data; however, few `mlr` `Learners` support “oobpreds”, and “resample” can come at a considerable run time penalty. Default is “plain”.
`crr.resampling`	[`ResampleDesc` \| `ResampleInstance`] What resampling to use when `crr.train.residuals` is “resample”; otherwise has no effect. The `$predict` slot of the resample description is ignored and set to “test”. If a data point is predicted by multiple resampling folds, the average residual is used. If a data point is not predicted by any resampling fold, the “plain” residual is used for it. Default is `cv5`.
`id`	[`character(1)`] id to use as prefix for the CPO's hyperparameters. This must be used to avoid name clashes when composing two CPOs of the same type, or when composing with learners or other CPOs whose hyperparameter names would clash.
`export`	[`character`] Either a character vector indicating the parameters to export as hyperparameters, or one of the special values “export.all” (export all parameters), “export.default” (export all parameters that are exported by default), “export.set” (export all parameters that were set during construction), “export.default.set” (export the intersection of the “default” and “set” parameters), “export.unset” (export all parameters that were not set during construction) or “export.default.unset” (export the intersection of the “default” and “unset” parameters). Default is “export.default”.
`affect.type`	[`character` \| `NULL`] Type of columns to affect. A subset of “numeric”, “factor”, “ordered”, “other”, or `NULL` to not match by column type. Default is `NULL`.
`affect.index`	[`numeric`] Indices of feature columns to affect. The order of indices given is respected. Target column indices are not counted (since target columns are always included). Default is `integer(0)`.
`affect.names`	[`character`] Feature names of feature columns to affect. The order of names given is respected. Default is `character(0)`.
`affect.pattern`	[`character(1)` \| `NULL`] `grep` pattern to match feature names by. Default is `NULL` (no pattern matching).
`affect.invert`	[`logical(1)`] Whether to affect all features not matched by other `affect.*` parameters. Default is `FALSE`.
`affect.pattern.ignore.case`	[`logical(1)`] Ignore case when matching features with `affect.pattern`; see `grep`. Default is `FALSE`.
`affect.pattern.perl`	[`logical(1)`] Use Perl-style regular expressions for `affect.pattern`; see `grep`. Default is `FALSE`.
`affect.pattern.fixed`	[`logical(1)`] Use fixed matching instead of regular expressions for `affect.pattern`; see `grep`. Default is `FALSE`.
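The “resample” option of `crr.train.residuals`, together with the averaging and fallback rules of `crr.resampling`, can be sketched in base R. The helper `resampled.residuals()` is hypothetical and uses `lm()` as the `Learner`. It also assumes that each model is trained on the complement of its test rows, which holds for cross-validation but not for bootstrap. The “oob” option is not covered.

```r
# Hypothetical helper: "resample" residuals with lm() as the Learner.
# test.sets: list of row-index vectors, one per resampling iteration.
resampled.residuals <- function(formula, data, target, test.sets) {
  n <- nrow(data)
  res.sum <- numeric(n)
  res.count <- integer(n)
  for (test in test.sets) {
    # Train on the complement of the test rows, predict the test rows.
    fit <- lm(formula, data = data[-test, , drop = FALSE])
    pred <- unname(predict(fit, newdata = data[test, , drop = FALSE]))
    res.sum[test] <- res.sum[test] + (data[[target]][test] - pred)
    res.count[test] <- res.count[test] + 1L
  }
  # "plain" residuals of a model trained on all rows, used as a fallback
  # for rows that no resampling iteration predicted.
  plain <- data[[target]] - unname(predict(lm(formula, data = data), newdata = data))
  # Average over iterations when a row was predicted more than once.
  ifelse(res.count > 0, res.sum / pmax(res.count, 1L), plain)
}

# 5-fold cross-validation, analogous to the cv5 default: each row is predicted once.
set.seed(1)
folds <- split(sample(nrow(mtcars)), rep(1:5, length.out = nrow(mtcars)))
cv.res <- resampled.residuals(mpg ~ wt, mtcars, "mpg", folds)
```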

Value

[`CPO`].

CPOTrained State

The CPORetrafo state's $control slot is the WrappedModel created when training the learner on the given data.

The CPOInverter state's $control slot is a data.frame of the “response” and (if predict.se is TRUE) “se” columns of the prediction done by the model on the data.

General CPO info

This function creates a CPO object, which can be applied to Tasks, data.frames, Learners and other CPO objects using the %>>% operator.

The parameters of this object can be changed after creation using the function setHyperPars. The other hyperparameter-manipulating functions, getHyperPars and getParamSet, work as one would expect.

If the “id” parameter is given, the hyperparameters will have this id as a prefix; this will, however, not change the parameters of the creator function.

Calling a `CPOConstructor`

CPO constructor functions are called with optional values of parameters, and additional “special” optional values. The special optional values are the id parameter, and the affect.* parameters. The affect.* parameters enable the user to control which subset of a given dataset is affected. If no affect.* parameters are given, all data features are affected by default.
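As a rough illustration of how `affect.*`-style selectors can pick a subset of feature columns, here is a base-R sketch. The helper `select.features()` is hypothetical. It assumes that a feature is selected when any selector matches it and that inversion takes the complement. mlrCPO's actual matching and ordering rules may differ, and `affect.index` is not modelled.

```r
# Hypothetical helper loosely modelled on the affect.* parameters.
select.features <- function(data, target, type = NULL, feature.names = character(0),
                            pattern = NULL, invert = FALSE, ...) {
  features <- setdiff(colnames(data), target)
  # No selector given: every feature is affected (invert is ignored here).
  if (is.null(type) && length(feature.names) == 0 && is.null(pattern)) {
    return(features)
  }
  # Classify each feature column into the affect.type categories.
  col.type <- vapply(data[features], function(col) {
    if (is.ordered(col)) "ordered"
    else if (is.factor(col)) "factor"
    else if (is.numeric(col)) "numeric"
    else "other"
  }, character(1))
  # Assumption: a feature is matched if any selector matches it.
  matched <- features[col.type %in% type]
  matched <- union(matched, intersect(feature.names, features))
  if (!is.null(pattern)) {
    # Extra arguments (e.g. ignore.case, perl, fixed) are passed to grep().
    matched <- union(matched, grep(pattern, features, value = TRUE, ...))
  }
  if (invert) setdiff(features, matched) else matched
}

select.features(iris, "Species", pattern = "^sepal", ignore.case = TRUE)
```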

Other CPOs: cpoApplyFunRegrTarget(), cpoApplyFun(), cpoAsNumeric(), cpoCache(), cpoCbind(), cpoCollapseFact(), cpoDropConstants(), cpoDropMostlyConstants(), cpoDummyEncode(), cpoFilterAnova(), cpoFilterCarscore(), cpoFilterChiSquared(), cpoFilterFeatures(), cpoFilterGainRatio(), cpoFilterInformationGain(), cpoFilterKruskal(), cpoFilterLinearCorrelation(), cpoFilterMrmr(), cpoFilterOneR(), cpoFilterPermutationImportance(), cpoFilterRankCorrelation(), cpoFilterRelief(), cpoFilterRfCImportance(), cpoFilterRfImportance(), cpoFilterRfSRCImportance(), cpoFilterRfSRCMinDepth(), cpoFilterSymmetricalUncertainty(), cpoFilterUnivariate(), cpoFilterVariance(), cpoFixFactors(), cpoIca(), cpoImpactEncodeClassif(), cpoImpactEncodeRegr(), cpoImputeConstant(), cpoImputeHist(), cpoImputeLearner(), cpoImputeMax(), cpoImputeMean(), cpoImputeMedian(), cpoImputeMin(), cpoImputeMode(), cpoImputeNormal(), cpoImputeUniform(), cpoImpute(), cpoLogTrafoRegr(), cpoMakeCols(), cpoMissingIndicators(), cpoModelMatrix(), cpoOversample(), cpoPca(), cpoProbEncode(), cpoQuantileBinNumerics(), cpoResponseFromSE(), cpoSample(), cpoScaleMaxAbs(), cpoScaleRange(), cpoScale(), cpoSelect(), cpoSmote(), cpoSpatialSign(), cpoTransformParams(), cpoWrap(), makeCPOCase(), makeCPOMultiplex()