cpoImputeNormal: Perform Imputation with Normally Distributed Random Values
In mlrCPO: Composable Preprocessing Operators and Pipelines for Machine Learning

cpoImputeNormal

R Documentation

Description

Imputes missing feature values with random draws from a normal distribution. The imputation learned during training can be re-applied to new data in the same way. This is especially useful during resampling, when the test set should be imputed in the same way as the training set.
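
The re-imputation described above could look roughly like the following sketch. The data frames are made up for illustration, and the example assumes that the trained transformation is retrieved with `retrafo()` and applied with `%>>%`, as is usual for CPOs in mlrCPO.

```r
library(mlrCPO)

set.seed(1)
# Toy data, invented for this example: x has missing values, y is complete
train <- data.frame(x = c(1.2, NA, 3.4, 2.8, NA, 2.1), y = c(1, 2, 3, 4, 5, 6))
test  <- data.frame(x = c(NA, 2.5, NA), y = c(7, 8, 9))

# mu and sd are left at their NA defaults, so they are estimated
# from the training data
train.imputed <- train %>>% cpoImputeNormal()

# Retrieve the trained transformation and re-apply it to the test data
rt <- retrafo(train.imputed)
test.imputed <- test %>>% rt

anyNA(train.imputed$x)
anyNA(test.imputed$x)
```

The test data is imputed with the distribution parameters learned on the training data, not with parameters re-estimated from the test data.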

Usage

cpoImputeNormal(
  mu = NA_real_,
  sd = NA_real_,
  impute.new.levels = TRUE,
  recode.factor.levels = TRUE,
  id,
  export = "export.default",
  affect.type = NULL,
  affect.index = integer(0),
  affect.names = character(0),
  affect.pattern = NULL,
  affect.invert = FALSE,
  affect.pattern.ignore.case = FALSE,
  affect.pattern.perl = FALSE,
  affect.pattern.fixed = FALSE
)

Arguments

`mu`	[`numeric(1)`] Mean of the normal distribution. If `NA` (the default), it is estimated from the data.
`sd`	[`numeric(1)`] Standard deviation of the normal distribution. If `NA` (the default), it is estimated from the data.
`impute.new.levels`	[`logical(1)`] If new, unencountered factor levels occur during reimputation, should these be handled as NAs and then be imputed the same way? Default is `TRUE`.
`recode.factor.levels`	[`logical(1)`] Recode factor levels after reimputation, so they match the respective element of `lvls` (in the description object) and therefore match the levels of the feature factor in the training data after imputation? Default is `TRUE`.
`id`	[`character(1)`] id to use as prefix for the CPO's hyperparameters. This must be used to avoid name clashes when composing two CPOs of the same type, or with learners or other CPOs with hyperparameters with clashing names.
`export`	[`character`] Either a character vector indicating the parameters to export as hyperparameters, or one of the special values “export.all” (export all parameters), “export.default” (export all parameters that are exported by default), “export.set” (export all parameters that were set during construction), “export.default.set” (export the intersection of the “default” and “set” parameters), “export.unset” (export all parameters that were not set during construction) or “export.default.unset” (export the intersection of the “default” and “unset” parameters). Default is “export.default”.
`affect.type`	[`character` \| `NULL`] Type of columns to affect. A subset of “numeric”, “factor”, “ordered”, “other”, or `NULL` to not match by column type. Default is `NULL`.
`affect.index`	[`numeric`] Indices of feature columns to affect. The order of indices given is respected. Target column indices are not counted (since target columns are always included). Default is `integer(0)`.
`affect.names`	[`character`] Feature names of feature columns to affect. The order of names given is respected. Default is `character(0)`.
`affect.pattern`	[`character(1)` \| `NULL`] `grep` pattern to match feature names by. Default is `NULL` (no pattern matching).
`affect.invert`	[`logical(1)`] Whether to affect all features not matched by other `affect.*` parameters. Default is `FALSE`.
`affect.pattern.ignore.case`	[`logical(1)`] Ignore case when matching features with `affect.pattern`; see `grep`. Default is `FALSE`.
`affect.pattern.perl`	[`logical(1)`] Use Perl-style regular expressions for `affect.pattern`; see `grep`. Default is `FALSE`.
`affect.pattern.fixed`	[`logical(1)`] Use fixed matching instead of regular expressions for `affect.pattern`; see `grep`. Default is `FALSE`.
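
To illustrate what `mu` and `sd` control, here is a minimal base R sketch of the idea behind normal imputation. `impute_normal` is a hypothetical helper written for this example. It is not part of mlrCPO and is not claimed to match the package's internal implementation.

```r
# Hypothetical helper (not from mlrCPO): replace NAs in a numeric vector
# with random draws from a normal distribution.
impute_normal <- function(x, mu = NA_real_, sd = NA_real_) {
  # Estimate the parameters from the observed values when they are not given
  if (is.na(mu)) mu <- mean(x, na.rm = TRUE)
  if (is.na(sd)) sd <- stats::sd(x, na.rm = TRUE)
  missing <- is.na(x)
  # Only the missing entries are replaced; observed values stay untouched
  x[missing] <- rnorm(sum(missing), mean = mu, sd = sd)
  x
}

set.seed(42)
x <- c(4.1, NA, 5.3, 4.8, NA)
y.estimated <- impute_normal(x)                  # parameters estimated from x
y.fixed     <- impute_normal(x, mu = 0, sd = 1)  # parameters given explicitly
```

Because the imputed values are random, results differ between calls unless a seed is set.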

Details

The description object contains these slots:

target [character]: See argument.
features [character]: Feature names (column names of data).
classes [character]: Feature classes (storage type of data).
lvls [named list]: Mapping of column names of factor features to their levels, including newly created ones during imputation.
impute [named list]: Mapping of column names to imputation functions.
dummies [named list]: Mapping of column names to imputation functions.
impute.new.levels [logical(1)]: See argument.
recode.factor.levels [logical(1)]: See argument.

Value

[CPO].

General CPO info

This function creates a CPO object, which can be applied to Tasks, data.frames, Learners and other CPO objects using the %>>% operator.

The parameters of this object can be changed after creation using the function setHyperPars. The other hyperparameter-manipulating functions, getHyperPars and getParamSet, work similarly.

If the “id” parameter is given, the hyperparameters will have this id as a prefix; this will, however, not change the parameters of the creator function.
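
A short sketch of hyperparameter handling with an id. The id "imp" is an arbitrary choice for this example, and `export = "export.all"` is used only to make sure that `mu` is exposed as a hyperparameter; the sketch assumes the prefixed name takes the form `imp.mu`.

```r
library(mlrCPO)

# Construct the CPO with explicit parameters and an id prefix
imp <- cpoImputeNormal(mu = 0, sd = 1, id = "imp", export = "export.all")

# Inspect the exported hyperparameters and their current values
getParamSet(imp)
getHyperPars(imp)

# setHyperPars returns a modified copy; the original object is unchanged
imp2 <- setHyperPars(imp, imp.mu = 5)
getHyperPars(imp2)$imp.mu
```

Note that the constructor argument is still called `mu`; only the hyperparameter of the resulting object carries the prefix.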

Calling a `CPOConstructor`

CPO constructor functions are called with optional values of parameters, and additional “special” optional values. The special optional values are the id parameter, and the affect.* parameters. The affect.* parameters enable the user to control which subset of a given dataset is affected. If no affect.* parameters are given, all data features are affected by default.
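
As an illustration of the affect.* parameters, the following sketch restricts imputation to a single column. The data frame is invented for this example, and the sketch assumes that columns not selected by affect.* are passed through unchanged.

```r
library(mlrCPO)

set.seed(1)
# Toy data, invented for this example: both columns contain missing values
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, 10, 20, 30)
)

# Only column "a" is affected; "b" is left as it is
out <- df %>>% cpoImputeNormal(affect.names = "a")

# Count the remaining missing values per column
colSums(is.na(out))
```

The same selection could be expressed with `affect.index` or `affect.pattern`, and `affect.invert = TRUE` would select the complement instead.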

Other CPOs: cpoApplyFun(), cpoApplyFunRegrTarget(), cpoAsNumeric(), cpoCache(), cpoCbind(), cpoCollapseFact(), cpoDropConstants(), cpoDropMostlyConstants(), cpoDummyEncode(), cpoFilterAnova(), cpoFilterCarscore(), cpoFilterChiSquared(), cpoFilterFeatures(), cpoFilterGainRatio(), cpoFilterInformationGain(), cpoFilterKruskal(), cpoFilterLinearCorrelation(), cpoFilterMrmr(), cpoFilterOneR(), cpoFilterPermutationImportance(), cpoFilterRankCorrelation(), cpoFilterRelief(), cpoFilterRfCImportance(), cpoFilterRfImportance(), cpoFilterRfSRCImportance(), cpoFilterSymmetricalUncertainty(), cpoFilterUnivariate(), cpoFilterVariance(), cpoFixFactors(), cpoIca(), cpoImpactEncodeClassif(), cpoImpactEncodeRegr(), cpoImpute(), cpoImputeConstant(), cpoImputeHist(), cpoImputeLearner(), cpoImputeMax(), cpoImputeMean(), cpoImputeMedian(), cpoImputeMin(), cpoImputeMode(), cpoImputeUniform(), cpoLogTrafoRegr(), cpoMakeCols(), cpoMissingIndicators(), cpoModelMatrix(), cpoOversample(), cpoPca(), cpoProbEncode(), cpoQuantileBinNumerics(), cpoRegrResiduals(), cpoResponseFromSE(), cpoSample(), cpoScale(), cpoScaleMaxAbs(), cpoScaleRange(), cpoSelect(), cpoSmote(), cpoSpatialSign(), cpoTransformParams(), cpoWrap(), makeCPOCase(), makeCPOMultiplex()