setup.preprocess: Set preprocess parameters for train_cv '.preprocess' argument
In egenn/rtemis: Machine Learning and Visualization

setup.preprocess

R Documentation

Set preprocess parameters for train_cv `.preprocess` argument

Description

Set the preprocessing parameters to pass to the `.preprocess` argument of `train_cv`.

Usage

setup.preprocess(
  completeCases = FALSE,
  removeCases.thres = NULL,
  removeFeatures.thres = NULL,
  impute = FALSE,
  impute.type = "missRanger",
  impute.missRanger.params = list(pmm.k = 0, maxiter = 10),
  impute.discrete = get_mode,
  impute.numeric = mean,
  integer2factor = FALSE,
  integer2numeric = FALSE,
  logical2factor = FALSE,
  logical2numeric = FALSE,
  numeric2factor = FALSE,
  numeric2factor.levels = NULL,
  numeric.cut.n = 0,
  numeric.cut.labels = FALSE,
  numeric.quant.n = 0,
  character2factor = FALSE,
  scale = FALSE,
  center = FALSE,
  removeConstants = TRUE,
  oneHot = FALSE,
  exclude = NULL
)

Arguments

`completeCases`	Logical: If TRUE, only retain complete cases (no missing data). Default = FALSE
`removeCases.thres`	Float (0, 1): Remove cases in which the fraction of missing features is >= this value.
`removeFeatures.thres`	Float (0, 1): Remove features that have missing values in a fraction of cases >= this value.
`impute`	Logical: If TRUE, impute missing values. See `impute.type`, `impute.discrete` and `impute.numeric` for how imputation is performed.
`impute.type`	Character: How to impute data: "missRanger" and "missForest" use the packages of the same name to impute by iterative random forest regression. "rfImpute" uses `randomForest::rfImpute` (see its documentation). "meanMode" uses mean and mode by default, or any custom functions defined in `impute.discrete` and `impute.numeric`. Default = "missRanger" (which is much faster than "missForest"). "missForest" is included for compatibility with older pipelines.
`impute.missRanger.params`	Named list with elements "pmm.k" and "maxiter", which are passed to `missRanger::missRanger`. `pmm.k` greater than 0 results in predictive mean matching; set `pmm.k = 0` to disable it. Default = `list(pmm.k = 0, maxiter = 10)`, as shown in Usage. Reduce `num.trees` (default 500) for faster imputation, especially in large datasets.
`impute.discrete`	Function that returns single value: How to impute discrete variables for `impute.type = "meanMode"`. Default = `get_mode`
`impute.numeric`	Function that returns single value: How to impute continuous variables for `impute.type = "meanMode"`. Default = `mean`
`integer2factor`	Logical: If TRUE, convert all integers to factors. This includes `bit64::integer64` columns
`integer2numeric`	Logical: If TRUE, convert all integers to numeric (will only work if `integer2factor = FALSE`). This includes `bit64::integer64` columns
`logical2factor`	Logical: If TRUE, convert all logical variables to factors
`logical2numeric`	Logical: If TRUE, convert all logical variables to numeric
`numeric2factor`	Logical: If TRUE, convert all numeric variables to factors
`numeric2factor.levels`	Character vector: Optional. Passed to the `levels` argument of `factor()` if `numeric2factor = TRUE`. For advanced or specific use cases only: you need to know the unique values of the numeric vector(s), and all numeric variables must share the same set of unique values.
`numeric.cut.n`	Integer: If > 0, convert all numeric variables to factors by binning using `base::cut` with `breaks` equal to this number
`numeric.cut.labels`	Logical: The `labels` argument of `base::cut`
`numeric.quant.n`	Integer: If > 0, convert all numeric variables to factors by binning using `base::cut` with `breaks` equal to this number of quantiles produced using `stats::quantile`
`character2factor`	Logical: If TRUE, convert all character variables to factors
`scale`	Logical: If TRUE, scale columns of `x`
`center`	Logical: If TRUE, center columns of `x`. Note that by default it is the same as `scale`
`removeConstants`	Logical: If TRUE, remove constant columns.
`oneHot`	Logical: If TRUE, convert all factors using one-hot encoding.
`exclude`	Integer, vector: Exclude these columns from preprocessing.
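
As a rough illustration of what several of these arguments describe, the following base-R sketch mimics the missingness thresholds (`removeCases.thres`, `removeFeatures.thres`), "meanMode" imputation, and the two binning options (`numeric.cut.n`, `numeric.quant.n`). It does not call rtemis and is not the package's implementation: the helper names `mode_of` and `impute_mean_mode`, the toy data, the use of `include.lowest = TRUE`, and the reading of `numeric.quant.n` as the number of quantile break points are all assumptions made for the example.

```r
# Base-R sketch only; helper names and toy data are hypothetical, not rtemis code.

dat <- data.frame(
  a = c(1, NA, 3, 4, 5, 6),
  b = c(2, NA, NA, 4, 5, 6),
  g = factor(c("x", "y", "x", NA, "x", "y"))
)

# removeCases.thres: drop cases whose fraction of missing features is >= 0.5
row_na <- rowMeans(is.na(dat))
kept_cases <- dat[row_na < 0.5, , drop = FALSE]

# removeFeatures.thres: drop features missing in a fraction of cases >= 0.3
col_na <- colMeans(is.na(dat))
kept_features <- dat[, col_na < 0.3, drop = FALSE]

# Hypothetical stand-in for a mode function: most frequent non-missing value
mode_of <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# "meanMode"-style imputation: one function for numeric columns,
# another for discrete columns (compare impute.numeric / impute.discrete)
impute_mean_mode <- function(df, numeric_fn = mean, discrete_fn = mode_of) {
  for (j in seq_along(df)) {
    miss <- is.na(df[[j]])
    if (!any(miss)) next
    observed <- df[[j]][!miss]
    fill <- if (is.numeric(df[[j]])) numeric_fn(observed) else discrete_fn(observed)
    df[[j]][miss] <- fill
  }
  df
}
imputed <- impute_mean_mode(dat)

# Binning a skewed numeric vector
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# numeric.cut.n-style: `breaks` is a single number, so cut() makes
# that many equal-width intervals over the range of x
cut_equal_width <- cut(x, breaks = 4, labels = FALSE)

# numeric.quant.n-style: `breaks` are quantiles of x (here 5 break points).
# include.lowest = TRUE is an assumption so the minimum is not dropped.
q_breaks <- quantile(x, probs = seq(0, 1, length.out = 5))
cut_quantile <- cut(x, breaks = q_breaks, include.lowest = TRUE, labels = FALSE)

print(table(cut_equal_width))
print(table(cut_quantile))
```

With a skewed variable like `x`, equal-width bins put almost every value in the first bin, whereas quantile-based breaks spread the values more evenly, which is the practical difference between the two binning arguments.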