setup.preprocess: Set preprocess parameters for train_cv '.preprocess' argument

Description

Set preprocessing parameters for the .preprocess argument of train_cv.

Usage

setup.preprocess(
  completeCases = FALSE,
  removeCases.thres = NULL,
  removeFeatures.thres = NULL,
  impute = FALSE,
  impute.type = "missRanger",
  impute.missRanger.params = list(pmm.k = 0, maxiter = 10),
  impute.discrete = get_mode,
  impute.numeric = mean,
  integer2factor = FALSE,
  integer2numeric = FALSE,
  logical2factor = FALSE,
  logical2numeric = FALSE,
  numeric2factor = FALSE,
  numeric2factor.levels = NULL,
  numeric.cut.n = 0,
  numeric.cut.labels = FALSE,
  numeric.quant.n = 0,
  character2factor = FALSE,
  scale = FALSE,
  center = FALSE,
  removeConstants = TRUE,
  oneHot = FALSE,
  exclude = NULL
)

Arguments

completeCases

Logical: If TRUE, only retain complete cases (no missing data). Default = FALSE
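
For illustration, here is a minimal base R sketch of what retaining only complete cases means, using stats::complete.cases on a toy data frame. It is not rtemis's internal code.

```r
# Toy data frame with missing values (illustrative only)
x <- data.frame(a = c(1, NA, 3, 4),
                b = c("u", "v", NA, "w"),
                stringsAsFactors = FALSE)

# Keep only rows that have no NA in any column
x_complete <- x[complete.cases(x), , drop = FALSE]
x_complete
```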

removeCases.thres

Float (0, 1): Remove cases (rows) whose fraction of missing features is >= this value.
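
A base R sketch of the removeCases.thres rule as described above: compute each case's fraction of missing features and drop cases at or above the threshold. The toy data and the variable thres are illustrative assumptions; rtemis's own implementation may differ.

```r
# Illustrative threshold, playing the role of removeCases.thres
thres <- 0.5

x <- data.frame(a = c(1, NA, NA, 4),
                b = c(NA, 2, NA, 5),
                c = c(7, 8, NA, 9),
                d = c(1, NA, 3, 4))

# Fraction of missing features in each case (row)
na_frac <- rowMeans(is.na(x))

# Drop cases whose missing fraction is >= thres
x_kept <- x[na_frac < thres, , drop = FALSE]
```

Here rows 2 (2 of 4 features missing) and 3 (3 of 4 missing) meet the threshold and are removed.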

removeFeatures.thres

Float (0, 1): Remove features (columns) whose fraction of missing values across cases is >= this value.
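
The column-wise counterpart, again as an illustrative base R sketch rather than rtemis's code: compute each feature's fraction of missing values and drop features at or above the threshold.

```r
# Illustrative threshold, playing the role of removeFeatures.thres
thres <- 0.5

x <- data.frame(a = c(1, NA, 3, 4),
                b = c(NA, NA, NA, 5),
                c = c(7, 8, 9, 10))

# Fraction of missing values in each feature (column)
na_frac <- colMeans(is.na(x))

# Drop features whose missing fraction is >= thres (here: b, with 0.75)
x_kept <- x[, na_frac < thres, drop = FALSE]
```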

impute

Logical: If TRUE, impute missing values. See impute.type for the imputation method; impute.discrete and impute.numeric set how values are imputed when impute.type = "meanMode".

impute.type

Character: How to impute data. "missRanger" and "missForest" use the packages of the same name to impute by iterative random forest regression. "rfImpute" uses randomForest::rfImpute (see its documentation). "meanMode" uses the mean and mode by default, or the custom functions given in impute.numeric and impute.discrete. Default = "missRanger", which is much faster than "missForest"; "missForest" is included for compatibility with older pipelines.

impute.missRanger.params

Named list with elements "pmm.k" and "maxiter", which are passed to missRanger::missRanger. pmm.k greater than 0 enables predictive mean matching; pmm.k = 0 disables it, which makes the imputation closer to missForest::missForest. Default = list(pmm.k = 0, maxiter = 10), as shown in Usage. num.trees (500 by default) can be reduced for faster imputation, especially in large datasets.

impute.discrete

Function that returns a single value: How to impute discrete variables when impute.type = "meanMode". Default = get_mode

impute.numeric

Function that returns a single value: How to impute continuous variables when impute.type = "meanMode". Default = mean
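
To show how impute.discrete and impute.numeric could plug into "meanMode" imputation, here is a base R sketch. Two assumptions, both labeled in the code: get_mode_sketch is a hypothetical stand-in for rtemis's get_mode, and each imputer is assumed to receive only the observed (non-missing) values of a column, which lets plain mean work without na.rm.

```r
# Hypothetical stand-in for rtemis's get_mode: most frequent value.
# table() sorts its levels, so ties go to the first value in sorted order.
get_mode_sketch <- function(v) {
  tab <- table(v)
  names(tab)[which.max(tab)]
}

# Sketch of "meanMode" imputation. Assumption: each imputer is called on the
# observed values of one column and must return a single value.
impute_mean_mode <- function(x,
                             impute.discrete = get_mode_sketch,
                             impute.numeric = mean) {
  for (j in seq_along(x)) {
    miss <- is.na(x[[j]])
    if (!any(miss)) next
    observed <- x[[j]][!miss]
    fill <- if (is.numeric(x[[j]])) {
      impute.numeric(observed)
    } else {
      impute.discrete(observed)
    }
    x[[j]][miss] <- fill
  }
  x
}

x <- data.frame(num = c(1, NA, 3),
                chr = c("a", NA, "a"),
                fac = factor(c("x", "y", NA)),
                stringsAsFactors = FALSE)
xi <- impute_mean_mode(x)
```

Passing a different function, for example impute.numeric = median, changes the fill value without changing the loop.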

integer2factor

Logical: If TRUE, convert all integer columns, including bit64::integer64 columns, to factors.

integer2numeric

Logical: If TRUE, convert all integer columns, including bit64::integer64 columns, to numeric. Only applies if integer2factor = FALSE.

logical2factor

Logical: If TRUE, convert all logical variables to factors

logical2numeric

Logical: If TRUE, convert all logical variables to numeric
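
For reference, this is what the two logical conversions produce in base R; the sketch only shows the standard factor() and as.numeric() results and does not claim to reproduce rtemis's code path. NA values stay missing under both conversions.

```r
l <- c(TRUE, FALSE, NA, TRUE)

# logical2factor-style conversion: levels are sorted, so "FALSE" comes first
as_factor <- factor(l)

# logical2numeric-style conversion: TRUE -> 1, FALSE -> 0, NA stays NA
as_num <- as.numeric(l)
```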

numeric2factor

Logical: If TRUE, convert all numeric variables to factors

numeric2factor.levels

Character vector: Optional. If numeric2factor = TRUE, passed to the levels argument of factor(). For advanced use only: you need to know the unique values of the numeric variable(s), and all numeric variables must share the same unique values.
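
Why shared levels matter: a base R sketch in which column b never contains 0, yet ends up with the same three levels as column a because levels are given explicitly. The vector lev plays the role of numeric2factor.levels; the conversion loop is an illustration, not rtemis's implementation.

```r
x <- data.frame(a = c(0, 1, 2, 1),
                b = c(2, 2, 1, 1))

# Illustrative value for numeric2factor.levels
lev <- c("0", "1", "2")

# Convert every numeric column to a factor with the same explicit levels
num_cols <- vapply(x, is.numeric, logical(1))
x[num_cols] <- lapply(x[num_cols], factor, levels = lev)
```

Without levels = lev, factor(x$b) would have only the levels "1" and "2".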

numeric.cut.n

Integer: If > 0, convert all numeric variables to factors by binning using base::cut with breaks equal to this number

numeric.cut.labels

Logical: Passed to the labels argument of base::cut.
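
A base R illustration of numeric.cut.n and numeric.cut.labels in terms of base::cut: a single number passed as breaks asks for that many equal-width intervals over the range of the data, and labels = FALSE makes cut return integer bin codes instead of a labeled factor. How rtemis post-processes the result is not shown here.

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8)

# breaks = 4: four equal-width, right-closed intervals spanning range(v)
f <- cut(v, breaks = 4)

# labels = FALSE: integer codes of the intervals rather than factor labels
codes <- cut(v, breaks = 4, labels = FALSE)
```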

numeric.quant.n

Integer: If > 0, convert all numeric variables to factors by binning with base::cut, using as breaks this number of quantiles computed with stats::quantile.
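
A sketch of quantile-based binning in base R. It assumes that numeric.quant.n evenly spaced quantiles (including the minimum and maximum) are used as break points, which yields numeric.quant.n - 1 bins; the exact convention in rtemis may differ. unique() guards against duplicate quantiles in heavily tied data, and include.lowest = TRUE keeps the minimum inside the first bin.

```r
v <- as.numeric(1:10)
n <- 5  # illustrative value for numeric.quant.n

# Assumption: n evenly spaced quantiles serve as the break points
brks <- unique(quantile(v, probs = seq(0, 1, length.out = n)))
f <- cut(v, breaks = brks, include.lowest = TRUE)
table(f)
```

Unlike equal-width binning, the bins here hold roughly equal numbers of observations.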

character2factor

Logical: If TRUE, convert all character variables to factors

scale

Logical: If TRUE, scale columns of x

center

Logical: If TRUE, center columns of x. Default = FALSE, as shown in Usage, so set it explicitly alongside scale if centering is wanted.
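
Assuming scaling and centering follow base::scale (an assumption, not confirmed by this page), this sketch shows why the two flags interact: with both TRUE, columns get mean 0 and SD 1, but with center = FALSE and scale = TRUE, base::scale divides each column by its root mean square, sqrt(sum(x^2) / (n - 1)), not by its standard deviation.

```r
x <- data.frame(a = c(1, 2, 3, 4),
                b = c(10, 20, 30, 50))

# center = TRUE, scale = TRUE: subtract column means, divide by column SDs
z <- scale(x, center = TRUE, scale = TRUE)

# center = FALSE, scale = TRUE: base::scale divides by the root mean square
# sqrt(sum(x^2) / (n - 1)) rather than by the standard deviation
z_nc <- scale(x, center = FALSE, scale = TRUE)
attr(z_nc, "scaled:scale")
```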

removeConstants

Logical: If TRUE, remove constant columns.
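
A minimal base R sketch of removing constant columns, treating a column as constant when it has at most one unique value. This is an illustrative definition; note that under it, a column holding one value plus NA counts as non-constant, and rtemis may handle that case differently.

```r
x <- data.frame(a = c(1, 2, 3),
                k = c(5, 5, 5),
                s = c("u", "u", "u"),
                b = c("x", "y", "x"),
                stringsAsFactors = FALSE)

# A column is treated as constant if it has at most one unique value
is_constant <- vapply(x, function(col) length(unique(col)) <= 1, logical(1))
x_kept <- x[, !is_constant, drop = FALSE]
```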

oneHot

Logical: If TRUE, convert all factors using one-hot encoding
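
One-hot encoding a single factor in base R, as an illustration of the encoding itself rather than rtemis's implementation: dropping the intercept in stats::model.matrix gives one indicator column per level, and each row has exactly one 1.

```r
x <- data.frame(color = factor(c("red", "green", "blue", "green")))

# "- 1" removes the intercept, so every level gets its own 0/1 column
oh <- model.matrix(~ color - 1, data = x)
oh
```

With several factors in one formula, model.matrix would drop a reference level for all but the first unless contrasts are disabled, so encoding factors one at a time keeps the sketch simple.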

exclude

Integer vector: Indices of columns to exclude from preprocessing.
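
A sketch of how an exclude vector of column indices could work: preprocessing is applied only to the remaining columns, and excluded columns pass through unchanged. The scaling step stands in for any preprocessing operation; both the index interpretation and the mechanics are assumptions for illustration.

```r
x <- data.frame(id = c(101L, 102L, 103L),
                a = c(1, 2, 3),
                b = c(4, 6, 8))

exclude <- 1L  # illustrative: leave the id column untouched
keep_idx <- setdiff(seq_along(x), exclude)

# Stand-in preprocessing step applied only to non-excluded columns
x[keep_idx] <- lapply(x[keep_idx], function(col) as.numeric(scale(col)))
```

A typical use is keeping identifier columns out of scaling or one-hot encoding.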


egenn/rtemis documentation built on May 4, 2024, 7:40 p.m.