View source: R/03_S7_Preprocessor.R
setup_Preprocessor | R Documentation |
PreprocessorParameters
Set up PreprocessorParameters
setup_Preprocessor(
  complete_cases = FALSE,
  remove_features_thres = NULL,
  remove_cases_thres = NULL,
  missingness = FALSE,
  impute = FALSE,
  impute_type = c("missRanger", "micePMM", "meanMode"),
  impute_missRanger_params = list(pmm.k = 3, maxiter = 10, num.trees = 500),
  impute_discrete = "get_mode",
  impute_continuous = "mean",
  integer2factor = FALSE,
  integer2numeric = FALSE,
  logical2factor = FALSE,
  logical2numeric = FALSE,
  numeric2factor = FALSE,
  numeric2factor_levels = NULL,
  numeric_cut_n = 0,
  numeric_cut_labels = FALSE,
  numeric_quant_n = 0,
  numeric_quant_NAonly = FALSE,
  unique_len2factor = 0,
  character2factor = FALSE,
  factorNA2missing = FALSE,
  factorNA2missing_level = "missing",
  factor2integer = FALSE,
  factor2integer_startat0 = TRUE,
  scale = FALSE,
  center = scale,
  scale_centers = NULL,
  scale_coefficients = NULL,
  remove_constants = FALSE,
  remove_constants_skip_missing = TRUE,
  remove_features = NULL,
  remove_duplicates = FALSE,
  one_hot = FALSE,
  one_hot_levels = NULL,
  add_date_features = FALSE,
  date_features = c("weekday", "month", "year"),
  add_holidays = FALSE,
  exclude = NULL
)
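A minimal sketch of a call, using only argument names and values that appear in the signature above. It assumes the package that provides setup_Preprocessor() is already attached; the `exists()` guard is only there so the snippet does nothing when the function is unavailable. The variable names `args` and `prp` are illustrative.

```r
# Arguments taken from the usage block above; everything else keeps its default.
args <- list(
  remove_features_thres = 0.5,   # drop features missing in >= 50% of cases
  impute = TRUE,
  impute_type = "meanMode",      # one of the three documented choices
  character2factor = TRUE,
  scale = TRUE,                  # center defaults to the value of scale
  remove_constants = TRUE
)

# Assumption: setup_Preprocessor() is available in the current session.
if (exists("setup_Preprocessor", mode = "function")) {
  prp <- do.call(setup_Preprocessor, args)
  print(prp)
}
```

Building the argument list first keeps the chosen settings inspectable and reusable even when the function itself is not loaded.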
complete_cases |
Logical: If TRUE, only retain complete cases (no missing data). |
remove_features_thres |
Float (0, 1): Remove features that have missing values in at least this fraction of cases. |
remove_cases_thres |
Float (0, 1): Remove cases that are missing at least this fraction of features. |
missingness |
Logical: If TRUE, generate new boolean columns for each feature with missing values, indicating which cases were missing data. |
impute |
Logical: If TRUE, impute missing cases. See |
impute_type |
Character: Package to use for imputation. |
impute_missRanger_params |
Named list with elements "pmm.k" and
"maxiter", which are passed to |
impute_discrete |
Character: Name of function that returns single value: How to impute
discrete variables for |
impute_continuous |
Character: Name of function that returns single value: How to impute
continuous variables for |
integer2factor |
Logical: If TRUE, convert all integers to factors. This includes
|
integer2numeric |
Logical: If TRUE, convert all integers to numeric
(will only work if |
logical2factor |
Logical: If TRUE, convert all logical variables to factors. |
logical2numeric |
Logical: If TRUE, convert all logical variables to numeric. |
numeric2factor |
Logical: If TRUE, convert all numeric variables to factors. |
numeric2factor_levels |
Character vector: Optional - will be passed to
|
numeric_cut_n |
Integer: If > 0, convert all numeric variables to factors by
binning using |
numeric_cut_labels |
Logical: The |
numeric_quant_n |
Integer: If > 0, convert all numeric variables to factors by
binning using |
numeric_quant_NAonly |
Logical: If TRUE, only bin numeric variables with missing values. |
unique_len2factor |
Integer (>=2): Convert all variables with at most
this number of unique values to factors.
For example, if binary variables are encoded with 1, 2, you could use
|
character2factor |
Logical: If TRUE, convert all character variables to factors. |
factorNA2missing |
Logical: If TRUE, make NA values in factors be of
level |
factorNA2missing_level |
Character: Name of level if
|
factor2integer |
Logical: If TRUE, convert all factors to integers. |
factor2integer_startat0 |
Logical: If TRUE, start integer coding at 0. |
scale |
Logical: If TRUE, scale columns of |
center |
Logical: If TRUE, center columns of |
scale_centers |
Named vector: Centering values for each feature. |
scale_coefficients |
Named vector: Scaling values for each feature. |
remove_constants |
Logical: If TRUE, remove constant columns. |
remove_constants_skip_missing |
Logical: If TRUE, ignore missing values when checking whether a feature is constant. |
remove_features |
Character vector: Features to remove. |
remove_duplicates |
Logical: If TRUE, remove duplicate cases. |
one_hot |
Logical: If TRUE, convert all factors using one-hot encoding. |
one_hot_levels |
List: Named list of the form "feature_name" = "levels". Used when applying
one-hot encoding to validation or test data using |
add_date_features |
Logical: If TRUE, extract date features from date columns. |
date_features |
Character vector: Features to extract from dates. |
add_holidays |
Logical: If TRUE, extract holidays from date columns. |
exclude |
Integer vector: Exclude these columns from preprocessing. |
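The threshold semantics described for remove_features_thres and remove_cases_thres can be illustrated in base R. This is only a sketch of the ">= fraction missing" rule on toy data, not the package's implementation; the data and the names `feature_thres` and `case_thres` are made up for illustration, and both filters are computed on the original data independently.

```r
# Toy data: column b is missing in 3 of 4 cases; row 2 is missing 2 of 3 features
dat <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, NA, 2),
  c = c("x", "y", "z", "w")
)
feature_thres <- 0.5  # hypothetical value for remove_features_thres
case_thres <- 0.5     # hypothetical value for remove_cases_thres

# Fraction of cases with a missing value, per feature
feature_na <- colMeans(is.na(dat))
# Keep features whose missing fraction is below the threshold
dat_f <- dat[, feature_na < feature_thres, drop = FALSE]

# Fraction of features with a missing value, per case
case_na <- rowMeans(is.na(dat))
# Keep cases whose missing fraction is below the threshold
dat_c <- dat[case_na < case_thres, , drop = FALSE]
```

Here column `b` (75% missing) is dropped by the feature filter, and row 2 (two of three features missing) is dropped by the case filter.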
PreprocessorParameters object.
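The scale_centers and scale_coefficients arguments take named per-feature vectors, which suggests reusing training-set values on new data. The base R sketch below shows that general idea with `scale()`; it is an assumption about the intended workflow, not the package's own code, and the data frames `train` and `test` are invented for illustration.

```r
train <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))
test <- data.frame(x = c(2, 4), y = c(20, 40))

# Named vectors of centering and scaling values learned from the training set
centers <- colMeans(train)
coefs <- vapply(train, sd, numeric(1))

# Apply the same values to both sets, matching features by name
train_s <- scale(train, center = centers, scale = coefs)
test_s <- scale(test, center = centers[names(test)], scale = coefs[names(test)])
```

Matching by name rather than position guards against the test set having its columns in a different order.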
EDG