impute_basic: Imputes the dataset with median of column for numeric data...

View source: R/preprocessing_imputation.R

impute_basic    R Documentation

Imputes the dataset with the column median for numeric features and either the class 'other' or the most frequent value for categorical features

Description

A fast and simple imputation method that treats categorical and numerical features differently. Please note that imputation is performed on all columns, including the target 'y', because we assume that the target either has no missing values or that these were already handled by the removal functions or by the user.
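To make the described behaviour concrete, here is a minimal base R sketch of the same idea: median for numeric columns, and either the label 'other' or the most frequent value for categorical columns. It is an illustration only, not forester's implementation; the helper name sketch_impute and the toy data are hypothetical.

```r
# Hypothetical helper, NOT forester's implementation: a base R sketch only.
sketch_impute <- function(data, categorical_imputation = "other") {
  data <- as.data.frame(data, stringsAsFactors = FALSE)
  for (col in names(data)) {
    x <- data[[col]]
    # Skip columns with nothing to impute, or with nothing to impute from.
    if (!anyNA(x) || all(is.na(x))) next
    if (is.numeric(x)) {
      # Numeric feature: replace NA with the median of the observed values.
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else {
      # Categorical feature: work on character values, then restore factors.
      was_factor <- is.factor(x)
      x <- as.character(x)
      if (categorical_imputation == "frequency") {
        counts <- table(x)  # table() drops NA by default
        fill <- names(counts)[which.max(counts)]
      } else {
        fill <- "other"
      }
      x[is.na(x)] <- fill
      if (was_factor) x <- factor(x)
    }
    data[[col]] <- x
  }
  data
}

toy <- data.frame(
  age  = c(20, NA, 40, 60),
  city = c("Oslo", "Oslo", NA, "Rome"),
  stringsAsFactors = FALSE
)

imputed_other <- sketch_impute(toy)
imputed_freq  <- sketch_impute(toy, categorical_imputation = "frequency")
```

In this sketch every column is processed in the same loop, so a target column such as 'y' would be imputed too, which mirrors the note above about the target.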

Usage

impute_basic(
  data,
  na_indicators = c(""),
  categorical_imputation = "other",
  verbose = FALSE
)

Arguments

data

A data source in one of the major R formats: data.table, data.frame, matrix, and so on.

na_indicators

A list containing the values that will be treated as NA indicators. By default the list is c(""). WARNING: Do not include NA or NaN, as these are already checked by another criterion.

categorical_imputation

A string value describing the imputation method for categorical features. Missing values can be replaced either with the label 'other' or with the most frequent value of the feature. The respective options are 'other' and 'frequency'. By default set to 'other'.

verbose

A logical value. If set to TRUE, all information about the preprocessing process is printed; if FALSE, nothing is printed.
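As a rough illustration of what treating values as NA indicators can mean, the base R sketch below recodes placeholder values to NA before any imputation takes place. It is an assumption about the general idea, not forester's code; the helper name sketch_mark_na, the sentinel values, and the toy data are hypothetical.

```r
# Hypothetical helper, NOT forester's implementation: a base R sketch only.
sketch_mark_na <- function(data, na_indicators = c("")) {
  data <- as.data.frame(data, stringsAsFactors = FALSE)
  for (col in names(data)) {
    x <- data[[col]]
    # Compare as character so numeric sentinels such as "-999" also match.
    is_indicator <- as.character(x) %in% as.character(na_indicators)
    x[is_indicator] <- NA
    data[[col]] <- x
  }
  data
}

toy <- data.frame(
  income = c(1000, -999, 2500),
  city   = c("Oslo", "", "?"),
  stringsAsFactors = FALSE
)

# The empty string, "?", and -999 are treated as missing in this toy example.
marked <- sketch_mark_na(toy, na_indicators = c("", "?", "-999"))
```

Existing NA and NaN entries need no entry in na_indicators here, because the subsequent imputation step already detects them.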

Value

Imputed dataset.


ModelOriented/forester documentation built on June 6, 2024, 7:29 a.m.