preprocessing_removal: Helper function for the custom preprocessing removing columns...
In ModelOriented/forester: Quick and Simple Tools for Training and Testing of Tree-Based Models

preprocessing_removal

R Documentation

Helper function for the custom preprocessing removing columns and rows.

Description

This function includes 6 modules for removing unwanted features and observations. It can remove duplicate columns, ID-like columns, static columns (with a specified staticity threshold), sparse columns (with a specified sparsity threshold), and highly correlated columns (with a specified correlation threshold). Additionally, it can remove observations that are too sparse (with a specified sparsity threshold) or that have a missing target value. Each module can be turned on or off by setting the corresponding logical value in 'active_modules'.
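To illustrate what two of the column modules look for, here is a minimal base-R sketch of duplicate-column and ID-like-column detection. The helper names find_duplicate_cols and find_id_like_cols are hypothetical and are not forester's internals. Matching ID-like columns by a case-insensitive exact name match is also an assumption of this sketch.

```r
# Hypothetical helpers (not part of forester): base-R sketches of two modules.

# Indices of columns whose contents exactly repeat an earlier column.
find_duplicate_cols <- function(df) {
  unname(which(duplicated(as.list(df))))
}

# Indices of columns whose names match one of the ID-like names.
# Assumption: case-insensitive exact match on the column name only.
find_id_like_cols <- function(df, id_names = c("id", "nr", "number", "idx",
                                               "identification", "index")) {
  unname(which(tolower(names(df)) %in% tolower(id_names)))
}

df <- data.frame(
  ID = c(11, 12, 13, 14),
  x  = c(1, 2, 3, 4),
  x2 = c(1, 2, 3, 4),  # exact copy of x
  y  = c(0, 1, 0, 1)
)

find_duplicate_cols(df)  # 3, because x2 repeats x
find_id_like_cols(df)    # 1, because "ID" matches "id"
```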

Usage

preprocessing_removal(
  data,
  y,
  active_modules = c(duplicate_cols = TRUE, id_like_cols = TRUE, static_cols = TRUE,
    sparse_cols = TRUE, corrupt_rows = TRUE, correlated_cols = TRUE),
  id_names = c("id", "nr", "number", "idx", "identification", "index"),
  static_threshold = 0.99,
  sparse_columns_threshold = 0.3,
  sparse_rows_threshold = 0.3,
  na_indicators = c(""),
  high_correlation_threshold = 0.7,
  verbose = FALSE
)
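The Usage above enables all six modules. The following is a minimal sketch of switching one module off before calling the function. It assumes an installed forester version that exports preprocessing_removal. The iris data and the "Species" target are placeholder choices, and the call is skipped when the function is not available.

```r
# Default module switches, copied from the Usage section.
modules <- c(duplicate_cols = TRUE, id_like_cols = TRUE, static_cols = TRUE,
             sparse_cols = TRUE, corrupt_rows = TRUE, correlated_cols = TRUE)

# Disable only the correlation filter; the other five modules stay on.
modules["correlated_cols"] <- FALSE

# Assumption: the installed forester exports preprocessing_removal().
# The call is skipped otherwise, so this snippet runs without the package.
has_fn <- requireNamespace("forester", quietly = TRUE) &&
  "preprocessing_removal" %in% getNamespaceExports("forester")

if (has_fn) {
  res <- forester::preprocessing_removal(
    data = iris,
    y = "Species",
    active_modules = modules,
    verbose = FALSE
  )
  print(res$rm_col)  # indexes of removed columns
  print(res$rm_row)  # indexes of removed rows
}
```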

Arguments

`data`	A data source in one of the major R formats: data.table, data.frame, matrix, and so on.
`y`	A string with the name of the target column.
`active_modules`	A named logical vector indicating which removal modules are active. By default it is set to 'c(duplicate_cols = TRUE, id_like_cols = TRUE, static_cols = TRUE, sparse_cols = TRUE, corrupt_rows = TRUE, correlated_cols = TRUE)', that is, all six modules are active. Setting corrupt_rows to FALSE still removes observations without a target value.
`id_names`	A vector of strings indicating which column names are treated as ID-like. By default it is c("id", "nr", "number", "idx", "identification", "index").
`static_threshold`	A numeric value from the [0,1] range, which indicates the maximum allowed share of the dominating value in a column. If a feature's dominating value takes a larger share, the feature is removed. By default set to 0.99 (see Usage); a share of 1 means that all values in the column are equal.
`sparse_columns_threshold`	A numeric value from the [0,1] range, which indicates the maximum allowed share of missing values in a column. If a column has more missing fields, it is removed. By default set to 0.3.
`sparse_rows_threshold`	A numeric value from the [0,1] range, which indicates the maximum allowed share of missing values in an observation. If an observation has more missing fields, it is removed. By default set to 0.3.
`na_indicators`	A vector of values that will be treated as NA indicators. By default it is c(""). WARNING: Do not include NA or NaN, as these are already checked by other criteria.
`high_correlation_threshold`	A numeric value from the [0,1] range, which indicates when a correlation is considered high. If a feature's correlation surpasses this threshold, the feature is removed. By default set to 0.7.
`verbose`	A logical value. If TRUE, the function prints all information about the preprocessing process; if FALSE, it prints none.
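The threshold arguments above can be made concrete with a minimal base-R sketch of the static-column, sparse-column, and sparse-row checks. This is not forester's code. Several details are assumptions of the sketch: the helper names, strict ">" comparisons, counting na_indicators as missing, and measuring the dominating value's share among non-NA values.

```r
# Hypothetical sketch (not forester's code) of the threshold-based checks.

# NA, plus any user-supplied indicator strings, counts as missing.
is_missing <- function(x, na_indicators = c("")) {
  is.na(x) | (as.character(x) %in% na_indicators)
}

# Share of the most frequent value among non-NA entries.
dominance <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(1)
  max(table(x)) / length(x)
}

# Logical matrix (rows x columns) of missing cells.
missing_matrix <- function(df, na_indicators = c("")) {
  m <- vapply(df, is_missing, logical(nrow(df)), na_indicators = na_indicators)
  matrix(m, nrow = nrow(df))
}

find_static_cols <- function(df, static_threshold = 0.99) {
  unname(which(vapply(df, dominance, numeric(1)) > static_threshold))
}

find_sparse_cols <- function(df, threshold = 0.3, na_indicators = c("")) {
  unname(which(colMeans(missing_matrix(df, na_indicators)) > threshold))
}

find_sparse_rows <- function(df, threshold = 0.3, na_indicators = c("")) {
  unname(which(rowMeans(missing_matrix(df, na_indicators)) > threshold))
}

df <- data.frame(
  const = c(1, 1, 1, 1, 1),           # one value only: static
  a     = c(1, NA, 3, NA, 5),         # 2 of 5 missing: sparse column
  b     = c("u", "", "w", "x", "y"),  # "" counts as missing
  c     = c(10, 20, NA, 40, 50),
  stringsAsFactors = FALSE
)

find_static_cols(df)  # 1 (const)
find_sparse_cols(df)  # 2 (a)
find_sparse_rows(df)  # 2 (row 2 has 2 of 4 fields missing)
```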

Value

A list containing three objects:

`data` A dataset with deleted observations and columns.
`rm_col` The indexes of removed columns.
`rm_row` The indexes of removed rows.
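The following sketch shows one module producing a list shaped like the documented Value (data, rm_col, rm_row). It uses a greedy pairwise correlation filter. The helper name remove_correlated is hypothetical. This page does not state which column of a correlated pair forester drops. Keeping the earlier column and using absolute Pearson correlation on numeric features are choices made for this sketch only.

```r
# Hypothetical sketch (not forester's code): greedy correlation filter that
# returns a list with the same three elements as the documented Value.
remove_correlated <- function(data, y, high_correlation_threshold = 0.7) {
  features <- setdiff(names(data), y)
  num <- features[vapply(data[features], is.numeric, logical(1))]
  drop <- character(0)
  if (length(num) >= 2) {
    cm <- abs(cor(data[num], use = "pairwise.complete.obs"))
    for (i in seq_len(length(num) - 1)) {
      if (num[i] %in% drop) next
      for (j in (i + 1):length(num)) {
        # Keep the earlier column, drop the later one; NA correlations are ignored.
        if (!(num[j] %in% drop) && isTRUE(cm[i, j] > high_correlation_threshold)) {
          drop <- c(drop, num[j])
        }
      }
    }
  }
  list(
    data   = data[, !(names(data) %in% drop), drop = FALSE],
    rm_col = which(names(data) %in% drop),
    rm_row = integer(0)  # this module removes no rows
  )
}

set.seed(1)
x1 <- rnorm(50)
df <- data.frame(
  x1 = x1,
  x2 = x1 * 2 + rnorm(50, sd = 0.01),  # nearly a copy of x1
  x3 = rnorm(50),
  target = rnorm(50)
)

res <- remove_correlated(df, y = "target")
res$rm_col       # 2
names(res$data)  # "x1" "x3" "target"
```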

ModelOriented/forester documentation built on June 6, 2024, 7:29 a.m.
