pipe_range_classifier: Generates features for regression problems through...


View source: R/range_classifier.R

Description

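For regression tasks, trains classification models that predict whether the response is larger than each of a series of given threshold values. The predictions of these classifiers are added to the dataset as new features.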

Usage

pipe_range_classifier(train, response, exclude_columns = response,
  base_temporary_column_name = "base_temporary_column_name",
  base_definitive_column_name = paste0(response, "_quantile"),
  quantiles = 0, even_spreads = 0, values, model = c("glm",
  "xgboost")[1], controls)

Arguments

train

The train dataset, as a data.frame or data.table. Data.tables may be changed by reference.

response

String denoting the name of the column that should be used as the response variable.

exclude_columns

Columns that shouldn't be used in the models. Defaults to the response column and will ALWAYS include the response column.

base_temporary_column_name

Base name that will be used to create a temporary variable for training the classifier. Use this to ensure no existing columns are overwritten.

base_definitive_column_name

Base name of the columns that will store the predictions of the created classifiers. The threshold value is appended to this name. Use this to ensure no existing columns are overwritten.

quantiles

Number of quantiles to use to generate threshold values. Internally, quantiles + 2 quantiles are computed and only the 2nd through the (quantiles + 1)-th are used, which removes the nonsensical thresholds at both extremes. Non-negative integer, defaults to 0.

even_spreads

Number of evenly spread thresholds to use. These are based on the minimum and maximum value of the response in train, and the thresholds are defined similarly to quantiles. Non-negative integer, defaults to 0.

values

Threshold values to use. These are checked against the range of the response in train.

model

Type of model to use. Currently only binomial glm and xgboost are available.

controls

Parameters for the models to use. Leave empty or set to NA to use defaults:

  • glm: glm.control

  • xgboost: see xgb.train

Details

If more than one of quantiles, even_spreads, and values is specified, all of them are applied.
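The way the three options could combine into one set of thresholds can be sketched in base R. This is only an illustration under stated assumptions: make_thresholds is a hypothetical helper, not a datapiper function; the quantile probabilities are assumed to be evenly spaced between 0 and 1; and out-of-range entries in values are assumed to be dropped, whereas the package may handle them differently (for example with an error).

```r
# Hypothetical helper for illustration only; not part of datapiper.
make_thresholds <- function(x, quantiles = 0, even_spreads = 0,
                            values = numeric(0)) {
  # Quantile thresholds: compute quantiles + 2 quantiles (assumed evenly
  # spaced probabilities), then drop the first and the last.
  q_probs <- seq(0, 1, length.out = quantiles + 2)
  q_all <- stats::quantile(x, probs = q_probs, names = FALSE)
  q_thr <- q_all[-c(1, quantiles + 2)]

  # Evenly spread thresholds between min and max, again dropping both ends.
  e_all <- seq(min(x), max(x), length.out = even_spreads + 2)
  e_thr <- e_all[-c(1, even_spreads + 2)]

  # Explicit values: keep only those inside the observed range
  # (assumed behaviour, see the note above).
  v_thr <- values[values >= min(x) & values <= max(x)]

  # All requested options are applied together.
  c(q_thr, e_thr, v_thr)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20)
make_thresholds(x, quantiles = 1)     # the median of x
make_thresholds(x, even_spreads = 1)  # the midpoint of the range of x
make_thresholds(x, quantiles = 1, even_spreads = 1, values = c(4, 100))
```

With a skewed response such as x above, the quantile threshold and the evenly spread threshold differ noticeably, which is one reason to request both.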

Value

A list containing the transformed train dataset and a trained pipe.
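To give a feel for what the transformed train dataset might contain, the following base-R sketch fits one binomial glm per threshold and stores its predicted probabilities in a new column. It is a rough approximation, not the package's implementation: the simulated data, the hand-picked thresholds, and the exact way the threshold is appended to the column name are all assumptions, and the trained pipe that the real function returns is not reproduced here.

```r
# Illustrative sketch in base R; the real pipe_range_classifier may differ.
set.seed(1)
train <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
train$y <- 2 * train$x1 - train$x2 + rnorm(200)

response <- "y"
thresholds <- c(-1, 0, 1)                      # hand-picked example thresholds
base_name <- paste0(response, "_quantile")     # default base_definitive_column_name
predictors <- setdiff(names(train), response)  # the response is always excluded

for (thr in thresholds) {
  # Temporary binary target: is the response larger than the threshold?
  target <- as.integer(train[[response]] > thr)
  fit_data <- cbind(train[, predictors, drop = FALSE], target = target)

  # One binomial glm per threshold.
  fit <- glm(target ~ ., data = fit_data, family = binomial())

  # Store the predicted probability; the naming scheme here is an assumption.
  train[[paste0(base_name, thr)]] <- predict(fit, newdata = train,
                                             type = "response")
}

names(train)
```

Each added column is a probability between 0 and 1 that the response exceeds the corresponding threshold, so a downstream regression model can use these columns as extra features.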


jeroenvdhoven/datapiper documentation built on July 14, 2019, 9:34 p.m.