SAFE: Scalable Automatic Feature Engineering
In MrDomani/autofeat: Scalable Automatic Feature Engineering

View source: R/SAFE.R

Automatically generate new features from existing ones for further modelling, using the SAFE algorithm proposed in a paper by Shi, Zhang, Li, Yang and Zhou. This is a direct implementation of the pseudocode given in the paper, with its conventions, notation and flaws.

SAFE(
  X_train,
  y_train,
  X_valid,
  y_valid,
  operators = list(NULL, list(`+`, `-`, `*`)),
  n_iter = 10,
  nrounds = 5,
  alpha = 0.1,
  gamma = 10,
  bins = 30,
  theta = 0.8,
  beta = Inf
)
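
The layout of the default `operators` value in the signature above can be illustrated with a small sketch. It assumes only what the signature shows: position 1 of the outer list holds unary functions (`NULL` by default, i.e. none) and position 2 holds binary functions. The choice of `sqrt` and `log1p` as unary operators and the vectors `x1` and `x2` are illustrative, not part of the package.

```r
# Operators grouped by arity: position 1 holds unary functions,
# position 2 holds binary functions (same layout as the default value).
operators <- list(
  list(sqrt, log1p),       # unary: one vector in, one vector out
  list(`+`, `-`, `*`)      # binary: two vectors in, one vector out
)

# Illustrative feature vectors of equal length
x1 <- c(1, 4, 9)
x2 <- c(2, 3, 4)

# Apply every unary operator to x1
unary_out <- lapply(operators[[1]], function(f) f(x1))

# Apply every binary operator to the pair (x1, x2)
binary_out <- lapply(operators[[2]], function(f) f(x1, x2))

# Every result has the same length as its inputs
lengths(c(unary_out, binary_out))
```

Any vectorised function with the matching number of arguments should fit this layout, as long as it returns a vector of the same length as its inputs.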

`X_train`	Matrix; data used to train the model. Must be numeric.
`y_train`	Factor; labels for the training data. Must be binary.
`X_valid`	Matrix; data used to test the model. Must be numeric.
`y_valid`	Factor; labels for the testing data. Must be binary.
`operators`	A `list` of lists of functions. The `i`-th list contains functions that accept `i` vectors of equal length and return 1 vector of the same length.
`n_iter`	Integer; number of iterations for the algorithm to perform.
`nrounds`	Integer; number of boosting rounds, passed to `xgb.train`.
`alpha`	Threshold for `IV`. Features with IV < alpha will be dropped.
`gamma`	Integer; number of most important feature combinations to be selected in each iteration.
`bins`	Integer; number of bins used to discretize features.
`theta`	Threshold for Pearson's correlation. Features with correlation above theta will be dropped.
`beta`	Integer; maximum number of features to be selected at the end of each loop. Set to `Inf` to select all features.
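
To make the roles of `alpha`, `bins` and `theta` more concrete, here is a rough base-R sketch of the two filters they control. It is not the package's code. The helpers `information_value()` and `drop_correlated()` are hypothetical, the equal-width binning, the smoothing constant `eps`, the use of absolute correlation and the "keep the first feature seen" rule are all assumptions, and the package's own `IV` function may differ in any of these details.

```r
# Hypothetical helper: information value of a numeric feature for a binary label,
# IV = sum over bins of (p_i - q_i) * log(p_i / q_i).
information_value <- function(x, y, bins = 30, eps = 1e-6) {
  y <- as.factor(y)
  stopifnot(nlevels(y) == 2)
  bin <- cut(x, breaks = bins)          # equal-width discretisation (assumption)
  counts <- table(bin, y)               # bins x 2 contingency table
  # Share of each class falling into each bin; eps avoids log(0) for empty cells
  p <- (counts[, 1] + eps) / (sum(counts[, 1]) + eps * nrow(counts))
  q <- (counts[, 2] + eps) / (sum(counts[, 2]) + eps * nrow(counts))
  sum((p - q) * log(p / q))
}

# Hypothetical helper: keep a feature only if its absolute Pearson correlation
# with every feature kept so far is at most theta.
drop_correlated <- function(X, theta = 0.8) {
  cors <- abs(cor(X))
  keep <- integer(0)
  for (j in seq_len(ncol(X))) {
    if (all(cors[j, keep] <= theta)) keep <- c(keep, j)
  }
  X[, keep, drop = FALSE]
}

# Toy data: one informative feature, a near-copy of it, and pure noise
set.seed(1)
n <- 200
y <- factor(rep(c(0, 1), each = n / 2))
informative <- c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 3))
noise <- rnorm(n)
X <- cbind(informative = informative,
           copy = informative * 2 + 0.001 * noise,  # nearly collinear
           noise = noise)

iv <- apply(X, 2, information_value, y = y, bins = 10)
X_kept <- X[, iv >= 0.1, drop = FALSE]           # alpha = 0.1
X_final <- drop_correlated(X_kept, theta = 0.8)  # theta = 0.8
colnames(X_final)
```

With a small sample and many bins, even a pure-noise feature can score a non-trivial information value, so the interplay between `alpha` and `bins` is worth checking on real data.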

A list with 2 elements: X_train and X_test. They contain the transformed train and test sets, ready for further modelling. Unfortunately, this is contrary to the algorithm described in the paper (which returns a function), at least for now.
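
A call might look roughly like the sketch below. The toy data, the helper `make_X()` and the label rule are invented for illustration. The call itself assumes the package is installed under the name `autofeat` and exports `SAFE`, so it is guarded and only runs when the package is available.

```r
set.seed(42)

# Hypothetical helper: a numeric matrix with four named feature columns
make_X <- function(n) {
  matrix(rnorm(n * 4), ncol = 4, dimnames = list(NULL, paste0("f", 1:4)))
}

X_train <- make_X(100)
X_valid <- make_X(50)

# Binary factor labels that depend on an interaction between f1 and f2
y_train <- factor(as.integer(X_train[, 1] * X_train[, 2] > 0))
y_valid <- factor(as.integer(X_valid[, 1] * X_valid[, 2] > 0))

# Assumption: the package is installed as "autofeat" and exports SAFE
if (requireNamespace("autofeat", quietly = TRUE)) {
  res <- autofeat::SAFE(X_train, y_train, X_valid, y_valid, n_iter = 3)
  str(res$X_train)  # transformed training set
  str(res$X_test)   # transformed validation set
}
```

Because the return value is a pair of transformed data sets rather than a function, new data presumably has to be passed in as `X_valid` at call time in order to be transformed.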

MrDomani/autofeat documentation built on June 11, 2020, 4:45 a.m.