SAFE: Scalable Automatic Feature Engineering


Source: R/SAFE.R

Description

Automatically generates new features from existing ones for further modelling, using the SAFE algorithm proposed in a paper by Shi, Zhang, Li, Yang and Zhou. This is a direct implementation of the pseudo-algorithm given in the paper, including its conventions, notation and flaws.

Usage

SAFE(
  X_train,
  y_train,
  X_valid,
  y_valid,
  operators = list(NULL, list(`+`, `-`, `*`)),
  n_iter = 10,
  nrounds = 5,
  alpha = 0.1,
  gamma = 10,
  bins = 30,
  theta = 0.8,
  beta = Inf
)

Arguments

X_train

Matrix - data used to train the model. Must be numeric.

y_train

Factor - labels for training data. Must be binary.

X_valid

Matrix - data used to test the model. Must be numeric.

y_valid

Factor - labels for testing data. Must be binary.

operators

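A list of lists of functions. The i-th list contains functions that accept i vectors of equal length and return one vector of the same length. The default, list(NULL, list(`+`, `-`, `*`)), uses no unary operators and the binary operators +, - and *.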
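To illustrate how this nested structure can be read, the sketch below applies every function in the i-th list to every combination of i columns. The helper apply_operators is hypothetical and not part of autofeat; it assumes columns are combined in ascending index order, so a non-commutative operator such as `-` yields only a - b, not b - a.

```r
# Hypothetical helper (not autofeat's internal code): applies each i-ary
# operator in operators[[i]] to every combination of i columns of X.
apply_operators <- function(X, operators) {
  new_cols <- list()
  for (i in seq_along(operators)) {
    ops <- operators[[i]]
    # NULL means "no operators of this arity"; skip arities larger than ncol(X)
    if (is.null(ops) || i > ncol(X)) next
    combos <- utils::combn(ncol(X), i, simplify = FALSE)
    for (op in ops) {
      for (idx in combos) {
        # Pass the i selected columns as i separate vector arguments
        args <- lapply(idx, function(j) X[, j])
        new_cols[[length(new_cols) + 1]] <- do.call(op, args)
      }
    }
  }
  if (length(new_cols) == 0) {
    return(matrix(numeric(0), nrow = nrow(X), ncol = 0))
  }
  do.call(cbind, new_cols)
}

X <- matrix(1:6, ncol = 2)                 # columns 1:3 and 4:6
Z <- apply_operators(X, list(NULL, list(`+`, `-`, `*`)))
print(Z)                                   # one new column per operator
```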

n_iter

Integer; number of iterations for the algorithm to perform.

nrounds

Integer; number of boosting rounds, passed as nrounds to xgb.train.

alpha

Threshold for information value (see IV). Features with IV < alpha will be dropped.
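For reference, information value of a discretized feature against a binary label is commonly defined as the sum over bins of (p1 - p2) * log(p1 / p2), where p1 and p2 are the shares of each class falling into the bin. The sketch below uses that textbook definition; the function name information_value and the eps smoothing for empty bins are assumptions, and the package's own IV function may differ in detail.

```r
# Hypothetical sketch, not autofeat's IV(): information value of an
# already discretized feature x_binned with respect to a binary factor y.
information_value <- function(x_binned, y, eps = 1e-6) {
  y <- as.factor(y)
  stopifnot(nlevels(y) == 2)
  tab <- table(x_binned, y)                 # rows: bins, columns: classes
  p1 <- tab[, 1] / sum(tab[, 1]) + eps      # share of class 1 in each bin
  p2 <- tab[, 2] / sum(tab[, 2]) + eps      # share of class 2 in each bin
  # eps avoids log(0) for bins that contain only one class (assumption)
  sum((p1 - p2) * log(p1 / p2))
}

y <- factor(c(0, 0, 1, 1))
iv_uninf <- information_value(c("a", "b", "a", "b"), y)  # bins carry no signal
iv_inf   <- information_value(c("a", "a", "b", "b"), y)  # bins separate classes
alpha <- 0.1
keep <- c(uninf = iv_uninf, inf = iv_inf) >= alpha       # IV < alpha is dropped
```

The formula is symmetric in the two classes, so it does not matter which level is treated as the "positive" one.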

gamma

Integer; number of most important feature combinations to be selected in each iteration.

bins

Integer; number of bins used to discretize features.
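This page does not specify how the bins are placed. As an assumption only, the sketch below uses equal-frequency (quantile) binning into at most bins intervals; the helper discretize is hypothetical, and ties or constant columns can produce fewer bins than requested.

```r
# Hypothetical sketch: equal-frequency binning of a numeric vector into at
# most `bins` intervals, returning integer bin codes.
discretize <- function(x, bins = 30) {
  probs <- seq(0, 1, length.out = bins + 1)
  # Duplicate quantiles (heavy ties) are merged, so fewer bins may result
  breaks <- unique(stats::quantile(x, probs = probs, names = FALSE))
  if (length(breaks) < 2) {
    return(rep(1L, length(x)))              # constant feature: a single bin
  }
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

b <- discretize(1:100, bins = 4)            # four bins of 25 values each
b_const <- discretize(rep(3, 10))           # all values fall into bin 1
```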

theta

Threshold for Pearson's correlation. Features whose correlation with another feature exceeds theta will be dropped.
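One common way to apply such a threshold is a greedy filter: visit features from most to least useful and keep a feature only if it is not too strongly correlated with any feature already kept. The sketch below follows that approach; the helper drop_correlated, the use of absolute correlation, and ranking by a score such as IV are assumptions, not a description of autofeat's code.

```r
# Hypothetical sketch: greedy redundancy filter. Columns are visited in
# order of decreasing score; a column is dropped if its absolute Pearson
# correlation with an already kept column exceeds theta.
drop_correlated <- function(X, score, theta = 0.8) {
  C <- abs(stats::cor(X))
  C[is.na(C)] <- 0                          # constant columns: treat as uncorrelated
  kept <- integer(0)
  for (j in order(score, decreasing = TRUE)) {
    if (length(kept) == 0 || all(C[j, kept] <= theta)) {
      kept <- c(kept, j)
    }
  }
  sort(kept)                                # indices of surviving columns
}

x1 <- 1:10
X <- cbind(x1 = x1, x2 = 2 * x1, x3 = rep(c(1, -1), 5))
res <- drop_correlated(X, score = c(0.5, 0.3, 0.2), theta = 0.8)
# x2 is a rescaled copy of x1 (correlation 1) and has the lower score, so it is dropped
```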

beta

Integer; maximum number of features to be selected at the end of each loop. Set to Inf to select all features.
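The effect of beta, including the Inf case, can be sketched as a top-k selection. The helper select_top is hypothetical, and ranking by a per-feature score (for example IV) is an assumption; the point is only that min(beta, length(score)) makes Inf keep every feature.

```r
# Hypothetical sketch: indices of at most `beta` features with the highest
# score, returned in their original column order.
select_top <- function(score, beta = Inf) {
  n <- min(beta, length(score))             # beta = Inf keeps all features
  sort(order(score, decreasing = TRUE)[seq_len(n)])
}

s <- c(a = 0.1, b = 0.9, c = 0.5)
top2 <- select_top(s, beta = 2)             # features b and c
all3 <- select_top(s, beta = Inf)           # every feature
```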

Value

A list with two elements, X_train and X_test, containing the transformed training and test sets, ready for further modelling. This differs from the algorithm described in the paper, which returns a function; for now, the transformed data are returned instead.


MrDomani/autofeat documentation built on June 11, 2020, 4:45 a.m.