deepforest: Deep Forest

View source: R/deepforest.R

Description

A combination of stacking (as practiced on Kaggle) and the deep forest idea proposed in https://arxiv.org/pdf/1702.08835.pdf, with a few additional knobs and levers of my own.

Usage

deepforest(x, y, x_val = NULL, y_val = NULL, nfold = 5, index = NULL,
  objective = NULL, eval_func = NULL, nlayer = 5, nmeta = 4,
  nmetarep = 1, nclass = NULL, metatype = rep(c("bag", "boost", "dart",
  "lin"), length = nmeta * nmetarep), metaparam = NULL, metarandom = FALSE,
  colsample_bylayer = 1, colsample_add = FALSE, accumulate = FALSE,
  nthread = 2, missing = NA, printby = "fold", ...)

Arguments

x

Object coercible to a matrix.

y

A vector of labels for x.

x_val

Object coercible to a matrix, used for validation. Defaults to x if not specified.

y_val

A vector of labels for x_val.

nfold

The number of folds to stack on. If index is specified, this argument is ignored.

index

A list of integers indicating the observations belonging to each fold. If NULL, nfold folds are created by random sampling.
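As a rough illustration of the shape index might take, the sketch below builds random folds in base R. The helper name make_folds and its seed argument are hypothetical and not part of the package; the package's own sampling may differ.

```r
# Hypothetical helper (not part of deepforest): split row indices 1..n
# into nfold random, non-overlapping folds.
make_folds <- function(n, nfold = 5, seed = 1) {
  set.seed(seed)
  # Assign fold labels 1..nfold as evenly as possible, then shuffle them.
  fold_id <- sample(rep(seq_len(nfold), length.out = n))
  # One integer vector of row indices per fold.
  unname(split(seq_len(n), fold_id))
}

index <- make_folds(n = 23, nfold = 5)
lengths(index)  # fold sizes differ by at most one
```

A list built this way could then be passed as index, in which case nfold is ignored.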

objective

The objective passed to xgb.train. It is inferred from nclass if not specified.

eval_func

A function used to evaluate each meta model.

nlayer

The number of layers to stack.

nmeta

The number of meta models to build on each layer.

nmetarep

The number of repeated meta models to construct.

nclass

An integer indicating the number of classes: 1 for regression, 2 for binary classification and >2 for multiclass. If NULL, it is set to length(unique(y)) when that value is less than 10, and to 1 otherwise.
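The default rule for nclass can be written out in a few lines. This is only a sketch of the rule as documented here; infer_nclass is a hypothetical name, not a function exported by the package.

```r
# Hypothetical helper mirroring the documented default for nclass.
infer_nclass <- function(y) {
  k <- length(unique(y))
  # Fewer than 10 distinct labels: treat y as class labels with k classes.
  # Otherwise treat y as a regression target (nclass = 1).
  if (k < 10) k else 1L
}

infer_nclass(c(0, 1, 1, 0))          # 2 distinct labels: binary
infer_nclass(seq(0, 1, by = 0.01))   # 101 distinct values: regression
```

Note that a regression target with fewer than 10 distinct values would be treated as classification under this rule, so setting nclass = 1 explicitly is the safer choice in that case.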

metatype

A list of character values indicating the type of meta models to construct. Currently supports "bag", "boost", "dart" and "lin" for a random forest, boosted trees, boosted trees with dropout and boosted linear models respectively, all based on xgboost. If the length of metatype is not equal to nmeta * nmetarep, it is recycled to that length.
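The recycling described above behaves like base R's rep() with a target length, which is also how the default in the usage signature is built. A small sketch with illustrative values for nmeta and nmetarep:

```r
nmeta <- 4
nmetarep <- 2

# Default-style value: the four supported types repeated to 8 entries.
metatype_default <- rep(c("bag", "boost", "dart", "lin"),
                        length.out = nmeta * nmetarep)

# A shorter user-supplied vector is cycled until it reaches the same length.
metatype_short <- rep(c("boost", "lin"), length.out = nmeta * nmetarep)

metatype_default
metatype_short
```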

metaparam

A list indicating additional parameters for each meta model. The mechanisms used to generate default values and to create randomized values are described in metaparam. If the list is shorter than nmeta * nmetarep, all meta models without an explicit metaparam entry take on default values, so manually tuned models should be listed first in metatype.

metarandom

A logical indicating whether to create randomized miscellaneous parameters.

colsample_bylayer

A numeric vector with values in [0, 1] indicating the fraction of features to sample at each layer. If a single value is given, it is used as a constant for every layer unless colsample_add is set to change it from layer to layer.

colsample_add

Either a logical or a numeric. If TRUE, values are added at each layer so that the final layer has a sampling rate of 1 (i.e. all features used). If FALSE, no modifier is applied to colsample_bylayer. If a numeric value, that constant is added to colsample_bylayer after each layer.
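To make the interaction between colsample_bylayer and colsample_add concrete, here is a minimal sketch of one possible per-layer schedule. The helper colsample_schedule is hypothetical, and two details are assumptions rather than documented behaviour: that TRUE adds equal steps per layer, and that rates are capped at 1.

```r
# Hypothetical helper: per-layer sampling rates implied by the two arguments.
colsample_schedule <- function(colsample_bylayer, colsample_add = FALSE,
                               nlayer = 5) {
  # A vector with one value per layer is used as given.
  if (length(colsample_bylayer) > 1) return(colsample_bylayer)
  if (isTRUE(colsample_add)) {
    # Assumption: equal steps from the starting rate up to 1 at the last layer.
    step <- (1 - colsample_bylayer) / max(nlayer - 1, 1)
  } else if (is.numeric(colsample_add)) {
    # A numeric value is added after each layer.
    step <- colsample_add
  } else {
    # FALSE: the rate stays constant.
    step <- 0
  }
  # Assumption: never exceed 1, i.e. "all features".
  pmin(colsample_bylayer + step * (seq_len(nlayer) - 1), 1)
}

colsample_schedule(0.6, TRUE, nlayer = 5)   # rises from 0.6 to 1
colsample_schedule(0.5, 0.2, nlayer = 4)    # grows by 0.2, capped at 1
colsample_schedule(0.8, FALSE, nlayer = 3)  # constant 0.8
```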

accumulate

A logical indicating whether the predictions made by the meta models of each layer are accumulated. If FALSE, a layer does not receive the direct predictions from two or more layers before it, although these are implicitly approximated by the predictions from the layer immediately before.
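The difference between the two settings can be sketched as a choice of which prediction columns are appended to a layer's input. The helper next_layer_input is hypothetical, and the sketch assumes the original features x are carried through to every layer, as in the cascade of the deep forest paper; the package may assemble its inputs differently.

```r
# Hypothetical helper: build the input matrix for the next layer.
# x:     original feature matrix
# preds: list of prediction matrices, one per completed layer
next_layer_input <- function(x, preds, accumulate = FALSE) {
  # accumulate = TRUE keeps every earlier layer's predictions;
  # accumulate = FALSE keeps only the most recent layer's predictions.
  keep <- if (accumulate) preds else preds[length(preds)]
  do.call(cbind, c(list(x), keep))
}

x <- matrix(rnorm(6), nrow = 3)            # 3 rows, 2 original features
preds <- list(matrix(runif(12), nrow = 3), # layer 1: 4 meta model outputs
              matrix(runif(12), nrow = 3)) # layer 2: 4 meta model outputs

ncol(next_layer_input(x, preds, accumulate = FALSE))  # 2 + 4 columns
ncol(next_layer_input(x, preds, accumulate = TRUE))   # 2 + 4 + 4 columns
```

With accumulation the input width grows with every layer, which is worth keeping in mind when nlayer or nmeta is large.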

nthread

The nthread argument passed to xgb.train.

missing

The missing argument passed to xgb.train.

...

Additional arguments passed to xgb.train.

Value

A deepforest object containing all the models constructed and the sampling results, for use in prediction on test data.


michaelzxu/deepforest documentation built on May 5, 2019, 5:56 p.m.