train: Model training

View source: R/train.R

train (R Documentation)

Model training

Description

Trains a decision forest on features and target.

Usage

train(
  forest,
  graph,
  features,
  target,
  niter = 200,
  offset = 0,
  min.walk.depth = 2,
  ntrees = 100,
  initial.walk.depth = NaN,
  performance = NULL,
  flatten.sep = "$",
  importance = "impurity",
  splitrule = "gini"
)

Arguments

forest

a DFNET.forest or NULL.

graph

The graph to train the network on.

features

numeric matrix or 3D array. The features to train on.

target

numeric vector. The target to train towards.

niter

integer. The number of iterations to run.

offset

integer. An offset added to the iteration count for logging purposes.

min.walk.depth

integer. The minimal number of nodes to visit per tree per iteration.

ntrees

integer. The number of trees to generate per iteration.

initial.walk.depth

integer. The number of nodes to visit per tree during initialization.

performance

unary function. Called with a decision tree as argument to estimate that tree's performance.

flatten.sep

string. Separator to use when flattening features.

importance

variable importance mode. See ranger::ranger.

splitrule

Splitting rule. See ranger::ranger.

Details

This function generates ntrees modules and decision trees per iteration and greedily selects those which improve the performance metric. The trees are trained on features and target. performance may evaluate trees on its own validation set; by default it uses the features and target above, in which case ranger handles the data split.
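For instance, performance could score each tree on a held-out split instead of the training data. The following is only a sketch: the exact prediction interface of a DFNET tree is an assumption here (a ranger-style predict returning a $predictions component is used), as are the holdout.features and holdout.target objects.

```r
## Hypothetical example: score a tree by accuracy on a held-out set.
## Assumes trees answer a ranger-style predict(); adapt to the actual
## DFNET tree interface.
holdout.performance <- function(tree) {
    preds <- predict(tree, data = holdout.features)$predictions
    mean(preds == holdout.target)
}

forest <- train(
    NULL, graph, features, target,
    performance = holdout.performance
)
```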

In each iteration, this function tries to shrink modules which have previously been improved. initial.walk.depth thus gives the maximal module size, whereas min.walk.depth specifies the smallest walk depth.

Model training can be resumed from an already trained forest, in which case the attributes of that forest are used in lieu of ntrees and initial.walk.depth. When resuming this training, it might make sense to also specify the offset parameter for somewhat improved logging.

The returned DFNET.forest is a list of shape (trees, modules, modules.weights), where trees are the decision trees created for detected modules, and modules.weights gives the weights used for each node.
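Assuming that list shape, the components of a trained forest can be inspected directly (names taken from the description above):

```r
forest <- train(NULL, graph, features, target)

forest$trees            # decision trees, one per detected module
forest$modules          # the modules (node sets) each tree was trained on
forest$modules.weights  # weights used for each node
```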

As "private" attributes used for iteration, generation_size is set to ntrees, walk.depth captures the walk depth for the next iteration, and last.performance is set to a vector of length ntrees containing the result of performance for each tree w.r.t. target.
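If these are stored as regular R attributes (an assumption based on the wording above), they can be read with attr(), e.g. to decide whether another round of training is worthwhile:

```r
## Assumes the "private" values are ordinary R attributes on the forest.
attr(forest, "generation_size")  # equals ntrees
attr(forest, "walk.depth")       # walk depth for the next iteration
attr(forest, "last.performance") # per-tree performance w.r.t. target
```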

Examples

## Not run: 
forest <- NULL
offset <- 0
while (keep_iterating(forest, target)) { # insert your own iteration criteria
    forest <- train(
        forest,
        graph,
        features,
        target,
        niter = 10,
        offset = offset
        # ...
    )
    offset <- offset + 10
}

## End(Not run)


pievos101/DFNET documentation built on Dec. 1, 2022, 3:44 p.m.