GPNode: R6 Class for the nodes / leaves in the GPTree tree
In GPTreeO: Dividing Local Gaussian Processes for Online Learning Regression

GPNode

R Documentation

Description

The nodes contain the local GP if they are leaves (at the end of a branch). Internal nodes instead contain information on how the input space was split. Nodes are responsible for computing and updating the splitting probabilities, and the tree interacts with the local GPs through them.

Currently, GPs from the DiceKriging package (WrappedDiceKrigingGP) and mlegp package (WrappedmlegpGP) are implemented. The user can create their own wrapper using WrappedGP.

Public fields

key

A string like "0110100" to identify the node in the binary tree
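
As an illustration of how such a key can encode a position in a binary tree, the sketch below assumes that the root has a one-character key and that each child key is the parent key with "0" or "1" appended. The helpers child_keys and node_depth are hypothetical and not part of GPTreeO.

```r
# Hypothetical helpers (not part of GPTreeO): assume that a child key is the
# parent key with "0" or "1" appended.
child_keys <- function(key) {
  paste0(key, c("0", "1"))
}

# Number of branching decisions below the root (assumed one-character key).
node_depth <- function(key) {
  nchar(key) - 1L
}

child_keys("0110100")   # "01101000" "01101001"
node_depth("0110100")   # 6
```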

x_dim

Dimensionality of input points. It is set once the first point is received through the GPTree method update. It needs to be specified if min_ranges should be different from default.

theta

Overlap ratio between two leaves in the split direction. The default value is 0.

split_direction_criterion

A string that indicates which splitting criterion to use. The options are:

"max_spread": Split along the direction which has the largest data spread.
"min_lengthscale": Split along the direction with the smallest length-scale hyperparameter from the local GP.
"max_spread_per_lengthscale": Split along the direction with the largest data spread relative to the corresponding GP length-scale hyperparameter.
"max_corr": Split along the direction where the input data is most strongly correlated with the target variable.
"principal_component": Split along the first principal component.

The default value is "max_spread_per_lengthscale".
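
To make the difference between the first four criteria concrete, here is a minimal base R sketch. It assumes that "spread" means the range (max minus min) per dimension and that "max_corr" uses the absolute Pearson correlation; the helper choose_split_index is hypothetical and is not how GPTreeO is necessarily implemented. The "principal_component" option is left out because it requires a transformation of the data first.

```r
# Hypothetical helper (not part of GPTreeO): choose a split dimension from a
# data matrix X (one row per point), targets y, and per-dimension GP
# length-scales. "Spread" is assumed here to be max - min per dimension.
choose_split_index <- function(X, y, lengthscales,
                               criterion = "max_spread_per_lengthscale") {
  spread <- apply(X, 2, function(col) diff(range(col)))
  score <- switch(
    criterion,
    max_spread                 = spread,
    min_lengthscale            = -lengthscales,
    max_spread_per_lengthscale = spread / lengthscales,
    max_corr                   = abs(as.vector(cor(X, y))),
    stop("unknown criterion: ", criterion)
  )
  which.max(score)
}

set.seed(1)
X <- cbind(runif(50, 0, 10), runif(50, 0, 1))   # dim 1 spreads much wider
y <- 3 * X[, 2] + rnorm(50, sd = 0.01)          # y depends on dim 2 only
lengthscales <- c(20, 0.5)

choose_split_index(X, y, lengthscales, "max_spread")                  # 1
choose_split_index(X, y, lengthscales, "min_lengthscale")             # 2
choose_split_index(X, y, lengthscales, "max_spread_per_lengthscale")  # 2
choose_split_index(X, y, lengthscales, "max_corr")                    # 2
```

In this toy data the widest dimension is not the most informative one, which is why the criteria that take the length-scale or the target into account pick dimension 2.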

split_position_criterion

A string indicating how the split position along the split direction should be set. Possible values are "mean" and "median". The default is "mean".

shape_decay

A string specifying how the probability function for a point to be assigned to the left leaf should fall off in the overlap region. The available options are a linear shape ("linear"), an exponential shape ("exponential") or a Gaussian shape ("gaussian"). Another option is to select no overlap region. This can be achieved by selecting "deterministic" or by setting theta to 0. The default is "linear".

prob_min_theta

Minimum probability after which the overlap shape gets truncated (either towards 0 or 1). The default value is 0.01.
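
The interplay of the overlap region, the decay shape and the truncation can be sketched as follows for the linear shape. This is only an illustration under stated assumptions: the overlap region is taken to be centred on the split position, and probabilities below prob_min_theta (or above 1 - prob_min_theta) are snapped to 0 (or 1). The function prob_left_linear is hypothetical and not part of GPTreeO.

```r
# Hypothetical sketch (not GPTreeO's implementation) of a left-child
# probability with a linear fall-off across an overlap region.
prob_left_linear <- function(x, position_split, width_overlap,
                             prob_min_theta = 0.01) {
  if (width_overlap <= 0) {
    # No overlap region: deterministic assignment.
    return(as.numeric(x < position_split))
  }
  lower <- position_split - width_overlap / 2
  # Falls linearly from 1 at the lower edge to 0 at the upper edge.
  p <- 1 - (x - lower) / width_overlap
  p <- pmin(pmax(p, 0), 1)
  # Truncate tails smaller than prob_min_theta towards 0 or 1.
  p[p < prob_min_theta] <- 0
  p[p > 1 - prob_min_theta] <- 1
  p
}

prob_left_linear(c(-1, 0, 1), position_split = 0, width_overlap = 1)
# 1.0 0.5 0.0
prob_left_linear(c(-1, 1), position_split = 0, width_overlap = 0)
# 1 0
```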

Nbar

Maximum number of data points for each GP in a leaf before it is split. The default value is 1000.

min_ranges

Smallest allowed input data spread (per dimension) before node splitting stops. It is set to its default min_ranges = rep(0.0, x_dim) once the first point is received through the update method. x_dim needs to be specified by the user if min_ranges should differ from the default.

is_leaf

If TRUE, this node is a leaf, i.e. the last node on its branch

wrapped_gp

An instance of the WrappedGP type

can_split

If TRUE for a given dimension, the leaf can be split along that dimension
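
A small base R sketch of how such a per-dimension flag could follow from min_ranges. It assumes that the data spread is the range (max minus min) of the points in each dimension and that a dimension is splittable when its spread strictly exceeds the corresponding min_ranges entry; both are assumptions for illustration.

```r
# Hypothetical sketch: spread is taken as max - min per input dimension.
X <- cbind(c(0.0, 0.5, 1.0),    # dimension 1 varies
           c(2.0, 2.0, 2.0))    # dimension 2 is constant
x_dim <- ncol(X)
min_ranges <- rep(0.0, x_dim)   # the documented default

x_spread <- apply(X, 2, function(col) diff(range(col)))
can_split <- x_spread > min_ranges
can_split   # TRUE FALSE
```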

rotation_matrix

A rotation matrix, used for transforming the data

shift

A shift, used for transforming the data

use_pc_transform

TRUE if principal components transformation is used for node splitting

x_spread

Vector of data spread for each dimension

split_index

Index for the split dimension

position_split

Position of the split along dimension split_index

width_overlap

Width of overlap region along dimension split_index

point_ids

IDs of the points assigned to this node

residuals

Vector of residuals

pred_errs

Vector of prediction uncertainties

error_scaler

Scaling factor for the prediction error to ensure desired coverage

use_n_residuals

Number of past residuals to use in calibrating the error_scaler
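
One plausible way to relate residuals, pred_errs, error_scaler and use_n_residuals is sketched below: take the most recent residuals, divide them by the predicted uncertainties, and pick the quantile of that ratio matching the desired coverage. The function calibrate_error_scaler, its coverage argument and the default values are hypothetical; the package may calibrate differently.

```r
# Hypothetical sketch (not GPTreeO's implementation): scale predicted
# uncertainties so that a chosen fraction of recent residuals is covered.
calibrate_error_scaler <- function(residuals, pred_errs,
                                   use_n_residuals = 25, coverage = 0.68) {
  n <- length(residuals)
  recent <- seq.int(max(1L, n - use_n_residuals + 1L), n)
  ratio <- abs(residuals[recent]) / pred_errs[recent]
  unname(quantile(ratio, probs = coverage))
}

set.seed(42)
true_sd <- 2
residuals <- rnorm(500, sd = true_sd)
pred_errs <- rep(1, 500)         # the GP reports errors that are too small

scaler <- calibrate_error_scaler(residuals, pred_errs, use_n_residuals = 400)
# Fraction of the last 400 residuals within the rescaled uncertainty;
# by construction this is close to the requested coverage of 0.68.
achieved <- mean(abs(tail(residuals, 400)) <= scaler * tail(pred_errs, 400))
```

Because the reported uncertainties in this toy example are too small by roughly a factor of two, the resulting scaler is larger than 1.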

Methods

Method `new()`

Create a new node object

Usage

GPNode$new(
  key,
  x_dim,
  theta,
  split_direction_criterion,
  split_position_criterion,
  shape_decay,
  prob_min_theta,
  Nbar,
  wrapper,
  gp_control,
  retrain_buffer_length,
  add_buffer_in_prediction,
  min_ranges = NULL,
  is_leaf = TRUE
)

Arguments

key

A string like "0110100" to identify the node in the binary tree

x_dim

Dimensionality of input points. It is set once the first point is received through the GPTree method update. It needs to be specified if min_ranges should be different from default.

theta

Overlap ratio between two leaves in the split direction. The default value is 0.

split_direction_criterion

A string that indicates which splitting criterion to use. The options are:

"max_spread": Split along the direction which has the largest data spread.
"min_lengthscale": Split along the direction with the smallest length-scale hyperparameter from the local GP.
"max_spread_per_lengthscale": Split along the direction with the largest data spread relative to the corresponding GP length-scale hyperparameter.
"max_corr": Split along the direction where the input data is most strongly correlated with the target variable.
"principal_component": Split along the first principal component.

The default value is "max_spread_per_lengthscale".

split_position_criterion

A string indicating how the split position along the split direction should be set. Possible values are "mean" and "median". The default is "mean".

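shape_decay

A string specifying how the probability function for a point to be assigned to the left leaf should fall off in the overlap region. The available options are a linear shape ("linear"), an exponential shape ("exponential") or a Gaussian shape ("gaussian"). Another option is to select no overlap region. This can be achieved by selecting "deterministic" or by setting theta to 0. The default is "linear".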

prob_min_theta

Minimum probability after which the overlap shape gets truncated (either towards 0 or 1). The default value is 0.01.

Nbar

Maximum number of data points for each GP in a leaf before it is split. The default value is 1000.

wrapper

A string that indicates which GP implementation should be used. The current version includes wrappers for the packages "DiceKriging" and "mlegp". The default setting is "DiceKriging".

gp_control

A list of control parameters that is forwarded to the wrapper. Here, the covariance function is specified. DiceKriging allows for the following kernels, passed as string: "gauss", "matern5_2", "matern3_2", "exp", "powexp" where "matern3_2" is set as default.

retrain_buffer_length

Size of the retrain buffer. The buffer for each node collects data points and holds them until the buffer length is reached. Then the GP in the node is updated with the data in the buffer. For a fixed Nbar, higher values for retrain_buffer_length lead to faster run time (less frequent retraining), but the trade-off is a temporarily reduced prediction accuracy. We advise that the choice for retrain_buffer_length should depend on the chosen Nbar. By default retrain_buffer_length is set equal to Nbar.
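
The buffering behaviour described here can be illustrated with a small closure in base R. This is a sketch only: make_buffered_trainer is a hypothetical name, and the "retraining" step merely counts points instead of fitting a GP.

```r
# Hypothetical sketch: a closure that buffers points and reports when the
# buffer is flushed into the (here only counted) training set.
make_buffered_trainer <- function(retrain_buffer_length) {
  buffer <- list()
  n_trained <- 0L
  n_retrains <- 0L
  list(
    add_point = function(x, y) {
      buffer[[length(buffer) + 1L]] <<- list(x = x, y = y)
      if (length(buffer) >= retrain_buffer_length) {
        # A real node would refit its GP on the buffered points here.
        n_trained <<- n_trained + length(buffer)
        n_retrains <<- n_retrains + 1L
        buffer <<- list()
      }
      invisible(NULL)
    },
    status = function() {
      c(trained = n_trained, buffered = length(buffer), retrains = n_retrains)
    }
  )
}

trainer <- make_buffered_trainer(retrain_buffer_length = 4)
for (i in 1:10) trainer$add_point(x = i, y = i^2)
trainer$status()   # trained = 8, buffered = 2, retrains = 2
```

With a buffer length of 4, ten incoming points trigger only two retraining steps, and the two most recent points are not yet reflected in the model. This is the run time versus accuracy trade-off mentioned above.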

add_buffer_in_prediction

If TRUE, points in the data buffers are added to the GP before prediction. They are added into a temporarily created GP which contains the not yet included points. The GP in the node is not yet updated. The default is FALSE.

min_ranges

Smallest allowed input data spread (per dimension) before node splitting stops. It is set to its default min_ranges = rep(0.0, x_dim) once the first point is received through the GPTree method update. x_dim needs to be specified by the user if min_ranges should differ from the default.

is_leaf

If TRUE, this node is a leaf, i.e. the last node on its branch.

n_points_train_limit

Number of points at which a GP is created in the leaf

Returns

A new GPNode object. Contains the local GP in the field wrapped_gp, and information used for and related to splitting the node. If the node has been split, the local GP is removed.

Method `transform()`

Method to transform input data through a shift and a rotation. IS EXPECTED TO NOT BE CALLED BY THE USER

Usage

GPNode$transform(X)

Arguments

X: Matrix with x points

Returns

The transformed X matrix
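
A shift followed by a rotation is what a principal component transformation does, so the idea can be sketched with prcomp from base R. The order of operations (subtract the shift, then multiply by the rotation matrix) and the helper name transform_points are assumptions made for this illustration, not a statement about the package internals.

```r
# Hypothetical sketch: centre the data, then rotate it onto its principal axes.
# The order (shift first, then rotation) is an assumption.
transform_points <- function(X, shift, rotation_matrix) {
  sweep(X, 2, shift, "-") %*% rotation_matrix
}

set.seed(7)
X <- cbind(rnorm(100), rnorm(100))
X[, 2] <- X[, 1] + 0.1 * X[, 2]           # strongly correlated columns

pca <- prcomp(X)                           # base R (stats) PCA
X_t <- transform_points(X, shift = pca$center, rotation_matrix = pca$rotation)

# Same result as the scores computed by prcomp itself.
all.equal(unname(X_t), unname(pca$x))      # TRUE
```

After the transformation the first column carries the largest variance, so splitting along it corresponds to splitting along the first principal component.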

Method `update_prob_pars()`

Method to update the probability parameters (x_spread, can_split, split_index, position_split, width_overlap). IS EXPECTED TO NOT BE CALLED BY THE USER

Usage

GPNode$update_prob_pars()

Method `get_prob_child_1()`

Method to compute the probability that a point x should go to child 1. IS EXPECTED TO NOT BE CALLED BY THE USER

Usage

GPNode$get_prob_child_1(x)

Arguments

x: Single data point for which probability is computed; has to be a vector with length equal to x_dim

Returns

The probability that a point x should go to child 1

Method `register_residual()`

Method to register prediction performance

Usage

GPNode$register_residual(x, y)

Arguments

x: Most recent single input data point from the data stream; has to be a vector with length equal to x_dim
y: Target variable which has to be a one-dimensional matrix or a vector; any further columns will be ignored

Method `update_empirical_error_pars()`

Method for updating the empirical error parameters

Usage

GPNode$update_empirical_error_pars()

Method `delete_gp()`

Method to delete the GP. IS EXPECTED TO NOT BE CALLED BY THE USER

Usage

GPNode$delete_gp()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

GPNode$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
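
The deep argument matters for R6 objects that hold other R6 objects in their fields, as a node does with wrapped_gp. The sketch below uses toy classes (ToyNode and ToyGP are hypothetical stand-ins, not the GPTreeO classes) to show the general R6 behaviour: a shallow clone shares the contained object, a deep clone copies it.

```r
library(R6)

# Toy stand-ins for a node and its wrapped GP (not the GPTreeO classes).
ToyGP <- R6Class("ToyGP", public = list(n_points = 0))
ToyNode <- R6Class("ToyNode", public = list(
  wrapped_gp = NULL,
  initialize = function() {
    self$wrapped_gp <- ToyGP$new()
  }
))

node <- ToyNode$new()
shallow <- node$clone()             # shares the same ToyGP object
deep <- node$clone(deep = TRUE)     # gets its own copy of the ToyGP object

node$wrapped_gp$n_points <- 10
shallow$wrapped_gp$n_points   # 10
deep$wrapped_gp$n_points      # 0
```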
