isotree.add.tree: Add additional (single) tree to isolation forest model

View source: R/isoforest.R

isotree.add.tree    R Documentation

Add additional (single) tree to isolation forest model

Description

Adds a single tree to the model, fit to the full (non-subsampled) data passed here. The data must have the same columns as the data to which the model was previously fitted. Categorical columns, if any, may have new categories.

Usage

isotree.add.tree(
  model,
  data,
  sample_weights = NULL,
  column_weights = NULL,
  refdata = NULL
)

Arguments

model

An Isolation Forest object as returned by isolation.forest, to which an additional tree will be added.

This object will be modified in-place.

data

A 'data.frame', 'data.table', 'tibble', 'matrix', or sparse matrix (from package 'Matrix' or 'SparseM', CSC format) to which to fit the new tree.

sample_weights

Sample observation weights for each row of 'data', with higher weights indicating distribution density (i.e. a weight of two has the same effect as including the same data point twice). If not 'NULL', the model must have been built with 'weights_as_sample_prob=FALSE'.

column_weights

Sampling weights for each column in 'data'. Ignored when picking columns by a deterministic criterion. If passing 'NULL', each column will have a uniform weight. If used along with kurtosis weights, the effect is multiplicative.

refdata

Reference points for distance and/or kernel calculations, if these were previously added to the model object through isotree.set.reference.points. Must correspond to the same points that were passed in the call to that function. If sparse, only CSC format is supported.

This is ignored if the model has no stored reference points.
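
The arguments above can be put together as in the following minimal sketch. The data is synthetic and the object names ('X_old', 'X_new') are made up for illustration; counting trees through 'length(model)' assumes that the package's 'length' method for model objects is available in the installed version.

```r
library(isotree)

set.seed(1)
# Synthetic numeric data (illustrative only); both matrices have 3 columns
X_old <- matrix(rnorm(200 * 3), nrow = 200)
X_new <- matrix(rnorm(50 * 3), nrow = 50)

# Fit a small model on the first batch (single thread keeps the example light)
model <- isolation.forest(X_old, ntrees = 10, nthreads = 1)
ntrees_before <- length(model)

# Add one tree fit to all 50 rows of the second batch.
# No assignment is needed, as 'model' is modified in-place.
isotree.add.tree(model, X_new)
ntrees_after <- length(model)
```

After the call, 'ntrees_after' should be one more than 'ntrees_before'.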

Details

If constructing trees with different sample sizes, the outlier scores from depth-based metrics will not be centered around 0.5 and might have a very skewed distribution. The standardizing constant for the scores is determined by the sample size that was passed as an argument when the model was constructed.

If trees are going to be fit to samples of different sizes, it's strongly recommended to use density-based scoring metrics instead.
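
As a rough sketch of this recommendation, a model could be built with a density-based metric before adding trees fit to differently-sized data. This assumes the 'scoring_metric' and 'missing_action' arguments of isolation.forest (neither is described on this page); 'missing_action="fail"' is chosen here only because the synthetic data has no missing values.

```r
library(isotree)

set.seed(1)
# Synthetic data (illustrative only)
X_big   <- matrix(rnorm(1000 * 2), nrow = 1000)
X_small <- matrix(rnorm(100 * 2), nrow = 100)

# Each of the initial trees is fit to a sub-sample of 256 rows
model <- isolation.forest(
  X_big,
  ntrees = 20,
  sample_size = 256,
  scoring_metric = "density",  # assumption: density-based metric
  missing_action = "fail",     # assumption: data has no missing values
  nthreads = 1
)

# The added tree is fit to all 100 rows of 'X_small', not to 256 rows
isotree.add.tree(model, X_small)

scores <- predict(model, X_small)
```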

Be aware that, if an out-of-memory error occurs, the resulting object might be rendered unusable (might crash when calling certain functions).

For safety purposes, the model object can be deep copied (including the underlying C++ object) through function isotree.deep.copy before undergoing an in-place modification like this.
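
A minimal sketch of that precaution follows, using synthetic data and made-up object names ('backup', 'X_extra'). It assumes that 'length(model)' reports the number of trees in the installed version.

```r
library(isotree)

set.seed(1)
# Synthetic data (illustrative only)
X       <- matrix(rnorm(300 * 2), nrow = 300)
X_extra <- matrix(rnorm(60 * 2), nrow = 60)

model <- isolation.forest(X, ntrees = 10, nthreads = 1)

# Deep copy first, so that an untouched model remains available
# if the in-place modification fails
backup <- isotree.deep.copy(model)

isotree.add.tree(model, X_extra)

n_model  <- length(model)   # trees in the modified model
n_backup <- length(backup)  # trees in the copy
```

The copy should keep its original number of trees, since it holds its own underlying C++ object.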

If this function is going to be called frequently, it's highly recommended to use 'lazy_serialization=TRUE', since it will then not need to copy over the serialized bytes.
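
For illustration, repeated calls might look like the sketch below. It assumes that 'lazy_serialization' is passed to isolation.forest when the model is built; the batches are synthetic and the loop length is arbitrary.

```r
library(isotree)

set.seed(1)
# Synthetic data (illustrative only)
X <- matrix(rnorm(500 * 2), nrow = 500)

# Assumption: 'lazy_serialization' is set when the model is constructed
model <- isolation.forest(
  X,
  ntrees = 5,
  lazy_serialization = TRUE,
  nthreads = 1
)

# Add one tree per incoming batch; each batch has the same 2 columns
for (i in 1:5) {
  batch <- matrix(rnorm(100 * 2), nrow = 100)
  isotree.add.tree(model, batch)
}

ntrees_total <- length(model)
```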

Value

The same 'model' object, now modified, returned invisibly.

See Also

isolation.forest, isotree.restore.handle


isotree documentation built on May 29, 2024, 11:24 a.m.