h2o.extendedIsolationForest: Trains an Extended Isolation Forest model
In h2o: R Interface for the 'H2O' Scalable Machine Learning Platform

Description

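Trains an Extended Isolation Forest model, an unsupervised anomaly detection method that generalizes Isolation Forest (see `extension_level`).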

Usage

h2o.extendedIsolationForest(
  training_frame,
  x,
  model_id = NULL,
  ignore_const_cols = TRUE,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary",
    "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  score_each_iteration = FALSE,
  score_tree_interval = 0,
  ntrees = 100,
  sample_size = 256,
  extension_level = 0,
  seed = -1,
  disable_training_metrics = TRUE
)

Arguments

`training_frame`	Id of the training data frame.
`x`	A vector containing the `character` names of the predictors in the model.
`model_id`	Destination id for this model; auto-generated if not specified.
`ignore_const_cols`	`Logical`. Ignore constant columns. Defaults to TRUE.
`categorical_encoding`	Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
`score_each_iteration`	`Logical`. Whether to score during each iteration of model training. Defaults to FALSE.
`score_tree_interval`	Score the model after every `score_tree_interval` trees. Disabled if set to 0. Defaults to 0.
`ntrees`	Number of Extended Isolation Forest trees. Defaults to 100.
`sample_size`	Number of randomly sampled observations used to train each Extended Isolation Forest tree. Defaults to 256.
`extension_level`	Extension level of the model. Minimum is 0; maximum is N - 1, where N is the number of columns (numCols). Extended Isolation Forest with extension_level = 0 behaves like Isolation Forest. Defaults to 0.
`seed`	Seed for random numbers (affects the stochastic parts of the algorithm, which might or might not be enabled by default). Defaults to -1 (time-based random number).
`disable_training_metrics`	`Logical`. Disable calculating training metrics (expensive on large datasets). Defaults to TRUE.
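
The range of `extension_level` depends on how many predictor columns are used. The following is a minimal sketch in base R of checking a value against the documented range before training. The helper name `check_extension_level` is hypothetical and not part of the h2o package, and the sketch assumes that no predictor is dropped (for example, as a constant column) before training.

```r
# Hypothetical helper (not part of h2o): validate an extension level against
# the documented range 0 .. N - 1, where N is the number of predictor columns.
# Assumption: every listed predictor is actually used by the model.
check_extension_level <- function(predictors, extension_level) {
  max_level <- length(predictors) - 1
  if (extension_level < 0 || extension_level > max_level) {
    stop(sprintf("extension_level must be between 0 and %d", max_level))
  }
  extension_level
}

predictors <- c("AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON")

# Fully extended model: the maximum level, N - 1
full_level <- check_extension_level(predictors, length(predictors) - 1)

# Level 0: documented to behave like Isolation Forest
iso_level <- check_extension_level(predictors, 0)
```

The checked value could then be passed as the `extension_level` argument of `h2o.extendedIsolationForest`.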

Examples

## Not run: 
library(h2o)
h2o.init()

# Import the prostate dataset
p <- h2o.importFile(path="https://raw.github.com/h2oai/h2o/master/smalldata/logreg/prostate.csv")

# Set the predictors
predictors <- c("AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON")

# Build an Extended Isolation Forest model
model <- h2o.extendedIsolationForest(x = predictors,
                                     training_frame = p,
                                     model_id = "eif.hex",
                                     ntrees = 100,
                                     sample_size = 256,
                                     extension_level = length(predictors) - 1)

# Calculate score
score <- h2o.predict(model, p)

# Anomaly score: a number in [0, 1], defined explicitly in Equation (1) of the
# Extended Isolation Forest paper or in Section 2, 'Isolation and Isolation Trees',
# of the Isolation Forest paper
anomaly_score <- score$anomaly_score

# Mean path length of each point, from the root to the leaf, across the isolation trees
mean_length <- score$mean_length

## End(Not run)
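
The example above returns both `anomaly_score` and `mean_length`. As a rough illustration of how the two relate, the sketch below implements the score from the Isolation Forest paper, s(x, n) = 2^(-E(h(x)) / c(n)), in base R. It assumes that `mean_length` plays the role of E(h(x)) and that n is `sample_size`. The function names are hypothetical, and whether H2O applies exactly this normalization internally is an assumption rather than something this page states.

```r
# c(n): average path length of an unsuccessful binary search tree lookup over
# n points, as given in the Isolation Forest paper. The harmonic number H(i)
# is approximated by ln(i) + Euler's constant.
avg_path_length <- function(n) {
  if (n <= 1) return(0)
  euler_gamma <- 0.5772156649
  harmonic <- log(n - 1) + euler_gamma
  2 * harmonic - 2 * (n - 1) / n
}

# Anomaly score s(x, n) = 2^(-E(h(x)) / c(n)).
# Assumption: mean_length stands in for E(h(x)) and sample_size for n.
anomaly_score_from_length <- function(mean_length, sample_size) {
  2^(-mean_length / avg_path_length(sample_size))
}

# Shorter mean path lengths give scores closer to 1 (more anomalous)
scores <- anomaly_score_from_length(c(2, 8, 12), sample_size = 256)
print(scores)
```

Under this formula, a point whose mean path length equals c(n) scores exactly 0.5, and scores approach 1 as the mean path length approaches 0.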

h2o documentation built on May 29, 2024, 4:26 a.m.