View source: R/extendedisolationforest.R
h2o.extendedIsolationForest | R Documentation |
Trains an Extended Isolation Forest model
h2o.extendedIsolationForest(
training_frame,
x,
model_id = NULL,
ignore_const_cols = TRUE,
categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary",
"Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
score_each_iteration = FALSE,
score_tree_interval = 0,
ntrees = 100,
sample_size = 256,
extension_level = 0,
seed = -1,
disable_training_metrics = TRUE
)
training_frame |
Id of the training data frame. |
x |
A vector containing the character names of the predictors in the model. |
model_id |
Destination id for this model; auto-generated if not specified. |
ignore_const_cols |
Logical. Ignore constant columns. Defaults to TRUE. |
categorical_encoding |
Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO. |
score_each_iteration |
Logical. Whether to score during each iteration of model training. Defaults to FALSE. |
score_tree_interval |
Score the model after every so many trees. Disabled if set to 0. Defaults to 0. |
ntrees |
Number of Extended Isolation Forest trees. Defaults to 100. |
sample_size |
Number of randomly sampled observations used to train each Extended Isolation Forest tree. Defaults to 256. |
extension_level |
Extension level of the splits. Minimum is 0; maximum is N - 1, where N is the number of columns (numCols). With extension_level = 0, Extended Isolation Forest behaves like a standard Isolation Forest. Defaults to 0. |
seed |
Seed for random numbers (affects the stochastic parts of the algorithm, which might or might not be enabled by default). Defaults to -1 (time-based random number). |
disable_training_metrics |
Logical. Disable calculating training metrics (expensive on large datasets). Defaults to TRUE. |
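The documented range for extension_level (0 to N - 1) can be checked before training. The following is a minimal sketch in base R, not part of the h2o package: the helper name check_extension_level is hypothetical, and it assumes N equals the number of predictors passed in x, with no columns dropped (for example by ignore_const_cols).

```r
# Hypothetical helper (not part of h2o): validates extension_level against
# the documented range [0, N - 1], assuming N = length(x).
check_extension_level <- function(x, extension_level) {
  max_level <- length(x) - 1
  if (extension_level < 0 || extension_level > max_level) {
    stop(sprintf("extension_level must be between 0 and %d", max_level))
  }
  extension_level
}

predictors <- c("AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON")

# Lowest level: behaves like a standard Isolation Forest
low <- check_extension_level(predictors, 0)

# Highest level allowed for these predictors
high <- check_extension_level(predictors, length(predictors) - 1)
```

The returned value could then be passed as the extension_level argument of h2o.extendedIsolationForest.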
## Not run:
library(h2o)
h2o.init()
# Import the prostate dataset
p <- h2o.importFile(path="https://raw.github.com/h2oai/h2o/master/smalldata/logreg/prostate.csv")
# Set the predictors
predictors <- c("AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON")
# Build an Extended Isolation Forest model
model <- h2o.extendedIsolationForest(x = predictors,
                                     training_frame = p,
                                     model_id = "eif.hex",
                                     ntrees = 100,
                                     sample_size = 256,
                                     extension_level = length(predictors) - 1)
# Calculate score
score <- h2o.predict(model, p)
# Anomaly score in [0, 1], defined in Equation (1) of the Extended Isolation Forest paper
# and in Section 2, 'Isolation and Isolation Trees', of the Isolation Forest paper
anomaly_score <- score$anomaly_score
# Average path length of the point in the isolation trees, from the root to the leaf
mean_length <- score$mean_length
## End(Not run)
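The relationship between mean_length and anomaly_score in the example can be illustrated with the score formula from the Isolation Forest paper, s(x, n) = 2^(-E(h(x)) / c(n)). The following base R sketch is for illustration only: the function names are hypothetical, n is assumed to be the sample_size, and H2O's internal computation may differ in detail.

```r
# c(n): average path length of an unsuccessful search in a binary search tree,
# using the approximation H(i) ~ ln(i) + 0.5772156649 (Euler-Mascheroni constant).
avg_path_length <- function(n) {
  if (n <= 1) return(0)
  if (n == 2) return(1)
  2 * (log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
}

# Anomaly score s(x, n) = 2^(-E(h(x)) / c(n)).
# Hypothetical helper; assumes mean_length is E(h(x)) and sample_size is n.
anomaly_score_from_length <- function(mean_length, sample_size) {
  2^(-mean_length / avg_path_length(sample_size))
}

# A point whose mean path length equals c(n) scores exactly 0.5
typical <- anomaly_score_from_length(avg_path_length(256), 256)

# Shorter paths (points that are easier to isolate) score closer to 1
short_path <- anomaly_score_from_length(2, 256)
long_path <- anomaly_score_from_length(12, 256)
```

Under this formula, scores close to 1 indicate likely anomalies, and scores well below 0.5 indicate normal points.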