View source: R/isolationforest.R
h2o.isolationForest | R Documentation |
Trains an Isolation Forest model
h2o.isolationForest(
training_frame,
x,
model_id = NULL,
score_each_iteration = FALSE,
score_tree_interval = 0,
ignore_const_cols = TRUE,
ntrees = 50,
max_depth = 8,
min_rows = 1,
max_runtime_secs = 0,
seed = -1,
build_tree_one_node = FALSE,
mtries = -1,
sample_size = 256,
sample_rate = -1,
col_sample_rate_change_per_level = 1,
col_sample_rate_per_tree = 1,
categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary",
"Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
stopping_rounds = 0,
stopping_metric = c("AUTO", "anomaly_score"),
stopping_tolerance = 0.01,
export_checkpoints_dir = NULL,
contamination = -1,
validation_frame = NULL,
validation_response_column = NULL
)
training_frame |
Id of the training data frame. |
x |
A vector containing the names of the predictor columns to use in the model. |
model_id |
Destination id for this model; auto-generated if not specified. |
score_each_iteration |
Logical. Whether to score during each iteration of model training. Defaults to FALSE. |
score_tree_interval |
Score the model after every so many trees. Disabled if set to 0. Defaults to 0. |
ignore_const_cols |
Logical. Ignore constant columns. Defaults to TRUE. |
ntrees |
Number of trees. Defaults to 50. |
max_depth |
Maximum tree depth (0 for unlimited). Defaults to 8. |
min_rows |
Fewest allowed (weighted) observations in a leaf. Defaults to 1. |
max_runtime_secs |
Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0. |
seed |
Seed for random numbers (affects certain stochastic parts of the algorithm, which may or may not be enabled by default). Defaults to -1 (time-based random number). |
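build_tree_one_node |
Logical. Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets. Defaults to FALSE. |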
mtries |
Number of variables randomly sampled as candidates at each split. If set to -1, defaults to (number of predictors)/3. Defaults to -1. |
sample_size |
Number of randomly sampled observations used to train each Isolation Forest tree. Only one of sample_size and sample_rate should be defined; if sample_rate is defined, sample_size is ignored. Defaults to 256. |
sample_rate |
Rate of randomly sampled observations used to train each Isolation Forest tree. Must be in the range 0.0 to 1.0. If set to -1, sample_rate is disabled and sample_size is used instead. Defaults to -1. |
col_sample_rate_change_per_level |
Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0). Defaults to 1. |
col_sample_rate_per_tree |
Column sample rate per tree (from 0.0 to 1.0). Defaults to 1. |
categorical_encoding |
Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO. |
stopping_rounds |
Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable). Defaults to 0. |
stopping_metric |
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Must be one of: "AUTO", "anomaly_score". Defaults to AUTO. |
stopping_tolerance |
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). Defaults to 0.01. |
export_checkpoints_dir |
Automatically export generated models to this directory. |
contamination |
Contamination ratio: the proportion of anomalies in the input dataset. If undefined (-1), the predict function does not mark observations as anomalies and returns only the anomaly score. Defaults to -1 (undefined). |
validation_frame |
Id of the validation data frame. |
validation_response_column |
(experimental) Name of the response column in the validation frame. The response column should be binary, indicating whether each observation is normal or an anomaly. |
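The precedence between sample_size and sample_rate described in the arguments can be illustrated with a small base-R sketch. The helper `rows_per_tree()` is hypothetical and not part of the h2o package. It assumes that sample_rate is applied to the row count and rounded down, and that rows are drawn without replacement; H2O's internal rounding and sampling may differ.

```r
# Hypothetical helper (not part of h2o): number of rows one tree is trained on,
# following the sample_size / sample_rate precedence rule.
# Assumption: n_rows * sample_rate is rounded down.
rows_per_tree <- function(n_rows, sample_size = 256, sample_rate = -1) {
  if (sample_rate != -1) {
    # sample_rate is defined, so sample_size is ignored
    stopifnot(sample_rate >= 0, sample_rate <= 1)
    return(floor(n_rows * sample_rate))
  }
  # sample_rate disabled (-1): fall back to the fixed sample_size
  sample_size
}

# Draw row indices for a single tree.
# Assumption of this sketch: sampling without replacement.
set.seed(1)
idx <- sample.int(1000, rows_per_tree(1000, sample_rate = 0.1))
length(idx)
```

With the defaults (sample_size = 256, sample_rate = -1), every tree sees 256 rows regardless of the frame size; switching to sample_rate makes the per-tree sample scale with the data instead.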
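The moving-average rule behind stopping_rounds and stopping_tolerance can be sketched in base R. This is a simplified illustration, not H2O's implementation: `should_stop()` is a hypothetical name, it compares only the latest window of k scoring events against the window that ended k events earlier, and whether a lower or higher anomaly_score counts as an improvement is left to the assumed `lower_is_better` argument.

```r
# Simplified sketch of moving-average early stopping (hypothetical helper,
# not H2O's internal code).
#   history: metric value at each scoring event, oldest first
#   k:       stopping_rounds (0 disables early stopping)
#   tol:     stopping_tolerance (minimum relative improvement)
should_stop <- function(history, k, tol, lower_is_better = TRUE) {
  if (k == 0) return(FALSE)
  n <- length(history)
  if (n < 2 * k) return(FALSE)  # need two full windows of length k
  # Simple moving average of length k ending at scoring event i
  sma <- function(i) mean(history[(i - k + 1):i])
  reference <- sma(n - k)  # window that ended k scoring events ago
  latest <- sma(n)         # most recent window
  gain <- if (lower_is_better) reference - latest else latest - reference
  if (reference == 0) return(gain <= 0)  # avoid dividing by zero
  # Stop when the relative improvement falls below the tolerance
  gain / abs(reference) < tol
}

should_stop(c(1.0, 0.8, 0.6, 0.4, 0.3, 0.2), k = 3, tol = 0.1)  # FALSE: still improving
should_stop(rep(0.5, 6), k = 3, tol = 0.1)                       # TRUE: no improvement
```

In the package example, stopping_rounds = 3 and stopping_tolerance = 0.1 correspond to k = 3 and tol = 0.1 here.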
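The contamination ratio can be pictured as a score threshold: with a ratio of 0.2, roughly the 20% of observations with the highest anomaly scores are labeled as anomalies. The base-R sketch below assumes that a higher score means more anomalous and that the threshold is an empirical quantile of the scores. `flag_anomalies()` is a hypothetical helper, and H2O may compute its threshold differently.

```r
# Hypothetical helper (not part of h2o): turn anomaly scores into 0/1 labels
# given a contamination ratio.
# Assumptions: higher score = more anomalous; threshold = empirical quantile;
# the (0, 1) range check is this sketch's own choice.
flag_anomalies <- function(scores, contamination = -1) {
  # -1 (undefined): no labels, only the scores would be reported
  if (contamination == -1) return(NULL)
  stopifnot(contamination > 0, contamination < 1)
  threshold <- quantile(scores, probs = 1 - contamination, names = FALSE)
  as.integer(scores >= threshold)  # 1 = anomaly, 0 = normal
}

scores <- seq(0.1, 1.0, by = 0.1)
flag_anomalies(scores, contamination = 0.2)  # the two highest scores are flagged
```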
## Not run:
library(h2o)
h2o.init()
# Import the cars dataset
f <- "https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv"
cars <- h2o.importFile(f)
# Set the predictors
predictors <- c("displacement", "power", "weight", "acceleration", "year")
# Train the IF model
cars_if <- h2o.isolationForest(x = predictors, training_frame = cars,
seed = 1234, stopping_metric = "anomaly_score",
stopping_rounds = 3, stopping_tolerance = 0.1)
## End(Not run)