ezr.h2o_outliers: Anomaly Detection - Deep Learning & Isolation

Description

Finds outlier data points in a dataset and returns the entire data frame, with outlier columns added for each model that was run. The added columns are listed under Details.

Usage

ezr.h2o_outliers(dataset, y = NULL, x = NULL, deep_learning = FALSE,
  isolation_forest = TRUE, hidden = c(10, 10), epochs = 100,
  return_as_data_frame = TRUE, ntrees = 75, max_depth = 9,
  max_runtime_min = 5, return_extras = TRUE)

Arguments

dataset

Dataset

y

The target variable. It should not be included among the x variables used to identify anomalies.

x

Variables to consider for anomaly detection.

deep_learning

Default is FALSE. Both this and isolation_forest can be TRUE, to see a side-by-side comparison of the two models and where they agree.

isolation_forest

Default is TRUE. Use the isolation forest algorithm to detect anomalies.

hidden

Deep learning (DL) parameter: hidden layer sizes. Default is c(10, 10).

epochs

Deep learning (DL) parameter: number of training epochs. Default is 100.

ntrees

Isolation forest parameter: number of trees. Default is 75.

max_depth

Default is 9. Consider increasing this if anomalies are hard to identify, for example if the maximum separation is close to this number.

max_runtime_min

Maximum time in minutes for each type of model to run. Default is 5.

Details

Columns added for the isolation forest: anomaly_iso_predict, anomaly_iso_length, pct_rank_iso_anomaly.

Columns added for the DL model: anomaly_dl_recon_mse, pct_rank_dl_anomaly.

Lower ranks, such as 0.01, indicate a row that is very likely an outlier. Higher values of anomaly_iso_predict indicate that a row is more likely to be an outlier. anomaly_iso_length is the mean number of splits the isolation forest needed to isolate the row.
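
To make the rank convention concrete, here is a minimal base R sketch. It does not reproduce how the package computes its pct_rank columns, which this page does not specify; the scores, row names, and the 0.2 cutoff are hypothetical. It only illustrates a ranking in which the most anomalous row receives the lowest rank.

```r
# Hypothetical anomaly scores for five rows: higher means more anomalous
scores <- c(a = 0.42, b = 0.55, c = 0.91, d = 0.47, e = 0.50)

# Rank in descending order so the highest score gets rank 1,
# then divide by the number of rows to scale the ranks into (0, 1]
pct_rank <- rank(-scores, ties.method = "min") / length(scores)

# Keep the rows whose rank falls at or below an illustrative cutoff
flagged <- names(pct_rank)[pct_rank <= 0.2]

print(pct_rank)
print(flagged)
```

Under this convention a small cutoff selects only the most extreme rows, which matches the reading that a rank such as 0.01 marks a very likely outlier.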

Value

The original data frame, with outlier columns added: <outlier_dl> and <outlier_iso_forest>. Higher values indicate larger outliers.
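
A call might look like the sketch below. Several things here are assumptions rather than documented behavior: that the package loads as easyr, that a local H2O cluster has to be started with h2o.init() beforehand, and that y and x accept column names as character vectors. The mtcars data and the choice of mpg as the target are for illustration only.

```r
# Sketch only: assumes the easyr and h2o packages are installed.
# Wrapped in a function so that nothing runs until it is called.
run_outlier_example <- function() {
  library(h2o)
  library(easyr)
  h2o.init()  # assumption: the cluster must be started by the caller

  # Assumption: y and x are given as column names; the target is
  # excluded from x, as the description of y advises
  result <- ezr.h2o_outliers(
    dataset = mtcars,
    y = "mpg",
    x = setdiff(names(mtcars), "mpg"),
    deep_learning = TRUE,
    isolation_forest = TRUE
  )

  # Sort by the isolation forest rank listed under Details,
  # so the most likely outliers come first
  result[order(result$pct_rank_iso_anomaly), ]
}
```

Running both models in one call, as here, gives the side-by-side view described for the deep_learning argument.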


jmp1989/easyr documentation built on May 20, 2019, 7:25 a.m.