ezr.h2o_outliers: Anomaly Detection - Deep Learning & Isolation
In jmp1989/easyr: Helpful wrappers for common EDA, Data Manipulation, & Modeling


Description

Find outlier data points in a dataset and return the entire data frame, with the anomaly columns listed below added for each model that was run.

Usage

ezr.h2o_outliers(dataset, y = NULL, x = NULL, deep_learning = FALSE,
  isolation_forest = TRUE, hidden = c(10, 10), epochs = 100,
  return_as_data_frame = TRUE, ntrees = 75, max_depth = 9,
  max_runtime_min = 5, return_extras = TRUE)

Arguments

`dataset`	The dataset to analyze.
`y`	The target variable. Keep it out of the x variables used to identify anomalies.
`x`	Variables to consider for anomaly detection.
`deep_learning`	Default is FALSE. Set both this and `isolation_forest` to TRUE to compare the two models side by side and check their agreement.
`isolation_forest`	Default is TRUE. Use an isolation forest to detect anomalies.
`hidden`	Deep learning parameter (hidden layer sizes).
`epochs`	Deep learning parameter (number of training epochs).
`ntrees`	Number of trees in the isolation forest. Default is 75.
`max_depth`	Default is 9. Consider increasing it if the model has trouble identifying anomalies, i.e. if the maximum separation (the largest anomaly_iso_length) is close to this number.
`max_runtime_min`	Maximum run time, in minutes, for each type of model.
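To see why the `max_depth` cap matters, the sketch below grows random isolation paths for a single numeric variable and stops after `max_depth` splits. It is a simplified, one-dimensional illustration of the isolation forest idea under that assumption, not the implementation used by easyr or H2O; the helper `isolation_path_length()` is hypothetical.

```r
# Hypothetical helper (not part of easyr or h2o): number of random splits
# needed to isolate the value `x0` within the numeric vector `v`, capped at
# `max_depth` the way a depth-limited isolation tree would be.
isolation_path_length <- function(v, x0, max_depth) {
  depth <- 0
  while (length(v) > 1 && depth < max_depth) {
    lo <- min(v)
    hi <- max(v)
    if (lo == hi) break                  # remaining values identical: no split possible
    split <- runif(1, lo, hi)            # random split between current min and max
    # keep only the side of the split that still contains x0
    v <- if (x0 < split) v[v < split] else v[v >= split]
    depth <- depth + 1
  }
  depth
}

set.seed(1)
v <- c(rnorm(200), 8)                    # 200 typical values plus one clear outlier
outlier_len <- mean(replicate(100, isolation_path_length(v, 8, max_depth = 9)))
typical_len <- mean(replicate(100, isolation_path_length(v, v[1], max_depth = 9)))
c(outlier = outlier_len, typical = typical_len)
```

The outlier is usually isolated after one or two splits, while typical values often run into the cap, so their lengths bunch up near `max_depth` and become harder to tell apart.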

Details

The following columns are added for each model that was run:
For the isolation forest: anomaly_iso_predict, anomaly_iso_length, pct_rank_iso_anomaly
For the deep learning model: anomaly_dl_recon_mse, pct_rank_dl_anomaly
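A reasonable reading of the deep learning columns, assumed here rather than taken from the package source: an autoencoder reconstructs each row, anomaly_dl_recon_mse is that row's mean squared reconstruction error, and the percent rank maps the largest errors to values near 0. The sketch below computes both from a toy input matrix and a toy reconstruction; `recon_mse()` and `pct_rank_desc()` are hypothetical helpers, and the model that would produce the reconstruction is not shown.

```r
# Hypothetical helper: per-row mean squared difference between the input
# matrix and its reconstruction.
recon_mse <- function(x, x_hat) {
  stopifnot(all(dim(x) == dim(x_hat)))
  rowMeans((x - x_hat)^2)
}

# Hypothetical helper: percent rank where the largest score gets 0, so that
# small values such as 0.01 point at likely outliers.
pct_rank_desc <- function(score) {
  (rank(-score, ties.method = "min") - 1) / (length(score) - 1)
}

x     <- matrix(c(1, 2, 3,
                  1, 2, 3,
                  1, 2, 3), nrow = 3, byrow = TRUE)
x_hat <- matrix(c(1, 2, 3,      # perfect reconstruction
                  1, 2, 4,      # small error
                  0, 0, 0),     # large error: the likely outlier
                nrow = 3, byrow = TRUE)

mse <- recon_mse(x, x_hat)      # c(0, 1/3, 14/3)
pct_rank_desc(mse)              # c(1, 0.5, 0)
```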

Lower percent ranks, such as 0.01, indicate rows that are very likely outliers. Higher values of anomaly_iso_predict indicate stronger outliers. anomaly_iso_length is the mean number of splits the isolation forest needed to isolate a row; outliers are isolated in fewer splits, so shorter lengths point to outliers.
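For intuition about how path length relates to an anomaly score, the sketch below uses the normalization from the original isolation forest paper (Liu, Ting and Zhou): a mean path length h over n samples becomes the score 2^(-h / c(n)), where c(n) is the average path length of an unsuccessful binary search tree lookup. H2O's own predict column may be normalized differently, so treat this as an illustration; `c_n()`, `iso_score()` and the toy lengths are assumptions. The last step percent-ranks the scores so the most anomalous row gets 0.

```r
# Average path length of an unsuccessful search in a binary search tree with
# n nodes; the isolation forest paper uses it to normalise path lengths.
c_n <- function(n) {
  if (n <= 1) return(0)
  if (n == 2) return(1)
  # harmonic number H(n - 1) approximated as log(n - 1) + Euler's constant
  2 * (log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
}

# Classic isolation forest score: near 1 for very short paths (likely
# outliers), exactly 0.5 when the path length equals c(n), towards 0 for long paths.
iso_score <- function(mean_length, n) 2^(-mean_length / c_n(n))

mean_length <- c(2.1, 9.0, 8.7, 8.9)        # toy mean path lengths, one per row
score <- iso_score(mean_length, n = 256)    # shortest path -> highest score

# Percent rank with the highest score mapped to 0, so small ranks flag outliers
pct_rank <- (rank(-score, ties.method = "min") - 1) / (length(score) - 1)
pct_rank                                    # c(0, 1, 1/3, 2/3)
```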

Value

The original data frame, with outlier columns added: <outlier_dl> and <outlier_iso_forest>. Higher values indicate larger outliers.
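When both `deep_learning` and `isolation_forest` are TRUE, one way to check agreement is to flag rows whose percent rank falls at or below a cutoff in each model and cross-tabulate the flags. The sketch below applies this to a small made-up data frame that uses the percent-rank column names listed above (pct_rank_iso_anomaly, pct_rank_dl_anomaly); `flag_outliers()` and the 0.01 cutoff are hypothetical choices, not package defaults.

```r
# Hypothetical helper: TRUE where the percent rank is at or below the cutoff
flag_outliers <- function(pct_rank, cutoff = 0.01) pct_rank <= cutoff

# Toy results frame; the values are made up for illustration
res <- data.frame(
  pct_rank_iso_anomaly = c(0.00, 0.40, 0.01, 0.90),
  pct_rank_dl_anomaly  = c(0.00, 0.02, 0.80, 0.95)
)

iso_flag <- flag_outliers(res$pct_rank_iso_anomaly)
dl_flag  <- flag_outliers(res$pct_rank_dl_anomaly)

table(iso = iso_flag, dl = dl_flag)   # cross-tab of where the two models agree
which(iso_flag & dl_flag)             # rows flagged by both models
```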

jmp1989/easyr documentation built on May 20, 2019, 7:25 a.m.