Finds outlier data points in a dataset and returns the entire data frame, with anomaly score columns added for each model that was run.
dataset | The dataset (a data frame).
y | The target. Do not include it among the x variables used to identify anomalies.
x | Variables to consider for anomaly detection.
deep_learning | Default is FALSE. Both this and isolation_forest can be TRUE to see a side-by-side comparison/agreement.
isolation_forest | Default is TRUE. Use the isolation forest algorithm to detect anomalies.
hidden | Deep learning (DL) parameter.
epochs | Deep learning (DL) parameter.
ntrees | Default is 75.
max_depth | Default is 9. Consider increasing it if anomalies are hard to identify, i.e. if the maximum separation is close to this number.
max_runtime_min | Maximum time, in minutes, for each type of model to run.
Columns added by the isolation forest: anomaly_iso_predict, anomaly_iso_length, pct_rank_iso_anomaly.
Columns added by the DL model: anomaly_dl_recon_mse, pct_rank_dl_anomaly.
Lower ranks, such as 0.01, indicate that a row is very likely an outlier. Higher values of anomaly_iso_predict indicate that a row is more likely an outlier. anomaly_iso_length is the mean number of splits the isolation forest needed to classify the row as an outlier.
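As a rough illustration of how a percent rank relates to a raw score, the base R sketch below ranks a made-up vector of scores so that the most anomalous row gets the smallest rank. The exact ranking formula used by the function is not stated in this page, so `rank(-scores) / n` is an assumption, as are the toy values and the 0.25 cutoff.

```r
# Toy scores standing in for anomaly_iso_predict (made-up values).
scores <- c(0.10, 0.85, 0.20, 0.15, 0.40)

# Assumed ranking: the highest score gets the smallest percent rank.
# The function's actual formula may differ.
pct_rank <- rank(-scores, ties.method = "min") / length(scores)

result <- data.frame(
  anomaly_iso_predict  = scores,
  pct_rank_iso_anomaly = pct_rank
)

# Keep the rows whose percent rank is at or below a hypothetical cutoff.
flagged <- result[result$pct_rank_iso_anomaly <= 0.25, ]
print(flagged)
```

Under this assumed formula, a row with a percent rank of 0.01 would sit among the top 1% of scores in the dataset.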
The original data frame, with outlier columns added: <outlier_dl> and <outlier_iso_forest>. Higher values indicate larger outliers.
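When both models are run, one way to check their agreement is to compare the two percent rank columns row by row. The sketch below uses a mock data frame in place of the real return value; the `id` column, the values, and the 0.05 cutoff are all hypothetical, and only the two `pct_rank_*` column names come from this page.

```r
# Mock of the returned data frame; all values are made up for illustration.
scored <- data.frame(
  id                   = 1:6,
  pct_rank_iso_anomaly = c(0.50, 0.01, 0.90, 0.02, 0.30, 0.70),
  pct_rank_dl_anomaly  = c(0.60, 0.02, 0.01, 0.40, 0.35, 0.80)
)

threshold <- 0.05  # hypothetical cutoff: lower rank means more anomalous

iso_flag <- scored$pct_rank_iso_anomaly <= threshold
dl_flag  <- scored$pct_rank_dl_anomaly <= threshold

# Rows flagged by both models, and rows flagged by exactly one of them.
both        <- scored[iso_flag & dl_flag, ]
either_only <- scored[xor(iso_flag, dl_flag), ]

print(both)
print(either_only)
```

Rows flagged by both models are the stronger outlier candidates. Rows flagged by only one model may be worth inspecting by hand.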