dif: Deep Isolation Forest

dif R Documentation

Deep Isolation Forest

Description

The function builds a deep isolation forest that uses fuzzy logic to determine whether a record is anomalous or not. It takes a wide-format data.frame object as input and returns it with two appended vectors. The first vector contains the anomaly scores as numbers between zero and one, and the second vector contains logical values indicating whether each record is an outlier (TRUE) or not (FALSE).

Usage

dif(dta, nt = 100L, nss = NULL, threshold = 0.95)

Arguments

dta

A wide-format data.frame object with one record per row.

nt

Number of deep isolation trees to build to form the forest. By default, it is set to 100.

nss

Number of records subsampled to build a single deep isolation tree. If NULL (the default), the program randomly selects 25% of the records provided to the dta argument.

threshold

A number between zero and one used as a threshold when identifying outliers from the anomaly scores. By default, this argument is set to 0.95, so that 5% of the records are classified as anomalous.
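
As an illustration of how the defaults of nss and threshold can be read, the following base-R sketch draws one 25% subsample and turns a vector of scores into logical flags. It is not the package's internal code: the rounding rule (ceiling), the strict comparison (>), and the score values are all assumptions made for the example.

```r
# Sketch only: one possible reading of the `nss` and `threshold` defaults.
set.seed(1L)
dta <- iris[, 1:4]
n <- nrow(dta)                     # 150 records

# Assumption: 25% of the records, rounded up to a whole number
nss <- ceiling(0.25 * n)
idx <- sample.int(n, size = nss)   # row indices of one random subsample

# Hypothetical anomaly scores in [0, 1]; real scores would come from dif()
scores <- c(0.12, 0.48, 0.97, 0.30, 0.955, 0.95)
threshold <- 0.95

# Assumption: a record is flagged when its score strictly exceeds the threshold
flags <- scores > threshold
```

Under these assumptions, a score exactly equal to the threshold is not flagged; dif() itself may treat ties differently.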

Details

The argument dta must be provided as an object of class data.frame in wide format. The use of the R packages dplyr, purrr, and tidyr is highly recommended to simplify the conversion of datasets between long and wide formats.
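
For readers unfamiliar with the two layouts, here is a minimal, dependency-free sketch of a long-to-wide conversion using base R's reshape(). The toy dataset and its column names (id, variable, value) are invented for the example; tidyr::pivot_wider() performs the same conversion when the recommended packages are available.

```r
# Toy long-format data: one row per (record, variable) pair.
# Column names are hypothetical and chosen only for this example.
long <- data.frame(
  id       = rep(1:3, each = 2),
  variable = rep(c("height", "weight"), times = 3),
  value    = c(170, 65, 180, 80, 165, 55)
)

# Spread the variables into columns: one row per record
wide <- reshape(long, idvar = "id", timevar = "variable", direction = "wide")

# reshape() prefixes the new columns with "value."; drop the prefix
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
```

The resulting data.frame has one row per record and one column per variable, which is the layout the dta argument expects.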

Value

The wide-format data.frame provided as input, with two extra columns appended: one for the anomaly scores and one for the outlier flags.

Author(s)

Luca Sartore drwolf85@gmail.com

Examples

# Load the package
library(HRTnomaly)
set.seed(2025L)
# Detect outliers in the `iris` dataset
res <- dif(iris)

HRTnomaly documentation built on April 3, 2025, 6:17 p.m.