isolationForest: Fit an Isolation Forest
In solitude: An Implementation of Isolation Forest

Description

The 'solitude' class implements the isolation forest method introduced in the paper Isolation-Based Anomaly Detection (Liu, Ting and Zhou <doi:10.1145/2133360.2133363>). The extremely randomized trees (extratrees) required to build the isolation forest are grown using the ranger function from the ranger package.

Design

$new() initiates a new 'solitude' object. The possible arguments are:

sample_size: (positive integer, default = 256) Number of observations in the dataset to use to build a tree in the forest
num_trees: (positive integer, default = 100) Number of trees to be built in the forest
replace: (boolean, default = FALSE) Whether the sample of observations should be chosen with replacement when sample_size is less than the number of observations in the dataset
seed: (positive integer, default = 101) Random seed for the forest
nproc: (NULL or a positive integer, default: NULL, meaning use all resources) Number of parallel threads to be used by ranger
respect_unordered_factors: (string, default: "partition") See respect.unordered.factors argument in ranger
max_depth: (positive number, default: ceiling(log2(sample_size))) See max.depth argument in ranger
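
As an illustration of the arguments above, the following sketch builds a forest object with a few non-default values and spells out the arithmetic behind the default max_depth. The specific values (128, 50, 42, one thread) are arbitrary choices for the example, not recommendations.

```r
library("solitude")

# Arbitrary illustrative values, not recommendations.
sample_size <- 128

# Default depth cap when max_depth is not supplied:
# ceiling(log2(128)) = 7.
default_max_depth <- ceiling(log2(sample_size))

iso <- isolationForest$new(
  sample_size = sample_size,
  num_trees   = 50,
  replace     = FALSE,
  seed        = 42,
  nproc       = 1   # restrict ranger to a single thread
)
```

With the default sample_size of 256 the same arithmetic gives a depth cap of 8.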

$fit() fits an isolation forest for the given data frame or sparse matrix, computes the depths of terminal nodes of each tree, and stores the anomaly scores and average depth values in the $scores object as a data.table.

$predict() returns anomaly scores for new data as a data.table.
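
How the average depth relates to the anomaly score can be sketched with the scoring formula from the cited paper, s = 2^(-average_depth / c(n)), where c(n) is the expected path length of an unsuccessful binary search tree lookup over n points. The helper names below (c_factor, anomaly_score) are illustrative and not part of solitude, and the harmonic-number approximation is an assumption about how the formula is evaluated rather than a statement about the package internals.

```r
# Illustrative helpers; these names are not part of the solitude API.

# Expected path length c(n) from Liu, Ting and Zhou, using the
# approximation H(i) ~ log(i) + Euler-Mascheroni constant.
c_factor <- function(n) {
  euler_gamma <- 0.5772156649
  2 * (log(n - 1) + euler_gamma) - 2 * (n - 1) / n
}

# Anomaly score: values closer to 1 mean shorter average paths,
# i.e. observations that are easier to isolate.
anomaly_score <- function(average_depth, sample_size) {
  2^(-average_depth / c_factor(sample_size))
}

# Average depths 4.93 and 8.00 with the default sample_size of 256.
scores <- anomaly_score(c(4.93, 8.00), sample_size = 256)
print(round(scores, 4))
```

Under these assumptions the two depths map to scores of about 0.716 and 0.582, which agrees with the first and last rows of the training scores in the example output.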

Details

Parallelization: ranger is parallelized and by default uses all the resources. This is the behavior when nproc is set to NULL. The process of obtaining depths of terminal nodes (which is executed when $fit() is called) may be parallelized separately by setting up a future backend.
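
A minimal sketch of setting up a future backend before fitting is shown below. It assumes the future package is installed; the choice of multisession with two workers, the iris data and the small sample_size are illustrative assumptions only.

```r
library("solitude")
library("future")

# Assumption: two local background R sessions as the future backend.
plan(multisession, workers = 2)

# nproc = 1 limits ranger itself to one thread; the depth computation
# run during $fit() can then use the future backend set above.
iso <- isolationForest$new(sample_size = 64, nproc = 1)
iso$fit(iris[, 1:4])

# Return to sequential evaluation once fitting is done.
plan(sequential)

head(iso$scores)
```

Resetting the plan afterwards keeps later code in the session from unintentionally starting worker processes.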

Methods

Public methods

Method `new()`

Usage

isolationForest$new(
  sample_size = 256,
  num_trees = 100,
  replace = FALSE,
  seed = 101,
  nproc = NULL,
  respect_unordered_factors = NULL,
  max_depth = ceiling(log2(sample_size))
)

Method `fit()`

Usage

isolationForest$fit(dataset)

Method `predict()`

Usage

isolationForest$predict(data)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

isolationForest$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

## Not run: 
library("solitude")
library("tidyverse")
library("mlbench")

data(PimaIndiansDiabetes)
PimaIndiansDiabetes = as_tibble(PimaIndiansDiabetes)
PimaIndiansDiabetes

splitter   = PimaIndiansDiabetes %>%
  select(-diabetes) %>%
  rsample::initial_split(prop = 0.5)
pima_train = rsample::training(splitter)
pima_test  = rsample::testing(splitter)

iso = isolationForest$new()
iso$fit(pima_train)

scores_train = pima_train %>%
  iso$predict() %>%
  arrange(desc(anomaly_score))

scores_train

umap_train = pima_train %>%
  scale() %>%
  uwot::umap() %>%
  setNames(c("V1", "V2")) %>%
  as_tibble() %>%
  rowid_to_column() %>%
  left_join(scores_train, by = c("rowid" = "id"))

umap_train

umap_train %>%
  ggplot(aes(V1, V2)) +
  geom_point(aes(size = anomaly_score))

scores_test = pima_test %>%
  iso$predict() %>%
  arrange(desc(anomaly_score))

scores_test

## End(Not run)

Example output

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.4     ✔ dplyr   1.0.2
✔ tidyr   1.1.2     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
# A tibble: 768 x 9
   pregnant glucose pressure triceps insulin  mass pedigree   age diabetes
      <dbl>   <dbl>    <dbl>   <dbl>   <dbl> <dbl>    <dbl> <dbl> <fct>   
 1        6     148       72      35       0  33.6    0.627    50 pos     
 2        1      85       66      29       0  26.6    0.351    31 neg     
 3        8     183       64       0       0  23.3    0.672    32 pos     
 4        1      89       66      23      94  28.1    0.167    21 neg     
 5        0     137       40      35     168  43.1    2.29     33 pos     
 6        5     116       74       0       0  25.6    0.201    30 neg     
 7        3      78       50      32      88  31      0.248    26 pos     
 8       10     115        0       0       0  35.3    0.134    29 neg     
 9        2     197       70      45     543  30.5    0.158    53 pos     
10        8     125       96       0       0   0      0.232    54 pos     
# … with 758 more rows
INFO  [00:54:28.436] Building Isolation Forest ...  
INFO  [00:54:30.004] done 
INFO  [00:54:30.018] Computing depth of terminal nodes ...  
INFO  [00:54:30.798] done 
INFO  [00:54:30.836] Completed growing isolation forest 
      id average_depth anomaly_score
  1: 229          4.93     0.7163710
  2: 296          5.26     0.7005536
  3:  96          5.67     0.6813873
  4: 181          5.69     0.6804659
  5: 196          6.35     0.6507483
 ---                                
380: 349          8.00     0.5820092
381: 360          8.00     0.5820092
382: 361          8.00     0.5820092
383: 362          8.00     0.5820092
384: 383          8.00     0.5820092
Warning message:
The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
Using compatibility `.name_repair`.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
# A tibble: 384 x 5
   rowid     V1     V2 average_depth anomaly_score
   <int>  <dbl>  <dbl>         <dbl>         <dbl>
 1     1 -1.78  -1.72           7.96         0.584
 2     2 -1.15  -1.41           7.98         0.583
 3     3 -1.85   0.753          6.71         0.635
 4     4 -0.985 -5.18           7.53         0.601
 5     5  3.13   0.564          7.84         0.588
 6     6 -0.934 -5.14           7.62         0.597
 7     7 -2.62  -0.288          7.56         0.600
 8     8 -2.09   0.176          8            0.582
 9     9 -2.09   1.06           7.9          0.586
10    10  3.72   0.876          7.92         0.585
# … with 374 more rows
      id average_depth anomaly_score
  1:  34          5.70     0.6800056
  2: 166          5.86     0.6726840
  3: 252          5.94     0.6690528
  4:  83          6.51     0.6437417
  5: 109          6.52     0.6433063
 ---                                
380: 271          8.00     0.5820092
381: 273          8.00     0.5820092
382: 322          8.00     0.5820092
383: 323          8.00     0.5820092
384: 349          8.00     0.5820092

solitude documentation built on July 30, 2021, 1:07 a.m.
