moutlier_iforest: Isolation Forest multivariate outlier detection


View source: R/outliers-multivar.r

Description

Performs multivariate outlier detection using an Isolation Forest, as implemented in the solitude package.

Usage

moutlier_iforest(
  xs,
  mask = !Reduce("|", lapply(xs, is.na)),
  threshold = c(0.8, 0.9),
  return.score = FALSE,
  ...
)

Arguments

xs

A data frame or list of vectors, which will be coerced to a numeric matrix.

mask

A logical vector that defines which values in xs will be used when computing statistics. Useful when a subset of quality-assured data is available. The default mask selects the observations that are non-NA in every element of xs.

threshold

A numeric vector of length two giving the score thresholds for "mild" and "extreme" outliers, respectively.

return.score

If TRUE, return the numeric outlier score. If FALSE, return an ordered factor classifying each observation as one of "not outlier" (1), "mild outlier" (2), or "extreme outlier" (3).

...

Additional arguments passed to solitude::isolationForest$new(). Note that the argument sample_size will be overwritten with the number of unmasked data points, i.e., length(which(mask)).
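The default value of mask and the resulting sample_size can be illustrated with a small base-R sketch. The data below are made up, and the row subset at the end is an assumption about how the mask is applied to the coerced matrix; it is not taken from the package source.

```r
# Two illustrative series; NA marks a missing observation
xs <- list(a = c(1.2, NA, 3.4, 5.0), b = c(0.5, 0.7, NA, 1.1))

# The default mask from the Usage section: TRUE where no element of xs is NA
mask <- !Reduce("|", lapply(xs, is.na))
print(mask)
# [1]  TRUE FALSE FALSE  TRUE

# Number of unmasked observations; per the documentation this overrides sample_size
n_unmasked <- length(which(mask))
print(n_unmasked)
# [1] 2

# Assumption: the forest is fit on the unmasked rows of the coerced matrix
m <- do.call(cbind, xs)
fit_rows <- m[mask, , drop = FALSE]
```

Only the first and fourth observations are complete in both series, so only those two rows would count toward sample_size.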

Details

The values of threshold identify mild and extreme outliers based on the Isolation Forest score, which lies in the range [0, 1]. The default values are 0.8 for "mild" outliers and 0.9 for "extreme" outliers.
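The mapping from scores to the three ordered classes can be sketched in base R with cut(). The helper classify_scores() is hypothetical and only mirrors the documented behaviour; in particular, whether a score exactly equal to a threshold falls in the higher class is an assumption here, not something this page specifies.

```r
# Hypothetical helper: map Isolation Forest scores in [0, 1] to the ordered
# outlier classes described in Details. Not the package's implementation.
classify_scores <- function(score, threshold = c(0.8, 0.9)) {
  stopifnot(length(threshold) == 2, threshold[1] <= threshold[2])
  cut(score,
      breaks = c(-Inf, threshold, Inf),
      labels = c("not outlier", "mild outlier", "extreme outlier"),
      right = FALSE,          # assumption: a score equal to a threshold moves up a class
      ordered_result = TRUE)  # ordered factor, as with return.score = FALSE
}

scores <- c(0.42, 0.55, 0.81, 0.88, 0.93)
classes <- classify_scores(scores)
print(classes)
as.integer(classes)   # 1, 1, 2, 2, 3
```

Because the result is an ordered factor, comparisons such as classes >= "mild outlier" select both mild and extreme outliers.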

Examples

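set.seed(42)  # make the simulated noise reproducible
x = seq(0, 34, by = 0.25)*pi
noise = rlnorm(length(x), meanlog = 1, sdlog = 3)
y = sin(x) + noise
mask = noise < 1  # treat low-noise points as the quality-assured subset

if (requireNamespace("solitude", quietly = TRUE)) {
  moutlier_iforest(list(y))                                # single variable
  moutlier_iforest(list(x, y))                             # two variables
  moutlier_iforest(list(x, y), mask)                       # statistics from masked points only
  moutlier_iforest(list(x, y), mask, threshold = c(1, 2))  # custom thresholds
  moutlier_iforest(list(x, y), return.score = TRUE)        # numeric scores instead of classes
}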

mkoohafkan/wqptools documentation built on May 2, 2021, 8:12 p.m.