find_HDoutliers: Detect Anomalies in High Dimensional Data.


View source: R/find_HDoutliers.R

Description

Detect anomalies in high dimensional data. This is a modification of HDoutliers.

Usage

find_HDoutliers(data, maxrows = 1000, alpha = 0.01,
  method = c("HDadv", "hdr", "ahull"))

Arguments

data

A vector, matrix, or data frame consisting of numeric and/or categorical variables.

maxrows

If the number of observations is greater than maxrows, the function reduces the number of observations used in the k-nearest-neighbor computations to a set of exemplars. The default value is 1000, as shown in Usage.

alpha

Threshold for determining the cutoff for outliers. Observations are considered outliers if they fall in the (1 - alpha) tail of the distribution of the nearest-neighbor distances between exemplars.
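To make the role of alpha concrete, here is a minimal base R sketch of a nearest-neighbor-distance cutoff. It is only an illustration: it uses a plain empirical quantile as the cutoff, which is a simplifying assumption and not necessarily how the package derives its threshold, and it works on raw observations rather than exemplars.

```r
set.seed(42)
x <- c(rnorm(200), 10)             # 200 typical points plus one distant point
alpha <- 0.01

d <- as.matrix(dist(x))            # pairwise Euclidean distances
diag(d) <- Inf                     # ignore each point's distance to itself
nn <- apply(d, 1, min)             # nearest-neighbor distance per observation

# Assumption: an empirical (1 - alpha) quantile stands in for the package's cutoff
cutoff <- quantile(nn, 1 - alpha)
flagged <- which(nn > cutoff)      # indexes with unusually large distances
```

A smaller alpha moves the cutoff further into the tail, so fewer observations are flagged.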

method

Outlier detection method used to detect outliers in the high dimensional space. One of "HDadv", "hdr", or "ahull".

Details

If the number of observations exceeds maxrows, the data are first partitioned into lists, each consisting of an exemplar and the members that lie within a given radius of it. This reduces the number of k-nearest-neighbor computations required for outlier detection.
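The partitioning step can be pictured with a one-pass leader-style clustering. The following base R sketch is a hypothetical illustration, not the package's code: the helper name leader_sketch and the radius value are assumptions made for this example.

```r
# Hypothetical helper: one-pass leader clustering (not the package's implementation).
leader_sketch <- function(x, radius) {
  x <- as.matrix(x)
  exemplars <- integer(0)          # row indexes chosen as exemplars so far
  member_of <- integer(nrow(x))    # exemplar row assigned to each observation
  for (i in seq_len(nrow(x))) {
    assigned <- FALSE
    for (e in exemplars) {
      # Join the first exemplar that lies within the radius
      if (sqrt(sum((x[i, ] - x[e, ])^2)) < radius) {
        member_of[i] <- e
        assigned <- TRUE
        break
      }
    }
    if (!assigned) {               # no exemplar is close enough: start a new one
      exemplars <- c(exemplars, i)
      member_of[i] <- i
    }
  }
  split(seq_len(nrow(x)), member_of)  # list: exemplar row -> member row indexes
}

set.seed(1)
x <- matrix(rnorm(400), ncol = 2)
groups <- leader_sketch(x, radius = 0.5)
length(groups)                     # number of exemplars, never more than nrow(x)
```

Nearest-neighbor distances would then be computed between the exemplars only, which is where the saving comes from when there are many observations.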

Value

The indexes of the observations determined to be outliers.
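Because the return value is a set of indexes, it can be used to split the data into flagged and typical observations. The sketch below uses toy data and a hand-written index vector standing in for the function's result; both are assumptions for illustration.

```r
# Toy data and a hypothetical index vector standing in for the returned indexes
data <- data.frame(x = c(0.1, 0.2, 9.5, 0.3), y = c(1.0, 1.1, -8.0, 0.9))
outliers <- c(3L)

flagged <- data[outliers, , drop = FALSE]

# setdiff() is used instead of data[-outliers, ] because negative indexing
# with an empty vector selects zero rows rather than all rows
keep <- setdiff(seq_len(nrow(data)), outliers)
typical <- data[keep, , drop = FALSE]
```

The setdiff() form stays correct when no outliers are found and the index vector is empty.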

References

Wilkinson, L. (2018), 'Visualizing big data outliers through distributed aggregation', IEEE transactions on visualization and computer graphics 24(1), 256-266.

See Also

get_leader_clusters

Examples

library(stray)
library(ggplot2)

# One-dimensional example: two clusters with a single point between them
set.seed(1234)
data <- c(rnorm(1000, mean = -6), 0, rnorm(1000, mean = 6))
outliers <- find_HDoutliers(data)
display_HDoutliers(data, outliers)

# Two-dimensional example: a typical cloud plus a few scattered points
set.seed(1234)
n <- 1000   # number of typical observations
nout <- 10  # number of outliers
typical_data <- tibble::as_tibble(matrix(rnorm(2 * n), ncol = 2, byrow = TRUE))
out <- tibble::as_tibble(matrix(5 * runif(2 * nout, min = -5, max = 5), ncol = 2, byrow = TRUE))
data <- rbind(out, typical_data)
outliers <- find_HDoutliers(data)
display_HDoutliers(data, outliers)

pridiltal/stray documentation built on Dec. 12, 2018, 10:49 p.m.