KNN_AGG: Aggregated k-nearest neighbors distance over different k's


Description

Calculates, as an outlier score, the aggregated distance from each observation to its k nearest neighbors over a range of k values, as suggested by Angiulli and Pizzuti (2002).

Usage

KNN_AGG(dataset, k_min = 5, k_max = 10)

Arguments

dataset

The dataset whose observations are scored; one aggregated k-nearest-neighbor distance is returned per observation.

k_min

The smallest k in the range (the lower end of the k-range).

k_max

The largest k in the range (the upper end of the k-range). Must be greater than or equal to k_min and smaller than the number of observations in dataset.
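The constraints on k_min and k_max can be checked up front before any distances are computed. The helper below is a hypothetical sketch, not part of DDoutlier; it encodes the documented rules, plus the assumption that k_min is at least 1.

```r
# Hypothetical helper, NOT part of DDoutlier: checks the documented constraint
# k_min <= k_max < number of observations. The k_min >= 1 check is an
# added assumption (a neighbor count below 1 is meaningless).
check_k_range <- function(dataset, k_min, k_max) {
  n <- nrow(as.matrix(dataset))
  if (!is.numeric(k_min) || !is.numeric(k_max) ||
      length(k_min) != 1 || length(k_max) != 1) {
    stop("k_min and k_max must be single numbers")
  }
  if (k_min < 1) stop("k_min must be at least 1")
  if (k_max < k_min) stop("k_max must be greater than or equal to k_min")
  if (k_max >= n) stop("k_max must be smaller than the number of observations")
  invisible(TRUE)
}

# iris has 150 rows, so k_max = 15 passes while k_max = 150 would fail
check_k_range(iris[, 1:4], k_min = 10, k_max = 15)
```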

Details

KNN_AGG computes an observation's aggregated distance to its neighbors by combining the kNN results for every k from k_min to k_max; for example, with k_min=1 and k_max=3 the 1NN, 2NN and 3NN results are aggregated. The nearest neighbors are found with a kd-tree via the kNN() function from the 'dbscan' package. KNN_AGG is useful for outlier detection in clustering and other multidimensional settings.
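The aggregation can be illustrated with a minimal base-R sketch. It is not the DDoutlier source. It assumes Euclidean distance and that, for each k from k_min to k_max, the Angiulli-Pizzuti weight (the sum of distances to the k nearest neighbors) is computed and these weights are summed. It also swaps the kd-tree search of dbscan::kNN() for a plain brute-force search with dist(). The function name knn_agg_sketch is hypothetical.

```r
# Minimal sketch, NOT the DDoutlier implementation.
# ASSUMPTIONS: Euclidean distance; brute-force neighbor search with base R
# dist() instead of the kd-tree used by dbscan::kNN(); the aggregate for an
# observation is the sum over k = k_min..k_max of the sum of distances to
# its k nearest neighbors (the Angiulli-Pizzuti weight).
knn_agg_sketch <- function(dataset, k_min = 5, k_max = 10) {
  x <- as.matrix(dataset)
  n <- nrow(x)
  stopifnot(is.numeric(x), k_min >= 1, k_min <= k_max, k_max < n)

  d <- as.matrix(dist(x))  # n x n Euclidean distance matrix
  diag(d) <- Inf           # an observation is not its own neighbor

  score <- numeric(n)
  for (i in seq_len(n)) {
    nn <- sort(d[i, ])[seq_len(k_max)]  # distances to the k_max nearest neighbors
    weights <- cumsum(nn)               # weights[k] = sum of distances to the k nearest
    score[i] <- sum(weights[k_min:k_max])
  }
  score
}

# Usage on the same data as the Examples section
sketch_score <- knn_agg_sketch(iris[, 1:4], k_min = 10, k_max = 15)
```

Because the exact aggregation inside KNN_AGG() is assumed here, compare the sketch with KNN_AGG() on the same data before relying on its values. The brute-force search needs an n x n distance matrix, which is why a kd-tree is preferable for larger datasets.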

Value

A numeric vector of aggregated distances, one per observation. The greater the distance, the greater the outlierness.
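Since larger scores mean more outlying observations, the most outlying rows can be picked out by ordering the returned vector. The helper below is a hypothetical sketch, not part of DDoutlier.

```r
# Hypothetical helper, NOT part of DDoutlier: returns the row indices of the
# n_top observations with the largest outlier scores, most outlying first.
top_outliers <- function(score, n_top = 5) {
  stopifnot(is.numeric(score), n_top >= 1)
  head(order(score, decreasing = TRUE), n_top)
}

# Usage, assuming DDoutlier is installed:
# top_outliers(KNN_AGG(iris[, 1:4], k_min = 10, k_max = 15), n_top = 3)
```

Going by the example output in the Examples section, the call above would return rows 119, 118 and 132.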

Author(s)

Jacob H. Madsen

References

Angiulli, F., & Pizzuti, C. (2002). Fast Outlier Detection in High Dimensional Spaces. In Principles of Data Mining and Knowledge Discovery (PKDD 2002). Helsinki, Finland. pp. 15-26. DOI: 10.1007/3-540-45681-3_2

Examples

# Load the package and create the dataset
library(DDoutlier)
X <- iris[,1:4]

# Find outliers by setting a range of k's
outlier_score <- KNN_AGG(dataset=X, k_min=10, k_max=15)

# Name scores by row index and sort, most outlying observations first
names(outlier_score) <- 1:nrow(X)
sort(outlier_score, decreasing = TRUE)

# Inspect the distribution of outlier scores
hist(outlier_score)

Example output

     119      118      132      107      123       99       42      110 
84.14264 81.98029 81.18188 67.68582 66.93006 65.87829 62.35515 61.82369 
      61      136       58      106       94       16      135      108 
61.44617 59.08369 58.91190 57.98081 55.76996 55.11513 52.39217 52.36970 
     130      109      115      131       69      101       15      120 
51.90887 51.18809 50.56004 50.37625 49.44191 48.97403 48.26370 47.99385 
     126       88       63       51      149       23       80      142 
47.43669 46.64301 46.08609 43.78419 43.35963 43.35131 42.87854 42.08205 
      60      114       65       86      137       14       85       19 
41.94065 41.20712 41.13602 40.77370 40.07189 39.89360 39.88596 39.70539 
      34      122      103       53      147       77       45       33 
39.57289 39.35464 39.34954 39.29469 37.06554 36.72283 36.65728 36.35472 
      82       78      111      145      104       57       74       73 
36.33810 36.27574 35.73859 35.68395 35.66022 35.64443 35.60047 35.52424 
      71       25      144      134       54      146      116       72 
35.46824 35.16641 35.14174 34.93118 34.85844 34.69579 34.61560 34.28195 
      91      133      138      112       66      140      150      125 
34.18726 33.99974 33.82144 33.59369 33.50269 33.38674 33.15171 33.09626 
     121       17       67      105      129       84      102      143 
32.99146 32.84282 32.80132 32.72743 32.32921 32.21023 31.86692 31.86692 
      81        6       75      117       52       37       98       62 
31.83292 31.83211 31.79915 31.78273 31.29275 30.85185 30.84422 30.79344 
     148       55       76      124       59      113        9       87 
30.71950 30.69285 30.64318 30.62567 30.53929 30.46482 30.36945 30.35838 
      68      141      127       44       21       64       79       56 
30.32139 30.15249 30.07458 29.94084 29.82010 29.79692 29.62591 29.60918 
     139       24      128       92       89       39       43       90 
29.52196 29.39869 29.20391 28.96927 28.80427 27.98737 27.97012 27.96153 
      96        7       70       32       83       93       36       95 
27.65555 27.25100 27.13429 26.99949 26.91009 26.25722 25.53690 24.78269 
      26       11       47       97      100       38       20       12 
24.46208 24.45217 24.08901 23.88099 23.22331 23.02984 22.84867 22.64447 
      22       49       27        3       46       30       13       48 
21.98652 21.78302 21.49228 21.17599 20.84591 20.53955 20.30874 20.24473 
       4        2       41       31       10       29       50        5 
20.14937 19.70533 19.63242 18.84399 18.62495 18.52104 18.27333 17.66272 
      28       35       40        8       18        1 
17.59896 17.20233 16.83494 16.54371 16.11287 15.53908 

DDoutlier documentation built on May 1, 2019, 10:20 p.m.