KNN_IN: In-degree for observations in a k-nearest neighbors graph


Description

Calculates the in-degree of each observation in a k-nearest neighbors graph, for use as an outlier score. Suggested by Hautamaki, V., & Ismo, K. (2004).

Usage

KNN_IN(dataset, k = 5)

Arguments

dataset

The dataset for which an in-degree is returned for each observation

k

The number of nearest neighbors used to construct the graph. Must be smaller than the number of observations in dataset

Details

KNN_IN computes the in-degree of each observation, i.e. the number of other observations that count it among their k nearest neighbors (its reverse neighbors). To obtain the in-degree, a k-nearest neighbors graph is first constructed. The neighbor search uses a kd-tree via the kNN() function from the 'dbscan' package. KNN_IN is useful for outlier detection in clustering and other multidimensional domains.
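The in-degree computation described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper name knn_in_degree_sketch is hypothetical, and it assumes Euclidean distance and a brute-force distance matrix instead of the kd-tree search from 'dbscan'. Because ties between equidistant points may be broken differently, its results are not guaranteed to match KNN_IN exactly.

```r
# Hypothetical helper (not part of DDoutlier): brute-force in-degree in base R.
knn_in_degree_sketch <- function(dataset, k = 5) {
  x <- as.matrix(dataset)
  n <- nrow(x)
  stopifnot(k >= 1, k < n)

  # Full pairwise Euclidean distance matrix; needs O(n^2) memory.
  d <- as.matrix(dist(x))
  diag(d) <- Inf  # an observation is never its own neighbor

  # Row i of 'nn' holds the indices of the k nearest neighbors of observation i.
  nn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

  # In-degree of observation j = how often j appears in the neighbor lists.
  tabulate(as.vector(nn), nbins = n)
}

scores <- knn_in_degree_sketch(iris[, 1:4], k = 10)

# Every observation has exactly k outgoing edges, so in-degrees sum to n * k.
sum(scores) == nrow(iris) * 10
```

The final check is a quick sanity test: each of the n observations points to exactly k neighbors, so the in-degrees must add up to n * k regardless of how ties are resolved.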

Value

A vector with the in-degree of each observation. The smaller the in-degree, the more outlying the observation
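Since low in-degree indicates outlierness, one way to use the returned vector is to flag observations at or below a cut-off. The sketch below uses made-up scores and an assumed threshold of 1; neither comes from the package, and a suitable threshold depends on k and on the data.

```r
# Toy in-degree scores, invented for illustration only.
outlier_score <- c(7, 0, 12, 1, 9, 0, 15)

# Assumed cut-off: treat in-degree <= 1 as outlying.
threshold <- 1

# Indices of the observations flagged as potential outliers.
flagged <- which(outlier_score <= threshold)
flagged
```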

Author(s)

Jacob H. Madsen

References

Hautamaki, V., & Ismo, K. (2004). Outlier Detection Using k-Nearest Neighbour Graph. In International Conference on Pattern Recognition. Cambridge, UK. pp. 430-433. DOI: 10.1109/ICPR.2004.1334558

Examples

# Load the package and create the dataset
library(DDoutlier)
X <- iris[,1:4]

# Find outliers by setting an optional k
outlier_score <- KNN_IN(dataset=X, k=10)

# Label scores with row indices and sort ascending,
# so the most outlying observations (lowest in-degree) come first
names(outlier_score) <- 1:nrow(X)
sort(outlier_score, decreasing = FALSE)

# Inspect the distribution of outlier scores
hist(outlier_score)

Example output

 23  42 107 135  16  25  45  58  61  63  69  86  88  94  99 149  14  15  19  36 
  0   0   0   1   2   2   3   3   3   3   3   3   3   3   3   3   4   4   4   4 
 44  85 101 109 115 119 142   9  51 110 118 132  21  24  32  33  34  39  53  65 
  4   4   4   4   4   4   4   5   5   5   5   5   6   6   6   6   6   6   6   6 
 71 114 120 137  91 106 122 123   6   7  12  17  26  37  72  74  77  78  80 104 
  6   6   6   6   7   7   7   7   8   8   8   8   8   8   8   8   8   8   8   8 
130 143 146   2  43  57  60  66  67 111 136 145  10  27  38  75  87 102 108 131 
  8   8   8   9   9   9   9   9   9   9   9   9  10  10  10  10  10  10  10  10 
147  47  59  73  76  96 126 129 150  48  52  62  82  89  98 112 116 138 144  22 
 10  11  11  11  11  11  11  11  11  12  12  12  12  12  12  12  12  12  12  13 
 29  54  56  81 133 140   5  11  20  31  35  64  68  79  92 124   3   4  70  84 
 13  13  13  13  13  13  14  14  14  14  14  14  14  14  14  14  15  15  15  15 
 97 103 105 117 134 148  13  30  41  46  49  83 128 139   8  18  50  93  95 125 
 15  15  15  15  15  15  16  16  16  16  16  16  16  16  17  17  17  17  17  17 
141  40  90 121 127   1  28  55 100 113 
 17  18  18  18  18  19  19  19  19  21 

DDoutlier documentation built on May 1, 2019, 10:20 p.m.