LDF: Local Density Factor (LDF) algorithm with Gaussian kernel

Description

Computes a Local Density Estimate (LDE) and a Local Density Factor (LDF) for each observation, using a Gaussian kernel. The LDF serves as the outlier score. The method was proposed by Latecki, L., Lazarevic, A. & Pokrajac, D. (2007).

Usage

LDF(dataset, k = 5, h = 1, c = 1)

Arguments

dataset

The dataset whose observations are scored. An LDE and an LDF are returned for each observation

k

The number of nearest neighbors used for the density estimation and comparison. k must be smaller than the number of observations in dataset. Default is 5

h

User-given bandwidth for the kernel functions. A larger bandwidth gives smoother kernels and puts less weight on outliers. Default is 1

c

Scaling constant used when comparing an observation's LDE to those of its neighboring observations. The LDF compares an observation's LDE with the average LDE of its neighboring observations. With c=1 the LDF lies between 0 and 1, while c=0 can produce very large or infinite LDF values. Default is 1
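The effect of c can be illustrated with a toy calculation. The sketch below assumes the ratio form given in Latecki et al. (2007), that is, LDF = mean neighbor LDE / (own LDE + c * mean neighbor LDE). The helper ldf_ratio and the LDE values are made up for illustration and are not part of the package, whose internals may differ.

```r
# Hypothetical helper (not a DDoutlier function): assumed LDF ratio form
ldf_ratio <- function(lde, nb_mean, c) nb_mean / (lde + c * nb_mean)

# Made-up LDE values: a central point, a sparse point, a zero-density point
lde_obs <- c(0.50, 0.05, 0)
# Made-up average LDE of each point's neighbors
lde_nb <- c(0.40, 0.40, 0.40)

# c = 1 bounds the score by 1: roughly 0.44, 0.89, 1.00
ldf_ratio(lde_obs, lde_nb, c = 1)

# c = 0 removes the bound: 0.8, 8, Inf
ldf_ratio(lde_obs, lde_nb, c = 0)
```

Under this assumption, the c * (mean neighbor LDE) term in the denominator is what keeps the score finite when an observation's own LDE is close to zero.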

Details

LDF computes a kernel density estimate, called the LDE, over a user-given number of k nearest neighbors. The LDF score compares the LDE of an observation with the LDE of its neighboring observations. An observation with a greater LDE than its neighbors shows little outlierness, whereas an observation with a smaller LDE than its neighbors shows strong outlierness. A kd-tree is used for the kNN computation, via the kNN() function from the 'dbscan' package. The LDF function is useful for outlier detection in clustering and other multidimensional domains.
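The steps described above can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes the LDE and LDF formulas as given in Latecki et al. (2007), with a per-neighbor bandwidth of h times the neighbor's k-distance and a reachability distance inside the Gaussian kernel. It also replaces the kd-tree search with a brute-force distance matrix, and the function name ldf_sketch is hypothetical. Scores from this sketch are not guaranteed to match LDF().

```r
# Hypothetical helper (not part of DDoutlier): brute-force LDE/LDF sketch
ldf_sketch <- function(X, k = 5, h = 1, c = 1) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  stopifnot(k >= 1, k < n)

  D <- as.matrix(dist(X))   # pairwise Euclidean distances
  diag(D) <- Inf            # a point is not its own neighbor

  # n x k matrix of neighbor indices, and distance to the k-th neighbor
  nn <- matrix(t(apply(D, 1, function(r) order(r)[seq_len(k)])), nrow = n)
  kdist <- apply(D, 1, function(r) sort(r)[k])

  # LDE: average Gaussian kernel value over the k nearest neighbors
  lde <- numeric(n)
  for (i in seq_len(n)) {
    p <- nn[i, ]
    bw <- h * kdist[p]               # assumed per-neighbor bandwidth
    rd <- pmax(D[i, p], kdist[p])    # reachability distance to each neighbor
    lde[i] <- mean(exp(-rd^2 / (2 * bw^2)) / ((2 * pi)^(d / 2) * bw^d))
  }

  # LDF: average neighbor LDE relative to own LDE (assumed ratio form)
  nb_mean <- apply(nn, 1, function(p) mean(lde[p]))
  list(LDE = lde, LDF = nb_mean / (lde + c * nb_mean))
}

# Usage on simulated data
set.seed(1)
X_sim <- matrix(rnorm(60), ncol = 2)
res <- ldf_sketch(X_sim, k = 5, h = 1, c = 1)
head(order(res$LDF, decreasing = TRUE))   # indices of the most outlying rows
```

The brute-force distance matrix costs O(n^2) memory, which is why a kd-tree search is preferable for larger datasets. The sketch also does not guard against duplicated observations, for which a k-distance of zero would give a zero bandwidth.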

Value

LDE

A vector of Local Density Estimates, one per observation. The greater the LDE, the more central the observation

LDF

A vector of Local Density Factors, one per observation. The greater the LDF, the greater the outlierness

Author(s)

Jacob H. Madsen

References

Latecki, L., Lazarevic, A. & Pokrajac, D. (2007). Outlier Detection with Kernel Density Functions. International Workshop on Machine Learning and Data Mining in Pattern Recognition. pp. 61-75. DOI: 10.1007/978-3-540-73499-4_6

Examples

# Create dataset
X <- iris[,1:4]

# Compute LDF outlier scores with k = 10 neighbors and bandwidth h = 2
outlier_score <- LDF(dataset=X, k=10, h=2, c=1)$LDF

# Sort and find index for most outlying observations
names(outlier_score) <- seq_len(nrow(X))
sort(outlier_score, decreasing = TRUE)

# Inspect the distribution of outlier scores
hist(outlier_score)

Example output

       58        16        42        99        15       132       118       119 
0.7276217 0.7254735 0.7228736 0.7171022 0.7121239 0.7083509 0.7060269 0.7043789 
       94        61       123       106       136        34       131       108 
0.6895826 0.6888589 0.6827138 0.6804717 0.6691665 0.6676980 0.6647350 0.6639056 
       19        45       107        17        23         6        11        33 
0.6600106 0.6548252 0.6538390 0.6310541 0.6256664 0.6251220 0.6100369 0.6056570 
       14        88        69        63        82       120       110        47 
0.6010961 0.5934501 0.5809000 0.5788718 0.5774531 0.5683943 0.5676850 0.5646755 
      130       126        32       135        87        39        43        37 
0.5646600 0.5637191 0.5635693 0.5609338 0.5587530 0.5582367 0.5581898 0.5571707 
       80       115       114        65        21        24       101        81 
0.5553393 0.5542288 0.5531827 0.5523997 0.5512975 0.5491356 0.5394798 0.5385490 
       44       137        25        77         9         7       109        53 
0.5374027 0.5353577 0.5353295 0.5347614 0.5341326 0.5324061 0.5312280 0.5307166 
       22        51         3        73       103       122       148       102 
0.5293764 0.5293138 0.5262312 0.5260957 0.5255984 0.5255297 0.5236574 0.5225730 
      143        71        49        90       141         4        70        98 
0.5225730 0.5224128 0.5204007 0.5189761 0.5185947 0.5177519 0.5176837 0.5166334 
       86        20       145        12        62        52        66       112 
0.5161136 0.5156641 0.5152552 0.5138377 0.5131981 0.5131650 0.5131457 0.5128695 
        8       128        75        31        60        85       149        54 
0.5119967 0.5115152 0.5114274 0.5111306 0.5107272 0.5104499 0.5103985 0.5099480 
      105       150        56        84       111       121        67         2 
0.5080799 0.5075490 0.5072416 0.5071443 0.5066495 0.5063696 0.5056844 0.5054519 
       48       147        95       144         5        92        97        72 
0.5054358 0.5048213 0.5047479 0.5046361 0.5033983 0.5032228 0.5028540 0.5010669 
      146       104       125        27        35        74        78       116 
0.5006804 0.5004960 0.5004622 0.5003949 0.4997283 0.4993634 0.4986958 0.4978644 
      139       134        91       142        18        13        46        10 
0.4972717 0.4969446 0.4959750 0.4950688 0.4945860 0.4938775 0.4938775 0.4937073 
       59        76       127        64       140       117       138        79 
0.4924629 0.4921091 0.4916968 0.4913452 0.4910859 0.4906247 0.4897488 0.4897391 
       50        30        57        36       113         1        83        40 
0.4888637 0.4876497 0.4866397 0.4863480 0.4858641 0.4851776 0.4848402 0.4840054 
       93        28        96       100        41        89        55        38 
0.4832600 0.4828004 0.4820199 0.4808723 0.4802692 0.4800804 0.4791818 0.4782170 
       26        29       133       129       124        68 
0.4777825 0.4772532 0.4769265 0.4719941 0.4718194 0.4690840 

DDoutlier documentation built on May 1, 2019, 10:20 p.m.