RKOF: Robust Kernel-based Outlier Factor (RKOF) algorithm with...


Description

Calculates the RKOF score for each observation, as suggested by Gao, J., Hu, W., Zhang, X. & Wu, Ou. (2011).

Usage

RKOF(dataset, k = 5, C = 1, alpha = 1, sigma2 = 1)

Arguments

dataset

The dataset whose observations are each assigned an RKOF score

k

The number of nearest neighbors whose density estimates are compared with that of each observation

C

Multiplication parameter for the k-distance of neighboring observations. Acts as a bandwidth multiplier. Default is 1, so that the unscaled k-distance is used as the bandwidth of the Gaussian kernel

alpha

Sensitivity parameter for the k-distance/bandwidth. A small alpha gives small variance in the RKOF scores, and vice versa. Default is 1

sigma2

Variance parameter for weighting of neighboring observations

Details

RKOF estimates a kernel density for each observation and compares it with the density of its neighboring observations. A Gaussian kernel is used for density estimation, with the k-distance as bandwidth. The k-distance can be adjusted with the parameters C and alpha. The kNN computation uses a kd-tree through the kNN() function from the 'dbscan' package. The RKOF function is useful for outlier detection in clustering and other multidimensional domains.
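
The idea of comparing a Gaussian kernel density at a point with the densities of its neighbors can be sketched in base R. This is a simplified illustration, not the DDoutlier source: it assumes the bandwidth of a neighbor is C times its k-distance raised to alpha, it omits the sigma2 weighting entirely, and it uses a full distance matrix from dist() instead of a kd-tree. The names D, nn_id, k_dist, h, local_density and ratio are hypothetical, and the resulting values are not expected to match RKOF() output.

```r
# Simplified sketch (assumptions above): a local Gaussian kernel density
# ratio, NOT the actual RKOF() implementation.
X <- as.matrix(iris[, 1:4])
k <- 10; C <- 1; alpha <- 1
n <- nrow(X); d <- ncol(X)

# Pairwise Euclidean distances; exclude each point from its own neighbors
D <- as.matrix(dist(X))
diag(D) <- Inf

# Indices of the k nearest neighbors (n x k) and the k-distance of each point
nn_id  <- t(apply(D, 1, function(row) order(row)[1:k]))
k_dist <- apply(D, 1, function(row) sort(row)[k])

# Assumed bandwidth: k-distance scaled by C and raised to alpha
h <- C * k_dist^alpha

# Gaussian kernel density at each point, using each neighbor's bandwidth
local_density <- sapply(seq_len(n), function(i) {
  nb <- nn_id[i, ]
  u2 <- (D[i, nb] / h[nb])^2
  mean(exp(-u2 / 2) / ((2 * pi)^(d / 2) * h[nb]^d))
})

# Unweighted ratio of mean neighbor density to own density
ratio <- sapply(seq_len(n), function(i) {
  mean(local_density[nn_id[i, ]]) / local_density[i]
})
```

In this sketch a ratio well above 1 means the point sits in a sparser region than its neighbors. The full pairwise distance matrix is only reasonable for small datasets, which is one reason a kd-tree search is preferable in practice.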

Value

A vector of RKOF scores, one per observation. The greater the RKOF score, the greater the outlierness
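
Because the return value is a plain numeric vector, the row indices of the most outlying observations can be read off with order(). The sketch below uses a small made-up vector named scores in place of real RKOF() output, and top_n is a hypothetical choice of how many observations to flag.

```r
# Hypothetical scores standing in for the vector returned by RKOF()
scores <- c(0.9, 3.2, 1.1, 7.5, 1.0)

# Number of observations to flag (an arbitrary choice for illustration)
top_n <- 2

# Row indices of the top_n highest scores, most outlying first
top_idx <- order(scores, decreasing = TRUE)[seq_len(top_n)]
top_idx
# 4 2
```

Using order() keeps the original row positions, so the flagged rows can be pulled from the dataset directly by index.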

Author(s)

Jacob H. Madsen

References

Gao, J., Hu, W., Zhang, X. & Wu, Ou. (2011). RKOF: Robust Kernel-Based Local Outlier Detection. Pacific-Asia Conference on Knowledge Discovery and Data Mining: Advances in Knowledge Discovery and Data Mining. pp. 270-283. DOI: 10.1007/978-3-642-20847-8_23

Examples

# Load the package that provides RKOF()
library(DDoutlier)

# Create dataset from the four numeric columns of iris
X <- iris[,1:4]

# Find outliers by setting an optional k
outlier_score <- RKOF(dataset=X, k = 10, C = 1, alpha = 1, sigma2 = 1)

# Sort and find index for most outlying observations
names(outlier_score) <- 1:nrow(X)
sort(outlier_score, decreasing = TRUE)

# Inspect the distribution of outlier scores
hist(outlier_score)

Example output

        42         23        107         16         99         61         25 
20.9333707  7.4803071  7.4337297  4.2156385  3.7606731  3.5099076  3.4677519 
        14         94         15         58        110         44         45 
 3.3511778  3.0964169  3.0380271  2.9927734  2.9165486  2.4841597  2.4062123 
        33        132         34        118        115         63        135 
 2.3947587  2.3866664  2.2960100  2.2860929  2.2699917  2.2671660  2.2501356 
        24        109         37         60         69         21        119 
 2.2397256  2.2301521  2.2017995  2.1639492  2.1555626  2.1472660  2.0711909 
         9        130         65         80         19         88         85 
 2.0597701  2.0161673  1.9891042  1.9824328  1.9757990  1.8843072  1.8802384 
       101         32         36        120        126         39          7 
 1.8012098  1.7902986  1.7407341  1.6874210  1.6388187  1.6131324  1.6050853 
        17         86        136        149        123         43         51 
 1.5556259  1.5462244  1.5344901  1.5174920  1.4926101  1.4462593  1.4299705 
        26         38         72        142         67          6         91 
 1.4237754  1.3941024  1.3842360  1.3641326  1.3621389  1.3601984  1.3457786 
       108         27         47         12        122         54        114 
 1.3356308  1.3303128  1.3022165  1.2978009  1.2969380  1.2951479  1.2942641 
        20         53        103        137        131         82        106 
 1.2846954  1.2834857  1.2695735  1.2597446  1.2588270  1.2337204  1.2239498 
        49         22         57         78         68         71         74 
 1.2239461  1.2153137  1.2107051  1.1837874  1.1546670  1.1468838  1.1253193 
       147         62         81        134         77        111        116 
 1.1195289  1.1189868  1.1187405  1.1031833  1.0960186  1.0850111  1.0754110 
        56         73         41          3        146        133        145 
 1.0739931  1.0721847  1.0604453  1.0523797  1.0418085  1.0377909  1.0374178 
       138         55         75        112         11        104         84 
 1.0307016  1.0297730  1.0290214  1.0258033  1.0234543  1.0225895  1.0170707 
       105         46        102        143         98        125         50 
 1.0106365  1.0091586  1.0077600  1.0077600  1.0032409  1.0032110  0.9999347 
        89         52          5         66         79         70         83 
 0.9882766  0.9826802  0.9773547  0.9750188  0.9749555  0.9747913  0.9705611 
       150        140         13         90         48         29         64 
 0.9684244  0.9683627  0.9604432  0.9586856  0.9585964  0.9562693  0.9536025 
       113         30        144         28         96        129         59 
 0.9518954  0.9510152  0.9481561  0.9473663  0.9468100  0.9435002  0.9431162 
       117        139          4         10          2        127        124 
 0.9421072  0.9406396  0.9402717  0.9222534  0.9173107  0.9171001  0.9118251 
       148          8         92        141         87         93         76 
 0.9112023  0.9088143  0.9044098  0.9043231  0.8994932  0.8939278  0.8929194 
        31         95        128        121         18         35         40 
 0.8886043  0.8746669  0.8739067  0.8714953  0.8644323  0.8584679  0.8284067 
        97          1        100 
 0.8256756  0.7918447  0.7866364 

DDoutlier documentation built on May 1, 2019, 10:20 p.m.