Performs outlier detection on a given data frame or matrix.
RWBP(x,...,nn_k,min.clusters,clusters.iterations,
clusters.stepSize,alfa,dumping.factor)
## Default S3 method:
RWBP(x,...,nn_k=10,min.clusters=8,clusters.iterations=6,
clusters.stepSize=2,alfa=0.5,dumping.factor=0.9)
## S3 method for class 'formula'
RWBP(formula,data,...,nn_k=10,min.clusters=8,clusters.iterations=6,
clusters.stepSize=2,alfa=0.5,dumping.factor=0.9)
## S3 method for class 'RWBP'
print(x, ...)
## S3 method for class 'RWBP'
plot(x, ...)
formula | a formula describing the problem. The dependent variable (y) is ignored; the first two x attributes must be spatial coordinates and the remaining ones numeric attributes.
data | a data frame containing the data to be analysed (may contain additional columns).
x | a data frame containing the data to be analysed. The first two columns must be spatial coordinates; the other columns are the non-spatial attributes in which outliers are searched for.
nn_k | neighbourhood size (the number of nearest neighbours found for each object).
min.clusters | the number of clusters in the first clustering process.
clusters.iterations | the number of clustering processes to conduct.
clusters.stepSize | the number of clusters added in each subsequent clustering process.
alfa | helps compute a more accurate edge value (the distance between an object and a cluster).
dumping.factor | damping factor (the probability of returning to the original node at each step of a random walk).
... | currently not in use.
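As a rough illustration of how min.clusters, clusters.iterations and clusters.stepSize interact, the sketch below lists the cluster counts implied by the default values. It is a reading of the argument descriptions, not code taken from the package; `cluster_counts` and `n_objects` are names introduced here for illustration.

```r
# Defaults from the Usage section.
min.clusters        <- 8
clusters.iterations <- 6
clusters.stepSize   <- 2

# One cluster count per clustering process: start at min.clusters
# and add clusters.stepSize each time (assumed interpretation).
cluster_counts <- seq(from = min.clusters,
                      by = clusters.stepSize,
                      length.out = clusters.iterations)
cluster_counts       # 8 10 12 14 16 18
sum(cluster_counts)  # 78 clusters in total

# With the 51 records of the example data set:
n_objects <- 51
n_objects + sum(cluster_counts)   # 129
n_objects * clusters.iterations   # 306
```

Under this reading, the example data set would yield 51 + 78 = 129 vertices and 51 * 6 = 306 object-to-cluster edges, which agrees with the `UNWB 129 306` summary line of the igraph object printed in the example output.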
A spatial outlier detection approach based on random walk (RW) techniques. First, a bipartite graph is constructed from the spatial and/or non-spatial attributes of the spatial objects in the dataset. Second, RW techniques are applied to the graph to compute the outlierness of each point (the difference between a spatial object and its spatial neighbours). The top k objects with the highest outlierness are recognized as outliers.
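The general idea can be sketched in base R on a toy graph. This is a minimal illustration of a random walk with restart on a bipartite object-cluster graph, not the package's implementation: the matrix `B`, the helper functions `rwr` and `cosine`, and the value of `restart_prob` are all assumptions made for the example, and each object is compared with all other objects rather than with its spatial neighbours only.

```r
# Toy bipartite graph (hypothetical data): 4 objects x 2 clusters.
# B[i, j] is the weight of the edge between object i and cluster j.
B <- matrix(c(1.0, 0.0,
              0.9, 0.1,
              0.8, 0.2,
              0.0, 1.0), nrow = 4, byrow = TRUE)
n_obj <- nrow(B)
n_clu <- ncol(B)

# Symmetric adjacency matrix over all 6 vertices (objects first, then clusters).
A <- rbind(cbind(matrix(0, n_obj, n_obj), B),
           cbind(t(B), matrix(0, n_clu, n_clu)))

# Column-normalise: column j holds the transition probabilities out of vertex j.
W <- sweep(A, 2, colSums(A), "/")

# Random walk with restart from vertex `start` (restart_prob is illustrative).
rwr <- function(W, start, restart_prob = 0.5, tol = 1e-12, max_iter = 10000) {
  e <- numeric(nrow(W))
  e[start] <- 1
  r <- e
  for (i in seq_len(max_iter)) {
    r_new <- as.vector((1 - restart_prob) * (W %*% r) + restart_prob * e)
    converged <- sum(abs(r_new - r)) < tol
    r <- r_new
    if (converged) break
  }
  r
}

# One relevance vector per object (stored as the columns of R).
R <- sapply(seq_len(n_obj), function(i) rwr(W, i))

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Mean cosine similarity of each object to all other objects.
mean_sim <- sapply(seq_len(n_obj), function(i) {
  others <- setdiff(seq_len(n_obj), i)
  mean(sapply(others, function(j) cosine(R[, i], R[, j])))
})
which.min(mean_sim)  # index of the object least similar to the rest
```

In this toy graph the fourth object is attached only to the second cluster, so its relevance vector overlaps little with the others and it receives the lowest mean similarity.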
Returns an RWBP object that contains several components:
data | the data after removing records with empty fields.
X | a data frame containing the spatial attributes (first two columns) from the input data.
Y | a data frame containing the non-spatial attributes (all but the first two columns) from the input data.
ID | a vector of sequential numbers, used as an index.
n | number of valid records.
n.orig | number of records in the input data.
nn_k | neighbourhood size for the knn search.
k | number of clusters in the first clustering process.
clusters.stepSize | the number of clusters added in each subsequent clustering process.
h | number of clustering processes conducted.
alfa | helps compute a more accurate edge value (the distance between an object and a cluster).
c | damping factor (the probability of returning to the original node at each step of a random walk).
nearest.indexes | a matrix where each row contains a spatial object's nn_k nearest neighbours.
clusteredData | a data frame containing the results of all clustering processes: an object, the cluster it belongs to and the distance between the two.
igraph | an igraph object built from the connections between spatial objects and clusters.
OutScore | the outlierness score of each record, sorted ascending by score. The first column is the index of the record and the second column is its score.
objects.similarity | a matrix where each row holds the similarity between a spatial object and its nn_k neighbours.
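To make the shape of nearest.indexes concrete, here is a small base-R sketch that builds a comparable matrix for four hypothetical points. The example output shows that the package loads RANN, but this sketch deliberately avoids it and uses `dist()` instead; the coordinates, the name `nearest`, and the decision to exclude each point from its own neighbour list are assumptions.

```r
# Hypothetical spatial coordinates for four objects on a line.
coords <- cbind(lng = c(0, 1, 3, 10), lat = c(0, 0, 0, 0))
nn_k <- 2

# Pairwise Euclidean distances; Inf on the diagonal excludes self-matches.
d <- as.matrix(dist(coords))
diag(d) <- Inf

# Row i holds the indexes of the nn_k objects closest to object i.
nearest <- t(apply(d, 1, function(row) order(row)[seq_len(nn_k)]))
nearest
```

Each row lists neighbour indexes from closest to farthest, so row 4 here contains 3 and then 2.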
The first two columns must be spatial coordinates, and the remaining columns must be numeric attributes. Records with empty fields are removed from the input data.
Sigal Shaked & Ben Nasi
Liu X., Lu C.T., Chen F.: Spatial outlier detection: Random walk based approaches. In: Proceedings of the 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS), San Jose, CA (2010).
#an example dataset:
trainSet <- cbind(
c(7.092073,7.092631,7.09263,7.093052,7.092876,7.092689,7.092515,7.092321,
7.092138,7.11455,7.11441,7.11408,7.11376,7.11338,7.11305,7.11277,7.1124,
7.11202,7.11161,7.11115,7.11068,7.11014,7.10963,7.1095,7.1089,7.10818,
7.10747,7.10674,7.116691,7.116142,7.115559,7.115007,7.114423,7.113838,
7.113272,7.112684,7.112067,7.111458,7.110869,7.110274,7.109696,7.109131,
7.109231,7.108546,7.10797,5.599215,5.597609,5.596588,5.595359,5.594478,5.593652),
c(50.77849,50.77859,50.7786,50.77878,50.77914,50.77952,50.77992,50.78035,
50.78081,53.8,53.7,53.6,53.5,54.2,55.3,55.2,56.6,57.6,57.7,58.8,59.4,59.7,
59,59.03,59.3,60.7,60.8,61.4,50.73922,50.73914,50.73905,50.73899,50.73889,
50.73881,50.73873,50.73865,50.73856,50.73847,50.73838,50.73831,50.73822,
50.73814,50.73937,50.73805,50.73798,43.2034,43.20338,43.20352,43.2037,43.20391,43.20409),
c(106.5,107.6,25,108.5,109.1,109.7,111.6,113.3,113.3,62.3,333.7,331.5,327.2,
325.5,324.8,323.5,322.3,320.3,319,317.8,316,315.1,315.3,12,312.4,311.3,310.8,
309.4,99.2,99.2,101.1,99.5,101.3,105.3,104.3,104.4,106.3,108.8,110.3,111.7,113.3,
112.1,5000,111.6,109.8,125.6,130,132.3,133.4,138,143.4),
c(0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,1,0,0,0,0,0,0,0,0)
)
colnames(trainSet)<- c("lng","lat","alt","isOutlier")
#the first two columns of the input data are assumed to be spatial coordinates,
#and the rest are non-spatial attributes in which outliers are searched for
myRW <- RWBP(as.data.frame(trainSet[,1:3]), clusters.iterations=6)
#predict classification:
testPrediction<-predict(myRW,3 )
#calculate accuracy:
sum(testPrediction$class==trainSet[,"isOutlier"])/nrow(trainSet)
#confusion table
table(testPrediction$class, trainSet[,"isOutlier"])
#other options:
myRW1 <- RWBP(isOutlier~lng+lat+alt, data=as.data.frame(trainSet))
#print model summary
print(myRW1)
#plot model graph
plot(myRW1)
#predict probabilities of each record to be an outlier:
predict(myRW1 , top_k=4,type="prob")
Loading required package: RANN
Loading required package: igraph
Attaching package: 'igraph'
The following objects are masked from 'package:stats':
decompose, spectrum
The following object is masked from 'package:base':
union
Loading required package: lsa
Loading required package: SnowballC
[1] 0.9215686
     0  1
  0 46  2
  1  2  1
A Random Walk on Bipartite Graph spatial outlier detection model was built:
----------------------------------------------------------------------------
neighberhood size = 10
initial clusters amount = 8
each process increases clusters amount by 2 more clusters
clusters iterations amount = 6
alfa = 0.5
dumping factor = 0.9
valid rows = 51 out of 51 input rows (records with empty values were removed)
a bipartite graph was built:
IGRAPH b066eb3 UNWB 129 306 --
+ attr: name (v/c), type (v/l), RW.Y (e/n), avgDist (e/n), weight (e/n)
+ edges from b066eb3 (vertex names):
[1] 1 ---4 2 ---4 3 ---8 4 ---4 5 ---3 6 ---3 7 ---3
[8] 8 ---3 9 ---3 10---1 11---7 12---7 13---7 14---7
[15] 15---7 16---7 17---7 18---7 19---7 20---7 21---7
[22] 22---7 23---7 24---8 25---7 26---7 27---7 28---7
[29] 29---5 30---5 31---5 32---5 33---5 34---4 35---4
[36] 36---4 37---4 38---4 39---3 40---3 41---3 42---3
[43] 43---6 44---3 45---3 46---2 47---2 48---2 49---2
[50] 50---2 51---2 1 ---1003 2 ---1006 3 ---1001 4 ---1006 5 ---1006
+ ... omitted several edges
outlier scores:
row_num outlierScore
[1,] 43 0.5748557
[2,] 10 0.6098816
[3,] 28 0.6627728
[4,] 39 0.6764593
[5,] 2 0.7598649
[6,] 3 0.8966423
[7,] 45 0.9125781
[8,] 38 0.9138153
[9,] 37 0.9138919
[10,] 42 0.9167803
[11,] 44 0.9169039
[12,] 40 0.9169214
[13,] 41 0.9171035
[14,] 1 0.9210805
[15,] 6 0.9214066
[16,] 5 0.9241858
[17,] 4 0.9244069
[18,] 7 0.9247118
[19,] 8 0.9249372
[20,] 9 0.9249372
[21,] 12 0.9440315
[22,] 11 0.9442541
[23,] 24 0.9463678
[24,] 16 0.9470852
[25,] 15 0.9473674
[26,] 13 0.9476015
[27,] 14 0.9479852
[28,] 26 0.9531505
[29,] 27 0.9552595
[30,] 25 0.9566898
[31,] 22 0.9568375
[32,] 21 0.9568599
[33,] 31 0.9630348
[34,] 33 0.9630902
[35,] 36 0.9638831
[36,] 34 0.9639474
[37,] 35 0.9641352
[38,] 32 0.9641847
[39,] 29 0.9642799
[40,] 30 0.9642799
[41,] 47 0.9901509
[42,] 51 0.9912388
[43,] 50 0.9913511
[44,] 49 0.9929033
[45,] 19 0.9937413
[46,] 17 0.9950227
[47,] 20 0.9953359
[48,] 23 0.9959121
[49,] 48 0.9961007
[50,] 46 0.9966364
[51,] 18 0.9969489
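The score table is sorted ascending, and the records with the lowest scores appear to be the ones flagged as outliers. The sketch below checks that reading against the confusion table printed earlier. It assumes that the model used for the top-3 prediction ranks the same three rows first as this printed model does, and that the labelled outliers are rows 3, 24 and 43 (the rows with alt values 25, 12 and 5000); `top_rows`, `true_rows`, `pred` and `truth` are illustrative names.

```r
n <- 51
top_rows  <- c(43, 10, 28)  # first three rows of the printed score table
true_rows <- c(3, 24, 43)   # rows assumed to have isOutlier == 1

pred  <- as.integer(seq_len(n) %in% top_rows)
truth <- as.integer(seq_len(n) %in% true_rows)

confusion <- table(pred, truth)
confusion
mean(pred == truth)  # share of records classified correctly
```

Under these assumptions the counts are 46, 2, 2 and 1, and the accuracy is 47/51, the same figures as the printed confusion table and the 0.9215686 accuracy.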
lng lat alt prob
1 7.092073 50.77849 106.5 0.1797432472
2 7.092631 50.77859 107.6 0.5616863794
3 7.092630 50.77860 25.0 0.2376407886
4 7.093052 50.77878 108.5 0.1718625080
5 7.092876 50.77914 109.1 0.1723863642
6 7.092689 50.77952 109.7 0.1789706541
7 7.092515 50.77992 111.6 0.1711401056
8 7.092321 50.78035 113.3 0.1706061464
9 7.092138 50.78081 113.3 0.1706061464
10 7.114550 53.80000 62.3 0.9170185843
11 7.114410 53.70000 333.7 0.1248415747
12 7.114080 53.60000 331.5 0.1253689921
13 7.113760 53.50000 327.2 0.1169110349
14 7.113380 54.20000 325.5 0.1160019481
15 7.113050 55.30000 324.8 0.1174656829
16 7.112770 55.20000 323.5 0.1181343974
17 7.112400 56.60000 322.3 0.0045634316
18 7.112020 57.60000 320.3 0.0000000000
19 7.111610 57.70000 319.0 0.0075991271
20 7.111150 58.80000 317.8 0.0038213684
21 7.110680 59.40000 316.0 0.0949766667
22 7.110140 59.70000 315.1 0.0950297545
23 7.109630 59.00000 315.3 0.0024563868
24 7.109500 59.03000 12.0 0.1198338045
25 7.108900 59.30000 312.4 0.0953795670
26 7.108180 60.70000 311.3 0.1037646878
27 7.107470 60.80000 310.8 0.0987681092
28 7.106740 61.40000 309.4 0.7917117025
29 7.116691 50.73922 99.2 0.0773976472
30 7.116142 50.73914 99.2 0.0773976472
31 7.115559 50.73905 101.1 0.0803474077
32 7.115007 50.73899 99.5 0.0776231584
33 7.114423 50.73889 101.3 0.0802161353
34 7.113838 50.73881 105.3 0.0781853664
35 7.113272 50.73873 104.3 0.0777404339
36 7.112684 50.73865 104.4 0.0783375265
37 7.112067 50.73856 106.3 0.1967740167
38 7.111458 50.73847 108.8 0.1969555585
39 7.110869 50.73838 110.3 0.7592862845
40 7.110274 50.73831 111.7 0.1895966810
41 7.109696 50.73822 113.3 0.1891652295
42 7.109131 50.73814 112.1 0.1899311112
43 7.109231 50.73937 5000.0 1.0000000000
44 7.108546 50.73805 111.6 0.1896381860
45 7.107970 50.73798 109.8 0.1998865257
46 5.599215 43.20340 125.6 0.0007403103
47 5.597609 43.20338 130.0 0.0161054345
48 5.596588 43.20352 132.3 0.0020094941
49 5.595359 43.20370 133.4 0.0095846352
50 5.594478 43.20391 138.0 0.0132618493
51 5.593652 43.20409 143.4 0.0135280699
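The prob column looks like an inverted min-max rescaling of the outlier scores printed earlier: the lowest score (row 43) maps to 1 and the highest (row 18) maps to 0. This is an inference from the printed numbers, not a documented formula. The sketch below reproduces a few of the values under that assumption, using scores copied from the score table.

```r
# Scores copied from the printed table; rows 43 and 18 are the overall min and max.
scores <- c("43" = 0.5748557, "10" = 0.6098816,
            "28" = 0.6627728, "18" = 0.9969489)

# Assumed rescaling: lowest score -> 1, highest score -> 0.
prob <- (max(scores) - scores) / (max(scores) - min(scores))
round(prob, 4)
```

The results for rows 10 and 28 agree with the printed prob values (0.9170 and 0.7917) to four decimal places.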