
fpmoutliers


R implementation of algorithms for detection of outliers based on frequent pattern mining.

If you would like to cite our work, please use:

@InProceedings{kuchar:2017:FPI,
  title =    {Spotlighting Anomalies using Frequent Patterns},
  author =   {Jaroslav Kuchař and Vojtěch Svátek},
  booktitle =    {Proceedings of the KDD 2017 Workshop on Anomaly Detection in Finance},
  year =   {2017},
  volume =   {71},
  series =   {Proceedings of Machine Learning Research},
  address =    {Halifax, Nova Scotia, Canada},
  month =    {14 Aug},
  publisher =    {PMLR},
  issn = {1938-7228}
}

Available implementations:

Development Version Installation

Package installation from GitHub:

library("devtools")
devtools::install_github("jaroslav-kuchar/fpmoutliers")

Usage

Basic example

library(fpmoutliers)
dataFrame <- read.csv(system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
model <- FPI(dataFrame, minSupport = 0.001)
dataFrame <- dataFrame[order(model$scores, decreasing = TRUE),]
print(dataFrame[1,]) # instance with the highest anomaly score
print(dataFrame[nrow(dataFrame),]) # instance with the lowest anomaly score

Experimental explanations

Graphical explanation using bar plots

Currently not suitable for large datasets: the plot is limited by the number of rows and columns of the input data.

library("fpmoutliers")
dataFrame <- read.csv(
     system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
model <- FPI(dataFrame, minSupport = 0.001)
# sort data by the anomaly score
dataFrame <- dataFrame[order(model$scores, decreasing = TRUE),]
visualizeInstance(dataFrame, 1) # instance with the highest anomaly score
visualizeInstance(dataFrame, nrow(dataFrame)) # instance with the lowest anomaly score

Textual explanation

library("fpmoutliers")
dataFrame <- read.csv(
     system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
model <- FPI(dataFrame, minSupport = 0.001)
# sort data by the anomaly score
dataFrame <- dataFrame[order(model$scores, decreasing = TRUE),]
# instance with the highest anomaly score
out <- describeInstance(dataFrame, model, 1)
# instance with the lowest anomaly score
out <- describeInstance(dataFrame, model, nrow(dataFrame))

Other available functionalities

Experimental automatic build

library("fpmoutliers")
data("iris")
model <- fpmoutliers::build(iris)

Save the model to an experimental PMML format

library(fpmoutliers)
library(XML)
dataFrame <- read.csv(system.file("extdata", "fp-outlier-customer-data.csv", package = "fpmoutliers"))
model <- FPI(dataFrame, minSupport = 0.001)
saveXML(generatePMML(model, dataFrame), "example_out.xml")

Model Output

All implemented methods return a list with the following elements:

- minSupport - minimum support setting for frequent itemset mining
- maxlen - maximum length of frequent itemsets
- model - frequent itemset model, represented as an itemsets-class object
- scores - outlier/anomaly scores for each observation (row) of the input data frame
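Because scores holds one value per input row, it can be kept next to the data instead of only being used for reordering. The following is a minimal sketch in base R, not part of the package: the toy data frame, the hypothetical `scores` vector (standing in for `model$scores`) and the cut-off `k` are all assumptions made so that the example runs without fpmoutliers installed.

```r
# Toy stand-ins: in real use, `dataFrame` is the input data and
# `scores` would be model$scores returned by e.g. FPI().
dataFrame <- data.frame(
  id = 1:5,
  segment = c("a", "a", "b", "a", "c"),
  stringsAsFactors = FALSE
)
scores <- c(0.2, 0.1, 0.7, 0.15, 0.9)  # hypothetical anomaly scores, one per row

# Attach each score to its row, then sort from most to least anomalous.
ranked <- cbind(dataFrame, score = scores)
ranked <- ranked[order(ranked$score, decreasing = TRUE), ]

# Keep the k highest-scoring rows (k is an illustrative choice).
k <- 2
topK <- head(ranked, k)
print(topK)
```

Storing the score as a column keeps each value tied to its original row even after sorting or filtering, which is harder to guarantee when the data frame is reordered separately from the model.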

Contributors

Licence

Apache License Version 2.0




fpmoutliers documentation built on May 2, 2019, 8:53 a.m.