
R-streamgenerator

A statistical multi-dimensional stream generator for benchmarking stream mining algorithms.

This R package provides functions to generate multidimensional data streams where the correlation structure can change through time.

More information about the motivation of this project is available in this article

How it works

Note that the overall proportion of outliers in the output stream does not relate directly to prop. Depending on proptype, prop is either the absolute expected proportion of outliers per subspace (proptype = "absolute") or the expected proportion of outliers conditioned on the size of the hidden space (proptype = "proportional"). In both cases, the overall proportion also depends on the number of dependent subspaces.
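To get a feel for why the overall outlier proportion differs from prop, here is a rough back-of-the-envelope sketch in base R. It covers only proptype = "absolute" and relies on a simplifying assumption that the package does not necessarily satisfy: outliers occur independently in each subspace. The helper overall_outlier_rate is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of R-streamgenerator).
# Assumption: each of the n_subspaces subspaces independently marks a point
# as an outlier with probability prop (proptype = "absolute").
overall_outlier_rate <- function(prop, n_subspaces) {
  # A point is an outlier overall if it is an outlier in at least one subspace.
  1 - (1 - prop)^n_subspaces
}

# Overall rate for prop = 0.05 with 1, 5 and 10 dependent subspaces.
rates <- sapply(c(1, 5, 10), function(k) overall_outlier_rate(0.05, k))
print(round(rates, 3))
```

Under this assumption, the overall rate grows quickly with the number of subspaces, so a small prop can still produce a stream with many outliers.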

For each stream, contrasted subspaces are chosen such that roughly 1/4 of the dimensions are not involved in any contrasted subspace. This makes the search for subspaces realistic and leaves room for subspaces to change over time.
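The idea of keeping roughly 1/4 of the dimensions free can be sketched in base R as follows. This is only an illustration, not the logic of generate.subspaces(): the helper pick_subspaces, its arguments, the subspace sizes (2 or 3) and the choice of disjoint subspaces are all assumptions made for the example.

```r
# Hypothetical sketch (not the package's generate.subspaces()).
# Assumptions: subspaces are disjoint and have 2 or 3 dimensions each.
pick_subspaces <- function(dim, sizes = 2:3, free_fraction = 0.25) {
  stopifnot(dim >= 4, min(sizes) >= 2)
  # Only about (1 - free_fraction) of the dimensions may join a subspace.
  pool <- sample.int(dim, floor(dim * (1 - free_fraction)))
  subspaces <- list()
  while (length(pool) >= min(sizes)) {
    # Draw a subspace size, capped by the number of dimensions left in the pool.
    k <- min(sizes[sample.int(length(sizes), 1)], length(pool))
    subspaces[[length(subspaces) + 1]] <- sort(pool[seq_len(k)])
    pool <- pool[-seq_len(k)]
  }
  subspaces
}

set.seed(42)
subspaces <- pick_subspaces(dim = 20)
# Dimensions that belong to no subspace.
free_dims <- setdiff(seq_len(20), unlist(subspaces))
print(subspaces)
print(free_dims)
```

With dim = 20, at least 5 dimensions stay outside every subspace, which leaves candidates available if a subspace is later replaced.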

TL;DR: the data is generated uniformly, except in some subspaces where it is concentrated in the shape of particular dependencies. The chosen dependencies include regions in which hidden outliers can be placed. The dependencies can change in amplitude and in subspace over time. The following picture shows snapshots of 100 points in a subspace with a dependency of type "Wall" in a generated data stream at different points in time:

streamgenerator_1

As you can see, the relationship between attributes n°7 and n°8 has evolved through time. There is also an outlier in picture 5 (in red). A detector looking at the whole time window at once would probably not have flagged this point as an outlier.

Currently, three kinds of dependencies are available: "Wall", "Square" and "Donut". Here is what they look like in 2-D spaces, with n=1000, margin=0.8, prop=0.005, proptype="absolute". Outliers are shown in red.

With discrete=0:

dependencies_real

With discrete=20:

dependencies_discrete
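To make the idea of a dependency with a hidden region more concrete, here is a small base-R sketch. The geometry is an assumption made purely for illustration and is not taken from the package: it treats the square [0, margin]^2 as the hidden region, keeps normal points outside of it, and places one outlier inside. The helper sample_wall_like is hypothetical.

```r
# Illustrative only: the shape below is an assumption, not the package's
# definition of the "Wall" dependency.
sample_wall_like <- function(n, margin = 0.8) {
  pts <- matrix(numeric(0), ncol = 2)
  while (nrow(pts) < n) {
    # Rejection sampling: draw uniform candidates and keep those that fall
    # outside the assumed hidden region [0, margin]^2.
    cand <- matrix(runif(2 * n), ncol = 2)
    keep <- cand[, 1] > margin | cand[, 2] > margin
    pts <- rbind(pts, cand[keep, , drop = FALSE])
  }
  pts[seq_len(n), , drop = FALSE]
}

set.seed(1)
normal_points <- sample_wall_like(1000, margin = 0.8)
# A hidden outlier: a point placed inside the otherwise empty region.
outlier <- runif(2, min = 0, max = 0.8)
```

Such an outlier has unremarkable values in each single dimension and only stands out when both dimensions are considered together.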

Install

  1. Install devtools:
install.packages("devtools")
  2. Install the package from GitHub:
library(devtools)
devtools::install_github("edouardfouche/R-streamgenerator")

Development mode

  1. Clone this repository
  2. Load the package in your environment:
library(devtools)
load_all("~/path/to/cloned/R-streamgenerator/")

Note that the package is not published to CRAN (yet).

Package documentation

Documentation (.Rd files) for this package is generated with the roxygen2 package.

  1. Install devtools as shown above (Install, steps 1 and 2), then install roxygen2:
install.packages("roxygen2")

roxygen2 requires special comments in which every line starts with #'

  2. After writing these comments, press Ctrl/Cmd + Shift + D (in RStudio) or run:
document()

A file man/NameOfFunction.Rd will be generated.

  3. To read the generated documentation, run:
?NameOfFunction

or

help("NameOfFunction")
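For reference, a roxygen2 comment block of the kind described in the steps above could look like the following. The function rescale_unit is a hypothetical example and not part of R-streamgenerator; the tags used (@param, @return, @examples, @export) are standard roxygen2 tags.

```r
#' Rescale a numeric vector to the unit interval
#'
#' @param x A numeric vector with at least two distinct values.
#' @return A numeric vector of the same length as x, with values in [0, 1].
#' @examples
#' rescale_unit(c(2, 4, 6))
#' @export
rescale_unit <- function(x) {
  stopifnot(is.numeric(x), length(unique(x)) > 1)
  # Shift to zero, then divide by the range.
  (x - min(x)) / (max(x) - min(x))
}
```

Placed in a file under the package's R/ directory, such a block is what document() turns into an .Rd file.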

Within R-streamgenerator, documentation has been created for the following functions:

generate.subspaces()
replace.subspaces()
generate.margins() 
generate.marginslist()
generate.dynamic()
generate.stream.config()
generate.row()
generate.multiple.rows()
generate.static.stream()
generate.dynamic.stream()
output.stream()

Get started

# Generate a static stream
stream <- generate.static.stream() # default parameters
# Generate a static stream with a custom configuration
stream.config <- generate.stream.config(dim=50, nstep=1) # nstep should be 1 here
stream <- generate.static.stream(n=1000, prop=0.05, stream.config=stream.config)

# Generate a dynamic stream
stream <- generate.dynamic.stream() # default parameters
# Generate a dynamic stream with a custom configuration
stream.config <- generate.stream.config(dim=50, nstep=10, volatility=0.5)
stream <- generate.dynamic.stream(n=100, prop=0.05, stream.config=stream.config)

# Write the stream to disk. This creates 3 files in your working directory:
# "example_data.txt" contains the stream
# "example_labels.txt" contains the labels, i.e. whether each point is an outlier and in which subspace(s)
# "example_description.txt" contains a human-readable description of the stream
output.stream(stream, "example")

TODO(s)



edouardfouche/R-streamgenerator documentation built on May 15, 2019, 11:02 p.m.