pkg <- 'stream'

source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R")
pkg_title(pkg)

Introduction

The package provides support for modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. The package provides:

Additional packages in the stream family are:

pkg_citation(pkg, 2)
pkg_install(pkg)

Usage

options(digits = 3)

Load the package, create a random data stream with 3 Gaussian clusters and 10% noise, and scale the data to z-scores.

library("stream")
set.seed(2000)

stream <- DSD_Gaussians(k = 3, d = 2, noise = 0.1) %>% DSF_Scale()
get_points(stream, n = 5)

plot(stream)
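To make "scale the data to z-scores" concrete, here is a minimal base-R sketch of the transformation on a fixed sample. It does not use or reproduce DSF_Scale(); the sample `pts` and the names `centers` and `spreads` are made up for illustration, and how the package estimates these statistics from a stream is not covered here.

```r
# Base-R illustration of z-score scaling; not the package's implementation.
set.seed(1)

# Hypothetical sample: 100 two-dimensional points with different means and spreads.
pts <- cbind(X1 = rnorm(100, mean = 5, sd = 2),
             X2 = rnorm(100, mean = -3, sd = 0.5))

# Estimate the center and spread of each column from the sample.
centers <- colMeans(pts)
spreads <- apply(pts, 2, sd)

# Subtract the column mean, then divide by the column standard deviation.
z <- scale(pts, center = centers, scale = spreads)

colMeans(z)      # approximately 0 in every column
apply(z, 2, sd)  # approximately 1 in every column
```

After scaling, both dimensions are on a comparable scale, so a single distance-based parameter such as a grid size or a radius applies to each dimension in the same way.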

Cluster a stream of 1000 points using D-Stream, which estimates point density in grid cells.

dsc <- DSC_DStream(gridsize = .1)
update(dsc, stream, 1000)
plot(dsc, stream, grid = TRUE)
evaluate_static(dsc, stream, n = 100)
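The idea of estimating point density in grid cells can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions: it counts points per cell for a fixed batch, and it leaves out everything else a stream clustering algorithm such as D-Stream does (for example, handling points that arrive over time). The data `pts` and the variable names are hypothetical.

```r
# Base-R illustration of grid-cell density counting; not DSC_DStream() itself.
set.seed(1)

# Hypothetical batch: 200 points scattered around (0.5, 0.5).
pts <- cbind(x = rnorm(200, mean = 0.5, sd = 0.1),
             y = rnorm(200, mean = 0.5, sd = 0.1))
gridsize <- 0.1

# Map every coordinate to an integer cell index along its dimension.
cells <- floor(pts / gridsize)

# Build a key "ix,iy" per point and count how many points fall into each cell.
key <- paste(cells[, "x"], cells[, "y"], sep = ",")
counts <- sort(table(key), decreasing = TRUE)

head(counts, 5)  # the most densely populated cells
```

A smaller `gridsize` gives finer cells with fewer points each, while a larger one merges nearby structure into the same cell, which is why the grid size is the main tuning parameter in the example above.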

Detect outliers using DBSTREAM, which uses micro-clusters with a given radius.

dso <- DSOutlier_DBSTREAM(r = .1)
update(dso, stream, 1000)
plot(dso, stream)
evaluate_static(dso, stream, n = 100,
  measure = c("numPoints", "noiseActual", "noisePredicted", "noisePrecision"))
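One way to picture radius-based outlier detection is to flag a point when it lies farther than the radius from every micro-cluster center. The base-R sketch below shows only that rule; it is an assumption made for illustration and not the exact criterion used by DSOutlier_DBSTREAM(). The centers, the test points and the variable names are hypothetical.

```r
# Base-R illustration of a radius rule for outliers; not the package's algorithm.

# Hypothetical micro-cluster centers (one per row) and radius.
centers <- rbind(c(0, 0), c(1, 1))
r <- 0.1

# Hypothetical new points (one per row).
new_pts <- rbind(c(0.05, 0.02),  # close to the first center
                 c(0.98, 1.03),  # close to the second center
                 c(0.50, 0.50))  # far from both centers

# Euclidean distance from every point (rows) to every center (columns).
dist_to_centers <- apply(centers, 1, function(ctr) {
  sqrt(rowSums(sweep(new_pts, 2, ctr, "-")^2))
})

# A point is flagged when even its nearest center is farther away than r.
is_outlier <- apply(dist_to_centers, 1, min) > r
is_outlier
```

Under this rule, only the third point is flagged, since its nearest center is about 0.71 away while the other two points lie within 0.1 of a center.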

Prepare a complete stream processing pipeline that can be run with a single update() call.

pipeline <- DSD_Gaussians(k = 3, d = 2, noise = 0.1) %>%
  DSF_Scale() %>% 
  DST_Runner(DSC_DStream(gridsize = .1))
pipeline

update(pipeline, n = 500)
pipeline$dst

Acknowledgments

The development of the stream package was supported in part by NSF IIS-0948893, NSF CMMI 1728612, and NIH R21HG005912.

mhahsler/stream documentation built on April 24, 2024, 10:10 p.m.