DSD_Memory: A Data Stream Interface for Data Stored in Memory

DSD_Memory R Documentation

Description

This class provides a data stream interface for data stored in memory as matrix-like objects (including data frames). All or a portion of the stored data can be replayed several times.

Usage

DSD_Memory(
  x,
  n,
  k = NA,
  outofpoints = c("warn", "ignore", "stop"),
  loop = FALSE,
  description = NULL
)

Arguments

x

A matrix-like object containing the data. If x is a DSD object then a data frame for n data points from this DSD is created.

n

Number of points used if x is a DSD object. If x is a matrix-like object then n is ignored.

k

Optional: the known number of clusters in the data.

outofpoints

Action taken if fewer than n data points are available. The default is to return the available data points with a warning. The supported actions are:

  • warn: return the available points (possibly an empty data.frame) with a warning.

  • ignore: silently return the available points.

  • stop: stop with an error.
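As a minimal sketch of these actions (assuming the stream package is installed; the toy data frame and the variable names are made up for illustration), the following requests more points than a 10-row stream holds:

```r
library(stream)

# hypothetical toy data: 10 points in 2 dimensions
df <- data.frame(x = runif(10), y = runif(10))

# "ignore": silently return whatever points are left
s_ignore <- DSD_Memory(df, outofpoints = "ignore")
pts <- get_points(s_ignore, n = 20)
nrow(pts)

# "stop": the same request should raise an error, caught here for illustration
s_stop <- DSD_Memory(df, outofpoints = "stop")
res <- tryCatch(get_points(s_stop, n = 20), error = function(e) "error raised")
res
```

Using "stop" can be the safer choice in scripts where a short read would otherwise go unnoticed.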

loop

Should the stream start over when it reaches the end?
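A short sketch of looping, assuming the stream package is installed (the toy data and variable names are illustrative only): with loop = TRUE, a request for more points than are stored is expected to wrap around to the beginning instead of running out.

```r
library(stream)

# hypothetical toy data: 10 stored points
df <- data.frame(x = runif(10), y = runif(10))

# with loop = TRUE the stream starts over when it reaches the end
looped <- DSD_Memory(df, loop = TRUE)

# ask for more points than are stored
pts <- get_points(looped, n = 25)
nrow(pts)
```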

description

A character string with a description.

Details

In addition to regular data.frames, other matrix-like objects that support subsetting with the bracket operator can be used. This includes ffdf (large data.frames stored on disk) from package ff and big.matrix from package bigmemory.

Reading the whole stream

By using n = -1 in get_points(), all remaining points in the stream are returned. After reset_stream(), this is the whole stream.
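The replay behavior can be sketched as follows, assuming the stream package is installed (the toy data and variable names are hypothetical): the stream is read completely, reset, and read again, and both passes are compared.

```r
library(stream)

# hypothetical toy data: 50 stored points
df <- data.frame(x = runif(50), y = runif(50))
s <- DSD_Memory(df)

# first pass: read everything that is stored
first_pass <- get_points(s, n = -1)

# rewind to the first position and read everything again
reset_stream(s)
second_pass <- get_points(s, n = -1)

nrow(first_pass)
identical(first_pass, second_pass)
```

Replaying the same points in the same order is what makes it possible to compare several algorithms on identical input.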

Value

Returns a DSD_Memory object (subclass of DSD_R, DSD).

Author(s)

Michael Hahsler

See Also

Other DSD: DSD_BarsAndGaussians(), DSD_Benchmark(), DSD_Cubes(), DSD_Gaussians(), DSD_MG(), DSD_Mixture(), DSD_NULL(), DSD_ReadDB(), DSD_ReadStream(), DSD_Target(), DSD_UniformNoise(), DSD_mlbenchData(), DSD_mlbenchGenerator(), DSD(), DSF(), animate_data(), close_stream(), get_points(), plot.DSD(), reset_stream()

Examples

library(stream)

# Example 1: store 1000 points from a stream
stream <- DSD_Gaussians(k = 3, d = 2)
replayer <- DSD_Memory(stream, k = 3, n = 1000)
replayer
plot(replayer)

# create two clusterers that use different algorithms
dsc1 <- DSC_DBSTREAM(r = 0.1)
dsc2 <- DSC_DStream(gridsize = 0.1, Cm = 1.5)

# cluster the same data with both DSC objects
reset_stream(replayer) # reset the replayer to the first position
update(dsc1, replayer, 500)
reset_stream(replayer)
update(dsc2, replayer, 500)

# plot the resulting clusterings
reset_stream(replayer)
plot(dsc1, replayer, main = "DBSTREAM")
reset_stream(replayer)
plot(dsc2, replayer, main = "D-Stream")


# Example 2: use a data.frame to create a stream (3rd col. contains the assignment)
df <- data.frame(x = runif(100), y = runif(100),
  .class = sample(1:3, 100, replace = TRUE))

# mark some points as outliers; outliers get no cluster assignment (NA)
out <- runif(100) > .95
df[['.outlier']] <- out
df[['.class']][out] <- NA
head(df)

stream <- DSD_Memory(df)
stream

reset_stream(stream)
get_points(stream, n = 5)

# get the remaining points
rest <- get_points(stream, n = -1)
nrow(rest)

# plot all available points with n = -1
reset_stream(stream)
plot(stream, n = -1)

mhahsler/stream documentation built on July 30, 2023, 12:09 a.m.