Streamer-package: Package to enable stream (iterative) processing of large data


Description

Large data files can be difficult to work with in R, where data generally resides in memory. This package encourages a style of programming where data is 'streamed' from disk into R through a series of components that, typically, reduce the original data to a manageable size. The package provides useful Producer and Consumer components for operations such as data input, sampling, indexing, and transformation.

Details

The central paradigm in this package is a Stream composed of a Producer and zero or more Consumer components. The Producer is responsible for input of data, e.g., from the file system. A Consumer accepts data from a Producer and performs transformations on it. The Stream function is used to assemble a Producer and zero or more Consumer components into a single stream.

The yield function can be applied to a stream to generate one ‘chunk’ of data. The definition of chunk depends on the stream and its components. A common paradigm repeatedly invokes yield on a stream, retrieving chunks of the stream for further processing.
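
The paradigm described above can be sketched in base R without the package. This is only an illustration of the idea, not how Streamer is implemented: make_producer, make_stream, and next_chunk are hypothetical names, and plain functions stand in for Consumer components.

```r
## Hypothetical helpers for illustration only; these are not Streamer functions.

## A producer: returns a function that hands out successive chunks of `x`,
## and a zero-length vector once the data is exhausted.
make_producer <- function(x, chunk_size) {
    pos <- 0L
    function() {
        if (pos >= length(x))
            return(x[0])
        idx <- seq(pos + 1L, min(pos + chunk_size, length(x)))
        pos <<- pos + length(idx)
        x[idx]
    }
}

## A stream: pull one chunk from the producer, then pass it through each
## consumer (here a plain function) in turn.
make_stream <- function(producer, ...) {
    consumers <- list(...)
    function() {
        chunk <- producer()
        for (consumer in consumers)
            chunk <- consumer(chunk)
        chunk
    }
}

## Ten records, three per chunk; the consumers upper-case and then
## reverse each chunk.
next_chunk <- make_stream(make_producer(letters[1:10], 3L), toupper, rev)

## Repeatedly pull chunks until an empty one signals the end of the data.
chunk_lengths <- integer()
while (length(chunk <- next_chunk()) > 0L) {
    cat("chunk of length", length(chunk), ":", chunk, "\n")
    chunk_lengths <- c(chunk_lengths, length(chunk))
}
```

Only one chunk is held in memory at a time, which is what makes this style suitable for data too large to load whole. In the package itself, Stream and yield play the roles that make_stream and next_chunk play in this sketch.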

Author(s)

Martin Morgan <mtmorgan@fhcrc.org>

See Also

Producer and Consumer are the main types of stream components. Use Stream to connect components, and yield to iterate over a stream.

Examples

## About this package
packageDescription("Streamer")

## Existing stream components
getClass("Producer")            # Producer classes
getClass("Consumer")            # Consumer classes

## An example
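fl <- system.file("extdata", "s_1_sequence.txt", package="Streamer")
## Producer: reads the example file
b <- RawInput(fl, 100L, reader=rawReaderFactory(1e4))
## Assemble two Consumers and the Producer into a stream
s <- Stream(RawToChar(), Rev(), b)
s                               # Display the stream
head(yield(s))                  # First chunk
close(b)

## A second stream over the same file, with a Downsample Consumer
b <- RawInput(fl, 5000L, verbose=TRUE)
d <- Downsample(sampledSize=50)
s <- Stream(RawToChar(), d, b)
s
s[[2]]                          # A single component of the stream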

## Processing the first ten chunks of the file
i <- 1
while (i <= 10 && length(chunk <- yield(s)) != 0L) {
    cat("chunk", i, "length", length(chunk), "\n")
    i <- i + 1
}
close(b)

Streamer documentation built on Nov. 8, 2020, 5:53 p.m.