ElianHugh/emitters: NodeJS-inspired events and io streams


library(emitters)

{emitters} implements a data abstraction known in many languages as a 'stream'. Streams allow data to be read and written in chunks, much like the base readLines and writeLines functions. The main difference is that streams buffer their data, so a huge data source never has to be held in system memory all at once.
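For comparison, base R can already read a file piece by piece through a connection. The sketch below uses only base functions and is not part of the {emitters} API: calling `readLines()` on an open connection with `n = 3` returns at most three lines per call, so only one chunk is held in memory per loop iteration.

```r
# Base-R illustration of chunked reading (not the {emitters} API).
# Write a small 10-line file to a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:10), path)

# Open a connection and read it back three lines at a time.
con <- file(path, open = "r")
chunk_sizes <- integer(0)
repeat {
  chunk <- readLines(con, n = 3)   # at most 3 lines per call
  if (length(chunk) == 0) break    # no lines left: end of file
  chunk_sizes <- c(chunk_sizes, length(chunk))
}
close(con)

chunk_sizes
#> [1] 3 3 3 1
```

A stream generalises this idea by managing the chunking and buffering for you.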

Creating a stream is relatively straightforward. We'll start by creating a writeable stream.

# The first argument to the WriteableStream is the destination
# i.e. where the data should go/be stored
stream <- WriteableStream$new('file.txt')

We can then send some data to the file via the stream's write method:

stream$write(1)
stream$write(2)
stream$write(3)

We can then check the destination of the stream:

readLines('file.txt')
#> [1] "1" "2" "3"

So far, this form of writing data doesn't seem any different to a call to writeLines. The real difference comes in when working with large amounts of data!
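For comparison, the three writes above leave the same contents in the file as one base R call would. Note that `writeLines()` only accepts character vectors, and a single call needs the whole vector in memory up front, whereas the stream received the values one at a time. This sketch writes to a temporary file rather than `file.txt` so it leaves the earlier example untouched.

```r
# One-shot base-R equivalent of the three stream$write() calls.
path <- tempfile(fileext = ".txt")

# writeLines() requires character input, so convert the numbers first.
writeLines(as.character(1:3), path)

readLines(path)
#> [1] "1" "2" "3"
```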

The highwater mark

Streams are intended to reduce memory usage when working with large amounts of data, trading some speed for a smaller memory footprint. To this end, every stream is created with a highwater_mark, a number (in bytes) that determines when the stream stops automatically writing data to its destination. This is a little tricky to get your head around, so let's try an example.

First, we'll create another writeable stream, this time setting a highwater mark:

stream <- WriteableStream$new(
  list(),
  highwater_mark = 10
)

In this case, a highwater mark of 10 means that the stream will stop sending data once its buffer holds 10 or more bytes. To see what happens when the buffer is 'full', we can cork the stream, which prevents data from being written to the destination,

stream$cork() # cork to prevent data being written
stream$write(1)
stream$write(2)
stream$write(3)

and we can check the stream's state:

stream$writeable_state
#> $encoding
#> [1] "UTF8"
#>
#> $highwater_mark
#> [1] 10
#>
#> $source
#> <environment: 0x561e1b5533a0>
#>
#> $buffer
#> <environment: 0x561e1a234b68>
#>
#> $buffer_length
#> 64 bytes
#>
#> $flowing
#> [1] FALSE
#>
#> $corked
#> [1] 1
#>
#> $ended
#> [1] FALSE
#>
#> $destroyed
#> [1] FALSE
#>

You may notice that the buffer holds more bytes (64) than the highwater mark specifies (10). This is because the highwater mark is not a hard limit; instead, it tells the stream to stop taking in data once its buffer has reached (or passed) the mark. Here, the value 1 alone took up roughly 64 bytes, which pushed the buffer past the highwater mark, so values 2 and 3 were not written to the stream's destination.
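To make the rule concrete, here is a toy model in base R. It is a hypothetical sketch, not how {emitters} is implemented: `toy_buffer()` and `toy_write()` are made-up names, and sizes are measured with `object.size()`, which may not match the figure the package reports as `buffer_length`. A value is buffered only while the buffer holds fewer bytes than the highwater mark, so with a mark of 10 the first write is accepted and the next two are refused.

```r
# Hypothetical toy model of the highwater-mark rule (NOT the {emitters}
# implementation). An environment is used so writes modify it in place.
toy_buffer <- function(highwater_mark) {
  buf <- new.env()
  buf$items <- list()
  buf$bytes <- 0
  buf$highwater_mark <- highwater_mark
  buf
}

toy_write <- function(buf, value) {
  # Refuse new data once the buffer has reached (or passed) the mark.
  if (buf$bytes >= buf$highwater_mark) {
    return(invisible(FALSE))
  }
  buf$items[[length(buf$items) + 1]] <- value
  # object.size() reports R's in-memory size; {emitters} may measure
  # its buffer differently.
  buf$bytes <- buf$bytes + as.numeric(object.size(value))
  invisible(TRUE)
}

buf <- toy_buffer(highwater_mark = 10)
accepted <- c(toy_write(buf, 1), toy_write(buf, 2), toy_write(buf, 3))
accepted
#> [1]  TRUE FALSE FALSE
length(buf$items)
#> [1] 1
```

Because a single double already occupies more than 10 bytes in R's memory, the first write alone is enough to pass the mark, mirroring the stream state shown above.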

ElianHugh/emitters documentation built on Feb. 6, 2022, 4:55 a.m.
