knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(emitters)
{emitters} implements a data abstraction known in many languages as a 'stream'. Streams allow data to be read and written in chunks, in a similar vein to the base readLines and writeLines functions. The main difference is that streams buffer data, so that operating on a huge data source does not require holding all of it in system memory at once.
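For comparison, chunked processing can be approximated in base R by reading from an open connection a few lines at a time. The sketch below does not use {emitters}; the temporary file, the chunk size and the running total are illustrative assumptions chosen only to show the idea of touching a small piece of the data at once.

```r
# Base R sketch (not the {emitters} API): process a file in fixed-size
# chunks so that only `chunk_size` lines are held in memory at a time.
path <- tempfile(fileext = ".txt")   # hypothetical data source
writeLines(as.character(1:10), path)

con <- file(path, open = "r")
chunk_size <- 4                      # illustrative value
total <- 0
n_chunks <- 0

repeat {
  # On an open connection, readLines() continues from the current position
  chunk <- readLines(con, n = chunk_size)
  if (length(chunk) == 0) break      # end of file reached
  total <- total + sum(as.numeric(chunk))
  n_chunks <- n_chunks + 1
}
close(con)
```

Here the ten lines are consumed in three passes, and `total` is accumulated without ever loading the whole file.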
Creating a stream is relatively straightforward. We'll start by creating a writeable stream.
# The first argument to the WriteableStream is the destination
# i.e. where the data should go/be stored
stream <- WriteableStream$new('file.txt')
We can then send some data to the file via the stream's write method:
stream$write(1)
stream$write(2)
stream$write(3)
We can then check the destination of the stream:
readLines('file.txt')
#> "1" "2" "3"
So far, this form of writing data doesn't seem any different to a call to writeLines. The real difference comes in when working with large amounts of data!
Streams are intended to reduce memory usage when working with large amounts of data; the general idea is to favour memory efficiency over speed. To this end, all streams are created with a highwater_mark, a numeric value representing the point at which the stream will stop automatically writing data to the specified location. This is a little tricky to get your head around, so let's try an example.
First, we'll create another writeable stream, this time setting a highwater mark:
stream <- WriteableStream$new(
  list(),
  highwater_mark = 10
)
In this case, the highwater mark of 10 means that the stream will stop sending data when it has 10 or more bytes of data stored in its buffer. To see what happens when the stream buffer is 'full', we can cork the stream to prevent data from being written:
stream$cork() # cork to prevent data being written
stream$write(1)
stream$write(2)
stream$write(3)
We can then check the stream's state:
stream$writeable_state
#> $encoding
#> [1] "UTF8"
#>
#> $highwater_mark
#> [1] 10
#>
#> $source
#> <environment: 0x561e1b5533a0>
#>
#> $buffer
#> <environment: 0x561e1a234b68>
#>
#> $buffer_length
#> 64 bytes
#>
#> $flowing
#> [1] FALSE
#>
#> $corked
#> [1] 1
#>
#> $ended
#> [1] FALSE
#>
#> $destroyed
#> [1] FALSE
#>
You may notice that the buffer contains more bytes (64) than specified by the highwater mark (10). This is because the highwater mark is not a hard limit, but instead a way of telling the stream to stop accepting data once it has reached (or passed) the highwater mark. In this case, the value 1 was roughly 64 bytes, thereby passing the highwater mark and causing the values 2 and 3 not to be written to the stream's destination.
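The 'soft limit' behaviour can be illustrated with a toy buffer written in base R. This is only a sketch of the idea, not how {emitters} is implemented: `make_buffer`, `push`, `size` and `count` are hypothetical names, and using `object.size()` to measure each item is an assumption made for the example.

```r
# Illustrative sketch only: a toy buffer with a soft highwater mark.
# None of these names come from {emitters}.
make_buffer <- function(highwater_mark) {
  items <- list()

  # Total size of the buffered items in bytes (0 when the buffer is empty)
  size <- function() {
    sum(vapply(items, function(x) as.numeric(object.size(x)), numeric(1)))
  }

  push <- function(x) {
    # The check runs before the item is added, so a single item
    # may overshoot the mark: the limit is soft, not hard.
    if (size() >= highwater_mark) return(FALSE)
    items[[length(items) + 1]] <<- x
    TRUE
  }

  list(push = push, size = size, count = function() length(items))
}

buf <- make_buffer(highwater_mark = 10)
accepted <- c(buf$push(1), buf$push(2), buf$push(3))
```

In this sketch the first value is accepted even though it is larger than the mark, and the buffer then refuses the next two, so `accepted` is `TRUE FALSE FALSE` and `buf$size()` ends up above 10.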