bitjson-intro

2017-07-10

Why

I just wanted to marshal R objects to, and unmarshal them from, a text representation that preserves data consistency, can be sent over a wire, and can be dumped anywhere.

Examples

Data consistency

# marshal to bit JSON
nilebits <- bitjson::toBitJSON(datasets::Nile)

# unmarshal from bit JSON
nile <- bitjson::fromBitJSON(nilebits)

# the round-tripped object is identical to the original
cat('consistent:', identical(datasets::Nile, nile))
consistent: TRUE

IO

bitjson::toBitJSON can write bitjson arrays directly to disk via its file parameter. Because bitjson relies on jsonlite to convert between JSON arrays and R integer vectors, it inherits jsonlite's IO features, so bitjson::fromBitJSON can unmarshal from a file, a URL, or an in-memory JSON string.

# write to disk
bitjson::toBitJSON(datasets::islands, file='islands.json')

# read from disk
inlands <- bitjson::fromBitJSON('islands.json')

# after io roundtrip
cat('consistent via disk:', identical(datasets::islands, inlands))
consistent via disk: TRUE

Data format

bitjson uses numeric JSON arrays as its underlying data structure. A bitjson array contains either zeros and ones exclusively (uncompressed) or a sequence of unsigned integers (compressed). In either case it is valid JSON.

bitjson::toBitJSON compresses by default; this can be toggled via the parameter compress. Likewise, bitjson::fromBitJSON expects a compressed bit JSON array by default, which can be toggled via the parameter compressed. Using compression is recommended.

Uncompressed bit JSON

# just a demo - do use compression 
xl <- bitjson::toBitJSON(419L, compress=FALSE)

# uncompressed xl bit JSON array
cat('uncompressed:\n', xl, sep='')
uncompressed:
[0,0,0,1,1,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,1,0,1]
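The first sixteen bits of the array above (0,0,0,1,1,0,1,0 and 0,1,0,1,0,0,0,0) are the bytes 0x58 and 0x0a, 'X' and a newline, read least significant bit first. That is how R's binary XDR serialization begins, which suggests, though this document does not state it, that a bit JSON array holds the bits of a serialized R object. A minimal base R sketch under that assumption follows; it calls no bitjson functions, and the hand-rolled JSON handling is for illustration only.

```r
x <- 419L

# Serialize to a raw vector, then expand every byte into 8 bits
# (rawToBits emits the least significant bit of each byte first).
bytes <- serialize(x, connection = NULL)
bits <- as.integer(rawToBits(bytes))

# Write the bits as a numeric JSON array (illustrative, no JSON library).
json <- paste0("[", paste(bits, collapse = ","), "]")

# Read the array back: strip the brackets, split on commas.
parsed <- as.integer(strsplit(gsub("\\[|\\]", "", json), ",", fixed = TRUE)[[1]])

# Pack each group of 8 bits into a byte and unserialize.
restored <- unserialize(packBits(parsed, type = "raw"))

identical(x, restored)  # TRUE
```

Because the whole serialization stream is carried, attributes and types survive the round trip, which is what the consistency checks in this document rely on.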

Compressed bit JSON

# parameter compress defaults to TRUE
xs <- bitjson::toBitJSON(419L)

# compressed bit JSON array
cat('compressed:\n', xs, sep='')
compressed:
[3,0,2,1,0,1,2,0,1,0,1,29,0,1,14,0,2,1,6,0,2,1,7,0,1,15,0,1,6,0,2,1,38,0,1,0,2,1,28,0,1,23,0,1,7,0,2,1,3,0,1,0,1]

Compression

Since bit arrays can grow very large, bitjson applies a simple compression and decompression scheme based on run-length encoding. A notable property of the scheme is zero encoding overhead: the compressed array is never longer than its uncompressed counterpart. For speed, both algorithms are implemented in C++ via Rcpp.

Iterative compression algorithm
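The sample arrays in this document are consistent with a scheme in which a run of two or more equal bits is written as the pair (run length, bit) and a lone bit is written as itself. The sketch below implements that inferred scheme in plain R using base::rle. The name compress_bits is hypothetical, and bitjson's own C++ implementation may differ in detail.

```r
# Hypothetical helper, not part of bitjson's API.
# Encodes a 0/1 vector: runs of length >= 2 become c(length, bit),
# single bits are copied through unchanged.
compress_bits <- function(bits) {
  runs <- rle(as.integer(bits))
  out <- vector("list", length(runs$lengths))
  for (i in seq_along(runs$lengths)) {
    n <- runs$lengths[i]
    b <- runs$values[i]
    out[[i]] <- if (n == 1L) b else c(n, b)
  }
  as.integer(unlist(out))
}

# First 42 bits of the uncompressed array shown for 419L
bits <- c(0,0,0,1,1,0,1,0,0,1,0,1, rep(0, 29), 1)
packed <- compress_bits(bits)
packed
# 3 0 2 1 0 1 2 0 1 0 1 29 0 1

length(packed) <= length(bits)  # TRUE
```

Under this scheme a lone bit costs one element, a run of two costs two, and any longer run also costs two, so the output can never be longer than the input.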

Iterative decompression algorithm
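Assuming the encoding inferred from the sample arrays in this document (a value of 2 or more is a run length followed by the bit to repeat, while a 0 or 1 in any other position is a literal bit), decoding needs only a single left-to-right pass. The sketch below is plain R; decompress_bits is a hypothetical name, and bitjson's own C++ implementation may differ in detail.

```r
# Hypothetical helper, not part of bitjson's API.
# Expands c(length, bit) pairs and copies literal bits through.
decompress_bits <- function(packed) {
  packed <- as.integer(packed)
  out <- vector("list", length(packed))
  i <- 1L  # read position in packed
  k <- 0L  # number of chunks written to out
  while (i <= length(packed)) {
    v <- packed[i]
    k <- k + 1L
    if (v >= 2L) {
      # v is a run length; the next element is the bit to repeat
      if (i == length(packed)) stop("run length without a following bit")
      out[[k]] <- rep(packed[i + 1L], v)
      i <- i + 2L
    } else {
      # v is a literal 0 or 1
      out[[k]] <- v
      i <- i + 1L
    }
  }
  as.integer(unlist(out[seq_len(k)]))
}

# The compressed array printed for 419L in this document
xs <- c(3,0,2,1,0,1,2,0,1,0,1,29,0,1,14,0,2,1,6,0,2,1,7,0,1,15,0,1,6,0,
        2,1,38,0,1,0,2,1,28,0,1,23,0,1,7,0,2,1,3,0,1,0,1)
bits <- decompress_bits(xs)
length(bits)  # 208, i.e. 26 bytes

# Assumption: the bits form an R serialization stream,
# least significant bit of each byte first.
unserialize(packBits(bits, type = "raw"))  # 419
```

The decoder never has to look back: every element either starts a pair or stands alone, which is why the format stays unambiguous even though literals and repeated bits share the values 0 and 1.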



chiefBiiko/bitjson documentation built on May 20, 2019, 7 p.m.