    knitr::opts_chunk$set(
      collapse = TRUE,
      comment = "#>",
      out.width = "50%",
      fig.width = 5,
      fig.height = 4
    )
When your input data is of type raw(), it is assumed that it encodes a data type in which each element is typesize bytes long. The data type can be any structured form of data and does not necessarily need to be known. Of course, you need to know the structure when you want to interpret the data, but that is not the Blosc compressor's concern.
The example below compresses raw() data, assuming that each element is 2 bytes long.
    library(blosc)
    data_input <- as.raw(c(1, 2, 3, 4, 1, 2, 3, 4))
    blosc_compress(data_input, typesize = 2)
Note that the resulting data is actually longer than the input data. This is because the compressor adds overhead (the Blosc header alone takes 16 bytes), and this data set is simply too small to make up for it.
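To see that the overhead stops mattering for larger inputs, you can repeat the same byte pattern many times and compare the lengths. This is a minimal sketch: the object names large_input and large_compressed are made up for illustration, and the exact compressed size depends on the compressor settings, so no output is shown.

```r
library(blosc)

# A repetitive 40,000-byte input: the pattern 1, 2, 3, 4 repeated 10,000 times
large_input <- rep(as.raw(c(1, 2, 3, 4)), times = 10000)
large_compressed <- blosc_compress(large_input, typesize = 2)

# Compare the sizes in bytes; the compressed vector should be the shorter one
length(large_input)
length(large_compressed)
```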
Can you compress other data types with Blosc? Yes, you can. You first have to encode the data in binary form (raw()), for instance with r_to_dtype() or any other method that converts your data to raw(). You can also use the dtype argument to encode and compress your data in one go. In that case you need to specify an appropriate data type (see vignette("dtypes")).
The example below encodes numeric() values as little-endian 16-bit floating point data ("<f2") and compresses them.
    ## The line below won't work, as the default `typesize` (4)
    ## does not match the dtype size (2)
    ## blosc_compress(iris$Petal.Length, dtype = "<f2")

    ## Explicitly set the `typesize` to 2
    compressed_iris <-
      blosc_compress(iris$Petal.Length, typesize = 2, dtype = "<f2")
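As an illustration of "any other method" for producing raw() data, the sketch below uses base R's writeBin() to encode the same column as little-endian 4-byte floats before compressing it. The object names are made up for this example, and it is one possible route rather than the package's recommended one.

```r
library(blosc)

# Encode as little-endian 4-byte (single precision) floats with base R
raw_iris <- writeBin(iris$Petal.Length, raw(), size = 4, endian = "little")

# 150 values of 4 bytes each, so 600 bytes in total
length(raw_iris)

# The typesize matches the 4-byte element size used above
compressed_raw_iris <- blosc_compress(raw_iris, typesize = 4)

# Decompress to raw() and decode again with base R
decoded_raw_iris <- readBin(
  blosc_decompress(compressed_raw_iris),
  what = "numeric", n = length(iris$Petal.Length),
  size = 4, endian = "little"
)
```

Because the values pass through single precision, decoded_raw_iris matches the original column only up to single-precision rounding.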
The output is always a vector of raw() data. Generally, the output data should be smaller than the input data, but there are exceptions. One is seen above, where the data set is too small in comparison with the compressor overhead. Another is data that is too random for the compression algorithm to reduce. In its compressed form, the data can no longer be interpreted directly. You need to decompress it first (blosc_decompress()).
You can pick from several algorithms to compress your data: "blosclz", "lz4", "lz4hc", "zlib", or "zstd". No single algorithm always performs best in both speed and compression ratio. It really depends on your data and can be tested by trial and error. You can also lower the compression level argument if you prefer speed over compression ratio.
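The trial and error can be as simple as compressing the same bytes with each algorithm and comparing the resulting lengths. The sketch below assumes that the algorithm is selected with an argument named compressor. That name is not confirmed in this text, so check ?blosc_compress before relying on it. The object names are also made up for illustration.

```r
library(blosc)

# Encode once, so that every algorithm sees exactly the same bytes
raw_iris <- writeBin(iris$Petal.Length, raw(), size = 4, endian = "little")

algorithms <- c("blosclz", "lz4", "lz4hc", "zlib", "zstd")

# ASSUMPTION: the algorithm is chosen with an argument named `compressor`
compressed_sizes <- vapply(
  algorithms,
  function(alg) length(blosc_compress(raw_iris, compressor = alg, typesize = 4)),
  integer(1)
)

# Named vector with the compressed size in bytes per algorithm
compressed_sizes
```

This only compares sizes. If speed matters to you, time the calls on your own data as well, for instance with system.time().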
The decompression function (blosc_decompress()) only accepts raw() data that has been compressed with Blosc. The data doesn't have to be created in R; it can be generated with any software that uses the c-blosc library.
You don't have to specify the compression algorithm, typesize or anything else. All that information is embedded in the header of the raw input data. You can even retrieve this information with:
    blosc_info(compressed_iris)
If you don't specify the output type, the decompression routine returns raw() data. Do you remember the iris petal length data that we compressed earlier? We can simply decompress it by calling blosc_decompress().
    iris_length1 <- blosc_decompress(compressed_iris)
    head(iris_length1)
It works, but we got raw() data as output. This is because the decompressor knows little about the structure of the decompressed data. Since we know that we encoded it as little-endian 16-bit floating point values ("<f2"), we can specify it as such. Once specified, the function automatically decodes the data.
    iris_length2 <- blosc_decompress(compressed_iris, dtype = "<f2")
    hist(iris_length2)
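Keep in mind that 16-bit floating point values carry only about three significant decimal digits, so the decoded values are close to the originals but not identical. The sketch below repeats the round trip and measures the largest deviation. The name decoded_iris is made up for this example.

```r
library(blosc)

# Encode and compress as 16-bit floats, then decompress and decode again
compressed_iris <- blosc_compress(iris$Petal.Length, typesize = 2, dtype = "<f2")
decoded_iris <- blosc_decompress(compressed_iris, dtype = "<f2")

# Largest absolute difference between original and round-tripped values
max(abs(as.numeric(decoded_iris) - iris$Petal.Length))
```

If that deviation is too large for your purpose, a wider data type is the safer choice, at the cost of more bytes per element.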