qsave: qsave

View source: R/RcppExports.R

qsave R Documentation

qsave

Description

Saves (serializes) an object to disk.

Usage

qsave(x, file,
      preset = "high", algorithm = "zstd", compress_level = 4L,
      shuffle_control = 15L, check_hash = TRUE, nthreads = 1)

Arguments

x

The object to serialize.

file

The file name/path.

preset

One of "fast", "balanced", "high" (default), "archive", "uncompressed" or "custom". See section Presets for details.

algorithm

Ignored unless preset = "custom". Compression algorithm used: "lz4", "zstd", "lz4hc", "zstd_stream" or "uncompressed".

compress_level

Ignored unless preset = "custom". The compression level used.

For lz4, this number must be >= 1 (higher is less compressed).

For zstd, a number between -50 and 22 (higher is more compressed). Due to the format of qs, there is very little benefit to compression levels above about 5.

shuffle_control

Ignored unless preset = "custom". An integer setting the use of byte shuffle compression. A value between 0 and 15 (default 15). See section Byte shuffling for details.

check_hash

Default TRUE. Computes a hash during serialization and stores it in the file, so that file integrity can be verified when the file is read.

nthreads

Number of threads to use. Default 1.

Details

This function serializes and compresses R objects using block compression with the option of byte shuffling.
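To illustrate why byte shuffling can help before compression, here is a minimal base R sketch. It is not how qs implements shuffling internally: gzip (via memCompress) stands in for lz4/zstd as an assumption, and the shuffle is a plain transpose of the 4-byte integer layout.

```r
# Illustrative only: base R stand-in for byte shuffling, not qs internals.
x <- 1:1e5                                   # ordered integer vector
raw_bytes <- writeBin(x, raw())              # 4 bytes per integer, interleaved

# Shuffle: group byte 1 of every element, then byte 2, and so on.
m <- matrix(raw_bytes, nrow = 4)             # column i = the 4 bytes of x[i]
shuffled <- as.vector(t(m))

# Unshuffling reverses the transpose and restores the original byte order.
unshuffled <- as.vector(t(matrix(shuffled, ncol = 4)))
stopifnot(identical(unshuffled, raw_bytes))

# Compare compressed sizes (gzip is an assumption standing in for lz4/zstd).
plain_size    <- length(memCompress(raw_bytes, type = "gzip"))
shuffled_size <- length(memCompress(shuffled, type = "gzip"))
c(plain = plain_size, shuffled = shuffled_size)
```

For an ordered vector like this one, the shuffled bytes form long runs of similar values, which a general-purpose compressor handles much better than the interleaved layout.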

Value

The total number of bytes written to the file (returned invisibly).

Presets

qsave exposes many parameters. To simplify usage, there are four main presets that perform well across a wide variety of data:

  • "fast" is a shortcut for algorithm = "lz4", compress_level = 100 and shuffle_control = 0.

  • "balanced" is a shortcut for algorithm = "lz4", compress_level = 1 and shuffle_control = 15.

  • "high" is a shortcut for algorithm = "zstd", compress_level = 4 and shuffle_control = 15.

  • "archive" is a shortcut for algorithm = "zstd_stream", compress_level = 14 and shuffle_control = 15. (zstd_stream is currently single-threaded only)

To gain more control over compression level and byte shuffling, set preset = "custom". Only then are the individual parameters algorithm, compress_level and shuffle_control taken into account.
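The preset list above can be restated as a small lookup table. The sketch below is only a convenience for seeing which explicit arguments each preset stands for; qs_presets and preset_args are hypothetical names, not part of qs, and the table simply copies the values documented in this section.

```r
# Hypothetical lookup table restating the documented presets; not qs internals.
qs_presets <- list(
  fast     = list(algorithm = "lz4",         compress_level = 100L, shuffle_control = 0L),
  balanced = list(algorithm = "lz4",         compress_level = 1L,   shuffle_control = 15L),
  high     = list(algorithm = "zstd",        compress_level = 4L,   shuffle_control = 15L),
  archive  = list(algorithm = "zstd_stream", compress_level = 14L,  shuffle_control = 15L)
)

# Hypothetical helper: return the explicit "custom" arguments a preset stands for.
preset_args <- function(preset) {
  stopifnot(preset %in% names(qs_presets))
  c(list(preset = "custom"), qs_presets[[preset]])
}

str(preset_args("high"))

# With qs installed, the explicit form could be passed on to qsave, e.g.:
# do.call(qs::qsave, c(list(x = mtcars, file = tempfile()), preset_args("high")))
```

Starting from a preset's values and changing one argument at a time is a simple way to explore preset = "custom" without guessing all three parameters at once.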

Byte shuffling

The parameter shuffle_control defines which numerical R object types are subject to byte shuffling. Generally speaking, the more ordered/sequential an object is (e.g., 1:1e7), the larger the potential benefit of byte shuffling. It is not uncommon to improve compression ratio or compression speed by several orders of magnitude. The more random an object is (e.g., rnorm(1e7)), the smaller the potential benefit, and shuffling can even make results worse. Integer vectors almost always benefit from byte shuffling, whereas the results for numeric vectors are mixed.

The value of shuffle_control is a sum of flags: add 1 to enable byte shuffling for logical vectors, 2 for integer vectors, 4 for numeric vectors and 8 for complex vectors. For example, 2 + 4 = 6 shuffles integer and numeric vectors only, and 15 (the default) shuffles all four types.
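The flag arithmetic described above can be sketched in base R. The helper make_shuffle_control is hypothetical (it is not part of qs); it only sums the documented flag values so the resulting integer can be passed as shuffle_control together with preset = "custom".

```r
# Flag values as documented: logical = 1, integer = 2, numeric = 4, complex = 8.
shuffle_flags <- c(logical = 1L, integer = 2L, numeric = 4L, complex = 8L)

# Hypothetical helper (not part of qs): sum the flags for the chosen types.
make_shuffle_control <- function(types = character()) {
  stopifnot(all(types %in% names(shuffle_flags)))
  sum(shuffle_flags[unique(types)])
}

make_shuffle_control(c("logical", "integer"))   # 1 + 2 = 3
make_shuffle_control(names(shuffle_flags))      # 1 + 2 + 4 + 8 = 15, the default
make_shuffle_control()                          # 0, no types shuffled

# Decode a value back into the types it enables with a bitwise test.
names(shuffle_flags)[bitwAnd(6L, shuffle_flags) > 0]   # "integer" "numeric"
```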

Examples

library(qs)

# starnames is an example dataset shipped with qs
x <- data.frame(int = sample(1e3, replace = TRUE),
                num = rnorm(1e3),
                char = sample(starnames$`IAU Name`, 1e3, replace = TRUE),
                stringsAsFactors = FALSE)
myfile <- tempfile()
qsave(x, myfile)
x2 <- qread(myfile)
identical(x, x2) # returns TRUE

# qs supports multithreading
qsave(x, myfile, nthreads = 2)
x2 <- qread(myfile, nthreads = 2)
identical(x, x2) # returns TRUE

# Other examples
# A sequential integer vector
z <- 1:1e7
myfile <- tempfile()
qsave(z, myfile)
z2 <- qread(myfile)
identical(z, z2) # returns TRUE

# A list of one million length-1 numeric vectors
w <- as.list(rnorm(1e6))
myfile <- tempfile()
qsave(w, myfile)
w2 <- qread(myfile)
identical(w, w2) # returns TRUE

qs documentation built on Aug. 10, 2022, 1:16 a.m.