knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
Atomic vectors are the fundamental data structure in R. They include numeric (integer and double), logical, character, complex, and raw vectors. This vignette explains how h5lite maps these R types to HDF5 datasets and provides guidance on controlling storage types and compression.
```r
library(h5lite)
file <- tempfile(fileext = ".h5")
```
Writing a vector to HDF5 is straightforward using h5_write(). The package automatically creates the necessary dataset and handles dimensions.
```r
# Write a numeric vector
vec <- c(1.5, 2.3, 4.2, 5.1)
h5_write(vec, file, "data/numeric_vector")

# Read it back
res <- h5_read(file, "data/numeric_vector")
print(res)
```
In R, a "scalar" is simply a vector of length 1. HDF5, however, distinguishes between a Scalar Dataspace (a single value with no dimensions) and a Simple Dataspace with dimensions [1] (a one-dimensional array holding one element).
By default, h5lite treats length-1 vectors as 1D arrays to maintain consistency with R's vector behavior. To write a true HDF5 scalar, you must wrap the value in I().
```r
# 1. Default: 1D array (length 1)
h5_write(42, file, "structure/array_1d")

# 2. Explicit scalar: wrapped in I()
h5_write(I(42), file, "structure/scalar")

h5_str(file, "structure")
```
Note: When reading data back into R, both storage formats appear as standard R vectors of length 1.
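The rule above amounts to a dispatch on the `AsIs` class that `I()` attaches to a value. The sketch below uses a hypothetical helper, `dataspace_for()`, which is not part of h5lite and does not touch HDF5; it only restates the documented rule in code, under the assumption that only length-1 `I()` values are treated as true scalars.

```r
# Hypothetical helper (not part of h5lite): report which HDF5 dataspace the
# documented rule would choose for an R value. Assumption: only length-1
# values wrapped in I() become scalars.
dataspace_for <- function(x) {
  if (inherits(x, "AsIs") && length(x) == 1L) {
    # I() marks the value as a true scalar: no dimensions at all
    list(kind = "scalar", dims = integer(0))
  } else {
    # Plain vectors, including length-1 ones, become 1D simple dataspaces
    list(kind = "simple", dims = length(x))
  }
}

str(dataspace_for(42))     # kind "simple", dims 1
str(dataspace_for(I(42)))  # kind "scalar", no dims
```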
h5lite attempts to map R types to the most efficient HDF5 equivalents automatically (as = "auto").
- Numeric vectors: h5lite analyzes the range of your data and picks the smallest fitting HDF5 type (e.g., uint8, int16, int32, float64).
- Logical vectors: h5lite stores these as uint8 (0 or 1) in HDF5 to save space.

A key challenge in HDF5 is that standard integer and boolean types have no native representation for NA (missing values).
To ensure data safety, h5lite performs the following check:
- If an integer or logical vector contains NA, it is automatically promoted to float64.
- NA values are stored as an NaN variant in the file.
- h5_read() restores them as numeric (double) vectors with NA.

```r
# Integer vector with NO missing values -> automatic optimal type (uint8)
h5_write(c(1L, 2L, 3L), file, "safe/ints")
h5_typeof(file, "safe/ints")

# Integer vector WITH missing values -> promoted to float64
h5_write(c(1L, NA, 3L), file, "safe/ints_na")
h5_typeof(file, "safe/ints_na")
```
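The automatic choice described above (smallest fitting integer type, uint8 for logicals, float64 whenever NA is present) can be sketched in base R. `pick_storage_type()` is a hypothetical helper, not h5lite's actual code; the candidate order (unsigned before signed at each width) and the fallback to float64 for very large values are assumptions chosen to match the examples in this vignette. The last lines show why the NA promotion is safe on the R side: R's double NA is itself a NaN with a special payload, which R still tells apart from an ordinary NaN.

```r
# Hypothetical sketch (not h5lite's implementation) of automatic type
# selection for numeric and logical vectors.
pick_storage_type <- function(x) {
  # HDF5 integer types cannot hold NA, so anything with NA falls back to float64
  if (anyNA(x)) return("float64")
  if (is.logical(x)) return("uint8")
  # Non-whole numbers (or empty input, in this sketch) need floating point
  if (length(x) == 0L || any(x != round(x))) return("float64")
  lo <- min(x)
  hi <- max(x)
  # Candidate types from smallest to largest (assumed order)
  ranges <- list(
    uint8  = c(0, 2^8 - 1),   int8  = c(-2^7, 2^7 - 1),
    uint16 = c(0, 2^16 - 1),  int16 = c(-2^15, 2^15 - 1),
    uint32 = c(0, 2^32 - 1),  int32 = c(-2^31, 2^31 - 1)
  )
  for (type in names(ranges)) {
    r <- ranges[[type]]
    if (lo >= r[1] && hi <= r[2]) return(type)
  }
  "float64"
}

pick_storage_type(c(1L, 2L, 3L))   # "uint8"
pick_storage_type(c(1L, NA, 3L))   # "float64"

# Promoting an integer NA to double keeps it as R's NA, not a plain NaN
y <- as.double(c(1L, NA, 3L))
is.na(y)        # FALSE TRUE FALSE
is.nan(y[2])    # FALSE
```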
If you know your data range fits into a smaller type (e.g., int8, uint16), you can use the as argument to force a specific storage type.
Warning: If you force an integer type on data that contains NA or values outside that type's range, h5lite throws an error.
```r
# Store small integers as 8-bit signed integers
h5_write(c(10, -5, 100), file, "small_ints", as = "int8")

# Store logicals as 8-bit unsigned integers
h5_write(c(TRUE, FALSE), file, "bools", as = "uint8")
```
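The kind of check behind that warning can be sketched in base R. `check_forced_int()` is a hypothetical helper, not h5lite's validation code; it assumes the standard HDF5 integer ranges and only covers the 8-, 16- and 32-bit types. How h5lite treats non-whole values under a forced integer type is not covered here.

```r
# Hypothetical sketch (not h5lite's code): reject NA or out-of-range values
# before writing with a forced integer type.
check_forced_int <- function(x, type) {
  limits <- list(
    int8  = c(-2^7, 2^7 - 1),   uint8  = c(0, 2^8 - 1),
    int16 = c(-2^15, 2^15 - 1), uint16 = c(0, 2^16 - 1),
    int32 = c(-2^31, 2^31 - 1), uint32 = c(0, 2^32 - 1)
  )
  r <- limits[[type]]
  if (is.null(r)) stop("unsupported type in this sketch: ", type)
  if (anyNA(x)) stop(type, " has no representation for NA")
  x <- as.double(x)  # logicals become 0/1
  if (any(x < r[1] | x > r[2])) stop("values outside the ", type, " range")
  invisible(TRUE)
}

check_forced_int(c(10, -5, 100), "int8")   # passes silently
check_forced_int(c(TRUE, FALSE), "uint8")  # passes silently
try(check_forced_int(c(10, 300), "int8"))  # error: 300 exceeds 127
try(check_forced_int(c(1, NA), "int16"))   # error: NA is not representable
```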
HDF5 supports two primary methods for storing strings: Variable-Length and Fixed-Length.
By default (as = "auto"), h5lite chooses the most efficient string representation:
- If the vector contains NA, it uses Variable-Length UTF-8 (which natively supports missing values).

You can explicitly request variable-length storage using as = "utf8" or as = "ascii".
Variable-length strings support NA (stored as NULL pointers).

```r
# Variable-length strings (handles NA)
h5_write(c("apple", "banana", NA), file, "strings/var")
```
You can force fixed-length storage by appending [n] to the string type, where n is the number of bytes per string (for example, as = "ascii[10]").
Fixed-length strings reserve n bytes per string and pad shorter strings; they do not support NA.

```r
# Fixed-length strings (10 bytes per string)
h5_write(c("A", "B", "C"), file, "strings/fixed", as = "ascii[10]")

# Auto-detect the width: fixed length based on the longest string
h5_write(c("short", "longer", "longest"), file, "strings/auto_fixed", as = "ascii[]")
```
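The width that an auto-sized type such as "ascii[]" needs can be worked out in base R. `fixed_width()` is a hypothetical helper, not h5lite's code; it assumes the width is the byte length of the longest string. Because n counts bytes rather than characters, a non-ASCII character such as "é" takes two bytes in UTF-8.

```r
# Hypothetical helper (not h5lite's code): bytes needed for an auto-sized
# fixed-length string type.
fixed_width <- function(x) {
  # Fixed-length strings have no slot for NA, so reject it up front
  if (anyNA(x)) stop("fixed-length strings cannot store NA")
  # HDF5 sizes fixed-length strings in bytes, not characters
  max(nchar(x, type = "bytes"))
}

fixed_width(c("short", "longer", "longest"))       # 7
fixed_width(enc2utf8(c("cafe", "caf\u00e9")))      # 5: "é" is 2 bytes in UTF-8
```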
Compression in HDF5 requires the dataset to be "chunked". h5lite handles chunking parameters automatically when you enable compression.
You can configure compression using the compress argument:
"gzip-5" (default): Standard zlib compression at level 5. Levels "gzip-1" through "gzip-9" are also supported. Safe and universally compatible."szip-nn": Szip with Nearest Neighbor coding. Best for continuous, correlated, or floating-point data (e.g., time series or smooth gradients)."szip-ec": Szip with Entropy Coding. Best for uncorrelated, discrete, or categorical integer data."none": Disables compression entirely.# Write a large vector with max zlib compression x <- rep(rnorm(100), 100) h5_write(x, file, "compressed_data", compress = "gzip-9") # Write a smooth, correlated dataset using szip Nearest Neighbor smooth_data <- sin(seq(0, 10, length.out = 1000)) h5_write(smooth_data, file, "szip_data", compress = "szip-nn")
R does not natively support 64-bit integers, but the bit64 package provides an integer64 class. h5lite supports reading and writing these types directly to HDF5 int64.
```r
if (requireNamespace("bit64", quietly = TRUE)) {
  val <- bit64::as.integer64(c("9223372036854775807", "-9223372036854775807"))
  h5_write(val, file, "huge_ints")
  # print() is needed: inside an if block only the last value is auto-printed
  print(h5_typeof(file, "huge_ints"))

  in_val <- h5_read(file, "huge_ints", as = "bit64")
  print(class(in_val))
}
```
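The need for integer64 follows from two limits of base R, shown in the snippet below (it uses neither h5lite nor bit64): R's native integer type is 32-bit, and a double can represent every integer exactly only up to 2^53. Reading an HDF5 int64 into a plain double would therefore silently round large values.

```r
# Base R's integer type is 32-bit
.Machine$integer.max    # 2147483647

# Doubles lose integer precision above 2^53
big <- 2^53
big + 1 == big          # TRUE: 2^53 + 1 rounds back to 2^53

# The int64 maximum (2^63 - 1) is not representable as a double either
2^63 - 1 == 2^63        # TRUE
```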
```r
# Remove the temporary file
unlink(file)
```