h5lite is designed to seamlessly map R's diverse data structures to HDF5's portable format. This vignette explains the supported R data types, how h5lite writes them to HDF5, and how you can precisely control data types and compression when needed.
```r
library(h5lite)
file <- tempfile(fileext = ".h5")
```
h5lite supports reading and writing a wide range of R data types. The table below lists the default mapping when writing to HDF5.
| R Data Type | HDF5 Equivalent | Description |
| :------------- | :--------------- | :--------------------------------------------- |
| Numeric | variable | Selects optimal type: uint8, float32, etc. |
| Logical | H5T_STD_U8LE | Stored as 0 (FALSE) or 1 (TRUE) (uint8). |
| Character | H5T_STRING | Variable or fixed-length UTF-8 strings. |
| Complex | H5T_COMPLEX | Native HDF5 2.0+ complex numbers. |
| Raw | H5T_OPAQUE | Raw bytes / binary data. |
| Factor | H5T_ENUM | Integer indices with label mapping. |
| integer64 | H5T_STD_I64LE | 64-bit signed integers via bit64 package. |
| POSIXt | H5T_STRING | ISO 8601 string (YYYY-MM-DDTHH:MM:SSZ). |
| List | H5O_TYPE_GROUP | Recursive container structure. |
| Data Frame | H5T_COMPOUND | Table of mixed types. |
| NULL | H5S_NULL | Creates a placeholder. |
Atomic data types (Integer, integer64, Double, Logical, Character, Complex, Raw, and POSIXt) can be written to HDF5 as scalars, 1D vectors, or N-dimensional arrays.
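- **Scalars:** wrap a single value in `I()` to write it as a scalar (0-dimensional) dataset.
- **Vectors:** plain vectors are written as 1D datasets.
- **Arrays:** objects with `dim` attributes (matrices and arrays) are written as N-dimensional datasets, preserving their shape.

```r
# 1. Scalar (0 dims)
h5_write(I(42), file, "structure/scalar")

# 2. Vector (1 dim)
h5_write(c(1, 2, 3), file, "structure/vector")

# 3. Matrix (2 dims)
h5_write(matrix(1:9, 3, 3), file, "structure/matrix")
```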
For more complex dimensional structures, refer to `vignette("matrices")`.
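### Integers

R uses 32-bit integers and 64-bit doubles. When writing with `as = "auto"`, h5lite analyzes the range of your data to select the most compact HDF5 type.

- **Default:** the smallest integer type that holds every value (e.g. `uint8` for values between 0 and 255).
- **With `NA`:** `float64` (`H5T_IEEE_F64LE`). HDF5 integer types have no missing-value marker, so integer vectors containing `NA` are promoted to `float64`.
- **Override:** `int[8|16|32|64]`, `uint[8|16|32|64]`, `float[16|32|64]`, or `bfloat16`.

```r
# Integers between 0 and 255 (uint8)
h5_write(c(1L, 2L, 3L), file, "integers/small")

# Integers with NA -> float64
h5_write(c(1L, NA, 3L), file, "integers/with_na")

# Force a larger type (int16)
h5_write(1:100, file, "integers/short", as = "int16")
```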
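To make the range-based choice concrete, here is a minimal sketch. It assumes candidate types are tried from smallest to largest with unsigned types first, and that `NA` forces `float64` as described above. `pick_int_type()` is a hypothetical helper for illustration only; it is not part of h5lite and may not match its internal logic.

```r
# Hypothetical helper (not part of h5lite): pick the smallest HDF5 integer
# type whose range covers a non-empty integer vector.
pick_int_type <- function(x) {
  # Assumption: NA cannot be stored in an integer type, so fall back to float64
  if (anyNA(x)) return("float64")
  lo <- min(x)
  hi <- max(x)
  # Candidate types and their value ranges, tried in this order
  ranges <- list(
    uint8  = c(0, 255),
    uint16 = c(0, 65535),
    uint32 = c(0, 4294967295),
    int8   = c(-128, 127),
    int16  = c(-32768, 32767),
    int32  = c(-2147483648, 2147483647)
  )
  for (type in names(ranges)) {
    if (lo >= ranges[[type]][1] && hi <= ranges[[type]][2]) return(type)
  }
  "float64"
}

pick_int_type(c(1L, 2L, 3L))   # "uint8"
pick_int_type(c(1L, NA, 3L))   # "float64"
pick_int_type(c(-5L, 200L))    # "int16"
```

Trying unsigned types first means non-negative data never spends a bit on the sign; any negative value makes every unsigned candidate fail, so the loop falls through to the signed types.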
### 64-bit Integers (`integer64`)

- **Default:** `int64` (`H5T_STD_I64LE`)

R does not natively support 64-bit integers, but h5lite can read and write them via the **bit64** package.

```r
if (requireNamespace("bit64", quietly = TRUE)) {
  # Largest and smallest usable values: bit64 reserves
  # -9223372036854775808 to represent NA_integer64_
  val <- bit64::as.integer64(c("9223372036854775807", "-9223372036854775807"))
  h5_write(val, file, "integers/int64")
}
```
### Doubles

R's default numeric type is double-precision.

- **Default:** `float64` (`H5T_IEEE_F64LE`)
- **Override:** `int[8|16|32|64]`, `uint[8|16|32|64]`, `float[16|32|64]`, or `bfloat16`

```r
data <- rnorm(10)

# Default (float64)
h5_write(data, file, "doubles/default")

# Single precision (float32): saves 50% space at the cost of precision
h5_write(data, file, "doubles/float32", as = "float32")
```
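The 50% saving comes from using 4 bytes per value instead of 8. A quick way to preview the precision cost in base R, without h5lite, is to round-trip the values through a 4-byte IEEE encoding with `writeBin()` and `readBin()`. The helper `as_float32()` is hypothetical and only emulates float32 rounding; it is not how h5lite performs the conversion.

```r
# Emulate float32 rounding in base R (illustration only, not h5lite code):
# writeBin(size = 4) encodes each double as a 4-byte IEEE single, and
# readBin(size = 4) decodes it back to a double.
as_float32 <- function(x) {
  bytes <- writeBin(x, raw(), size = 4, endian = "little")
  readBin(bytes, what = "double", n = length(x), size = 4, endian = "little")
}

set.seed(1)
data <- rnorm(10)

length(writeBin(data, raw(), size = 8))  # 80 bytes as float64
length(writeBin(data, raw(), size = 4))  # 40 bytes as float32

# Largest relative rounding error; bounded by 2^-24 (about 6e-8)
max(abs(as_float32(data) - data) / abs(data))
```

Roughly seven significant decimal digits survive, which is usually enough for measurements but not for values that must round-trip exactly.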
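### Logicals

Logical vectors are stored as 0 (`FALSE`) or 1 (`TRUE`).

- **Default:** `uint8` (`H5T_STD_U8LE`)
- **With `NA`:** `float64` (`H5T_IEEE_F64LE`)
- **Override:** `int[8|16|32|64]`, `uint[8|16|32|64]`, `float[16|32|64]`, or `bfloat16`

```r
bools <- sample(c(TRUE, FALSE), 1000, replace = TRUE)
h5_write(bools, file, "logicals/packed")
```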
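### Strings

HDF5 supports two methods for storing strings. By default (`as = "auto"`), h5lite chooses between them:

- **Variable-length:** used if the data contains `NA` or if string lengths are highly inconsistent.
- **Fixed-length:** used otherwise (no `NA` and similar lengths), to allow for compression. HDF5 filters cannot compress the contents of variable-length strings, which are stored outside the dataset's chunks.

#### Variable-Length Strings

Explicitly requested with `as = "utf8"` or `as = "ascii"`.

- **Supports `NA`:** yes

```r
# UTF-8 variable length
h5_write(c("apple", "banana", NA), file, "strings/var_utf8")

# ASCII variable length
h5_write(c("A", "B", "C"), file, "strings/var_ascii", as = "ascii")
```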
#### Fixed-Length Strings

Use `as = "ascii[10]"` / `as = "utf8[10]"` for an explicit size (here 10), or `as = "ascii[]"` / `as = "utf8[]"` to auto-detect the maximum length.

- **Supports `NA`:** no

```r
# UTF-8 fixed length, size auto-detected
h5_write(c("apple", "banana"), file, "strings/fixed_utf8")

# ASCII fixed length (1 byte)
h5_write(c("A", "B", "C"), file, "strings/fixed_ascii", as = "ascii[1]")
```
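The `as = "auto"` rule for strings described at the start of this section can be sketched as follows. `choose_string_storage()` is a hypothetical helper, not part of h5lite, and the ratio used to decide that lengths are "highly inconsistent" (`max_ratio`) is an arbitrary assumption rather than h5lite's actual threshold.

```r
# Hypothetical sketch of the auto rule for strings (not h5lite code).
# max_ratio is an assumed threshold for "highly inconsistent" lengths.
choose_string_storage <- function(x, max_ratio = 10) {
  # Fixed-length strings cannot represent NA
  if (anyNA(x)) return("variable")
  # Sizes are measured in bytes, so multi-byte UTF-8 characters count fully
  nbytes <- nchar(x, type = "bytes")
  # Very uneven lengths would waste space when padded to the longest string
  if (max(nbytes) > max_ratio * max(1, min(nbytes))) return("variable")
  sprintf("fixed[%d]", max(nbytes))
}

choose_string_storage(c("apple", "banana", NA))  # "variable"
choose_string_storage(c("apple", "banana"))      # "fixed[6]"
choose_string_storage(c("caf\u00e9", "tea"))     # "fixed[5]": the accented e is 2 bytes
```

Measuring in bytes rather than characters matters for UTF-8: a fixed size counted in characters would truncate any string containing non-ASCII text.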
> **Technical note:** h5lite uses `H5T_C_S1` for all strings, and `H5T_STR_NULLTERM` for all fixed-length strings.
### Date-Times (`POSIXt`)

R date-time objects (`POSIXct` / `POSIXlt`) are stored as strings in ISO 8601 format (`YYYY-MM-DDTHH:MM:SSZ`). This ensures maximum portability with other languages and HDF5 tools, which do not share R's internal representation of date-times (a double counting seconds since 1970-01-01 UTC).

```r
now <- Sys.time()
h5_write(now, file, "datetime/iso8601")
```
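The string format in the table can be reproduced and parsed back in base R, which is handy when checking what another tool will see. This sketch assumes the stored instant is expressed in UTC, as the `Z` suffix indicates; it shows base R behaviour only, not h5lite's own conversion code. Base R's `%S` truncates fractional seconds, and the sketch does not establish how h5lite itself treats them.

```r
# Base R round trip through the ISO 8601 UTC format shown above
t0 <- as.POSIXct("2024-03-15 14:30:05", tz = "America/New_York")
iso <- format(t0, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
iso  # "2024-03-15T18:30:05Z" (New York is UTC-4 on this date)

# Parse the string back; the literal "Z" in the format must match
back <- as.POSIXct(iso, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
as.numeric(back) == as.numeric(t0)  # TRUE: same instant, different zone label

# Fractional seconds do not fit this format and are truncated by %S
t1 <- as.POSIXct("2024-03-15 18:30:05.75", tz = "UTC")
format(t1, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")  # "2024-03-15T18:30:05Z"
```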
### Complex Numbers

R complex numbers are written using the new complex floating-point type introduced in HDF5 2.0.0 (`H5T_COMPLEX_IEEE_F64LE`).

> **Compatibility warning:** This data type for complex numbers is a feature specific to HDF5 version 2.0+. Datasets written with this type generally cannot be read by HDF5 readers built against older versions of the library (e.g., HDF5 1.10 or 1.12). Ensure that any downstream tools or libraries used to read these files are updated to support HDF5 2.0.

```r
comp <- c(1+2i, 3+4i)
h5_write(comp, file, "complex_data")
```
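If a downstream reader cannot be upgraded, one possible workaround (our assumption, not an h5lite feature) is to store the real and imaginary parts as an ordinary `float64` matrix and reassemble them after reading. The `h5_write()` call is left as a comment and follows the call form used above; the dataset name `"complex_parts"` is illustrative.

```r
comp <- c(1+2i, 3+4i)

# Split into a two-column float64 matrix that any HDF5 version can read
parts <- cbind(re = Re(comp), im = Im(comp))
# The matrix could then be written like any other matrix, e.g.:
# h5_write(parts, file, "complex_parts")

# Reassemble the complex vector from the two columns after reading
restored <- complex(real = parts[, "re"], imaginary = parts[, "im"])
identical(restored, comp)  # TRUE
```

The trade-off is that readers must know the convention: the file itself no longer records that the two columns form one complex value.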
### Raw Data

Raw vectors (bytes) are stored as HDF5 `OPAQUE` types. This is ideal for storing binary blobs, images, or serialized objects where you need to preserve the exact byte sequence without interpretation.

```r
raw_vec <- as.raw(c(0x01, 0xFF, 0x1A))
h5_write(raw_vec, file, "binary_blob")
```
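Serialized objects are a common source of such bytes. The sketch below uses base R's `serialize()` and `unserialize()` to turn an arbitrary R object into a raw vector and back; the commented `h5_write()` call mirrors the raw example above, and the dataset name `"binary/serialized"` is illustrative.

```r
# Serialize an arbitrary R object into a raw vector (bytes)
obj <- list(id = 1:3, note = "calibration run")
blob <- serialize(obj, connection = NULL)
class(blob)  # "raw"
# The raw vector can be written like raw_vec above, e.g.:
# h5_write(blob, file, "binary/serialized")

# After reading the bytes back, unserialize() restores the original object
identical(unserialize(blob), obj)  # TRUE
```

The bytes follow R's own serialization format, so tools in other languages will see only an opaque blob. This approach suits R-to-R round trips rather than cross-language exchange.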
### Factors

R factors are stored as HDF5 `ENUM` types. The mapping from integer codes to factor levels (labels) is stored once in the dataset's datatype definition, so the labels are preserved without duplicating string data for every element.

```r
fac <- factor(c("low", "high", "medium", "low"))
h5_write(fac, file, "categorical")
```
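The R side of this mapping can be inspected in base R: a factor is an integer code vector plus a table of labels, which is the structure an HDF5 enum describes. This sketch shows only R's codes; it does not show which numeric values h5lite assigns to each enum member.

```r
fac <- factor(c("low", "high", "medium", "low"))

levels(fac)      # "high" "low" "medium" (sorted alphabetically by default)
as.integer(fac)  # 2 1 3 2

# An enum-style lookup: each integer code indexes into the label table
levels(fac)[as.integer(fac)]  # "low" "high" "medium" "low"

# Set levels explicitly to control the code order
fac2 <- factor(c("low", "high", "medium", "low"), levels = c("low", "medium", "high"))
as.integer(fac2)  # 1 3 2 1
```

Setting `levels` explicitly is worthwhile for ordinal data, since the default alphabetical order rarely matches the meaning of the categories.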
### Lists

R lists are mapped to HDF5 groups. Since lists are recursive containers, h5lite walks the list and creates a dataset (or subgroup) for every element found. You can use `as = c("element_name" = "skip")` to exclude specific items.

```r
my_list <- list(data = 1:100, meta = list(valid = TRUE))
h5_write(my_list, file, "types/list")
```
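The recursive walk can be sketched in base R to preview the layout a nested list produces. `list_paths()` is a hypothetical helper, not part of h5lite. It assumes every element is named, treats nested lists as subgroups and data frames as single datasets (as in the Data Frames section below), and ignores `"skip"` entries.

```r
# Hypothetical helper (not part of h5lite): list the paths a recursive walk
# of a named list would create. Group paths end in "/".
list_paths <- function(x, root) {
  paths <- character(0)
  for (name in names(x)) {
    path <- paste0(root, "/", name)
    element <- x[[name]]
    if (is.list(element) && !is.data.frame(element)) {
      # Nested list -> subgroup, then recurse into its elements
      paths <- c(paths, paste0(path, "/"), list_paths(element, path))
    } else {
      # Anything else -> one dataset
      paths <- c(paths, path)
    }
  }
  paths
}

my_list <- list(data = 1:100, meta = list(valid = TRUE))
list_paths(my_list, "types/list")
# "types/list/data" "types/list/meta/" "types/list/meta/valid"
```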
### Data Frames

Data frames are stored as HDF5 compound types (tables). Each row is written as a single compound record, so the values of a row are kept together in the file. You can use the `as` argument to specify the type of individual columns.

For a comprehensive guide, see `vignette("data-frames")`.

```r
df <- data.frame(
  id    = 1:5,
  score = c(10.5, 20.2, 15.0, 9.8, 30.1)
)

# 1. 'id' coerced to uint16
# 2. 'score' coerced to float32
h5_write(df, file, "types/dataframe", as = c(
  "id"    = "uint16",
  "score" = "float32"
))
```
### NULL

The `NULL` object in R is mapped to a dataset with a null dataspace (`H5S_NULL`). This creates a dataset that exists in the file structure but contains no data elements, so no storage is allocated for data.

```r
h5_write(NULL, file, "placeholders/empty_slot")
```
### Compression

HDF5 supports transparent data compression using the zlib (gzip) and szip algorithms. You can control the compression behavior using the `compress` argument:

- `"gzip-5"` (default): standard zlib compression at level 5. Levels `"gzip-1"` through `"gzip-9"` are also supported. Safe and universally compatible.
- `"szip-nn"`: Szip with Nearest Neighbor coding. Best for continuous, correlated, or floating-point data (e.g., time series or smooth gradients).
- `"szip-ec"`: Szip with Entropy Coding. Best for uncorrelated, discrete, or categorical integer data.
- `"none"`: disables compression entirely.

```r
# Maximum zlib compression
h5_write(rnorm(1000), file, "data/max", compress = "gzip-9")

# Szip Entropy Coding for discrete integer data
h5_write(sample(1:5, 1000, replace = TRUE), file, "data/szip", compress = "szip-ec")
```
#### Byte Shuffle Filter

When gzip compression is enabled, h5lite automatically applies the HDF5 byte shuffle filter before the data is compressed. The shuffle filter does not compress data itself; rather, it rearranges the byte stream to make it more compressible by zlib.

It works by grouping the bytes of each value by their significance. For example, in an array of 4-byte integers, byte 1 of every element is stored first, followed by byte 2 of every element, then byte 3, then byte 4.
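The regrouping can be sketched in base R for little-endian 4-byte integers. `shuffle_bytes()` and `unshuffle_bytes()` are hypothetical helpers that mimic what the HDF5 shuffle filter does to a buffer, and `memCompress(type = "gzip")` stands in for the zlib step. HDF5 applies both per chunk, so sizes in a real file will differ.

```r
# Hypothetical helpers mimicking the HDF5 shuffle filter (not h5lite code)
shuffle_bytes <- function(bytes, elem_size) {
  m <- matrix(bytes, nrow = elem_size)  # column j holds the bytes of element j
  as.vector(t(m))                       # all byte-1s, then all byte-2s, ...
}
unshuffle_bytes <- function(bytes, elem_size) {
  m <- matrix(bytes, ncol = elem_size)  # column k holds byte k of every element
  as.vector(t(m))                       # back to element-by-element order
}

set.seed(42)
x <- sample(0:100, 10000, replace = TRUE)  # small values stored as int32
raw_x <- writeBin(x, raw(), size = 4, endian = "little")
shuffled <- shuffle_bytes(raw_x, 4)

length(memCompress(raw_x, type = "gzip"))     # compressed size without shuffle
length(memCompress(shuffled, type = "gzip"))  # compressed size with shuffle
identical(unshuffle_bytes(shuffled, 4), raw_x)  # TRUE: the transform is lossless
```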
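**Why this helps:**

- For small values, the high-order bytes are mostly zero. Grouped together, they form long runs of identical bytes that zlib compresses very efficiently.
- This allows `int32` data to compress nearly as well as `int8` data if the values are small.

```r
# Remove the temporary file used in this vignette
unlink(file)
```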