Data Frames

Data frames are the workhorse of data analysis in R. h5lite stores a data frame in HDF5 as a Compound Dataset, in which each column can have its own data type (e.g., integer, float, string), much like the columns of a SQL table.

This vignette explains how h5lite handles data frames, including row names, factors, and missing values.

library(h5lite)
file <- tempfile(fileext = ".h5")

Basic Usage

Writing a data frame is as simple as writing any other object. h5lite automatically maps each column to its appropriate HDF5 type.

# Create a standard data frame
df <- data.frame(
  id = 1:5,
  group = c("A", "A", "B", "B", "C"),
  score = c(10.5, 9.2, 8.4, 7.1, 6.0),
  passed = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Write to HDF5
h5_write(df, file, "study_data/results")

# Fetch the column names
h5_names(file, "study_data/results")

# Read back
df_in <- h5_read(file, "study_data/results")

head(df_in)
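To see how each column came back, it can help to compare column classes before and after the round trip. The sketch below is only an illustration: col_classes is a hypothetical helper, not part of h5lite, and the h5lite calls are guarded so the base R part runs even where the package is not installed. No particular output is claimed for the read-back classes.

```r
# Hypothetical helper (not part of h5lite): first class of each column
col_classes <- function(x) vapply(x, function(col) class(col)[1], character(1))

df <- data.frame(
  id = 1:5,
  group = c("A", "A", "B", "B", "C"),
  score = c(10.5, 9.2, 8.4, 7.1, 6.0),
  passed = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Classes on the R side before writing
print(col_classes(df))

if (requireNamespace("h5lite", quietly = TRUE)) {
  file <- tempfile(fileext = ".h5")
  h5lite::h5_write(df, file, "study_data/results")
  df_in <- h5lite::h5_read(file, "study_data/results")

  # Classes after the round trip, for comparison with the output above
  print(col_classes(df_in))

  # Prints TRUE, or a description of whatever differs
  print(all.equal(df, df_in, check.attributes = FALSE))

  unlink(file)
}
```

all.equal() is used rather than identical() so that a column returning as a different numeric storage mode is not reported as a mismatch.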

Customizing Column Types

Use the as argument to control the storage type of specific columns. Pass it a named character vector whose names match the column names and whose values are the desired storage types.

This is particularly useful for optimizing storage (e.g., saving space by storing small integers as int8 or single characters as ascii[1]).

df_small <- data.frame(
  id   = 1:10,
  code = rep("A", 10)
)

# Force 'id' to be uint16 and 'code' to be an ascii string
h5_write(df_small, file, "custom_df", 
         as = c(id = "uint16", code = "ascii[]"))
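Before narrowing a column, it is worth confirming that its values fit the target type. The following is a minimal base R sketch of such a check. fits_range is a hypothetical helper, not part of h5lite; the ranges are simply those implied by each type's bit width, and how h5lite reacts to out-of-range values is not covered here.

```r
# Hypothetical helper (not part of h5lite): TRUE if all non-missing values
# lie within [lo, hi]. NA values are ignored in this sketch.
fits_range <- function(x, lo, hi) {
  all(x >= lo & x <= hi, na.rm = TRUE)
}

df_small <- data.frame(
  id   = 1:10,
  code = rep("A", 10),
  stringsAsFactors = FALSE
)

# Ranges follow from the bit width of each integer type
fits_int8   <- fits_range(df_small$id, -2^7, 2^7 - 1)
fits_uint8  <- fits_range(df_small$id, 0, 2^8 - 1)
fits_uint16 <- fits_range(df_small$id, 0, 2^16 - 1)

# Longest string in bytes, and whether every character is 7-bit ASCII
max_bytes  <- max(nchar(df_small$code, type = "bytes"))
ascii_only <- all(vapply(
  df_small$code,
  function(s) all(utf8ToInt(s) < 128L),
  logical(1)
))

print(c(int8 = fits_int8, uint8 = fits_uint8, uint16 = fits_uint16))
print(c(max_bytes = max_bytes, ascii_only = ascii_only))

# Only narrow the columns when the checks pass (guarded so the sketch runs
# even where h5lite is not installed)
if (fits_uint16 && ascii_only && requireNamespace("h5lite", quietly = TRUE)) {
  file <- tempfile(fileext = ".h5")
  h5lite::h5_write(df_small, file, "custom_df",
                   as = c(id = "uint16", code = "ascii[]"))
  unlink(file)
}
```

A check like this matters most when the data are not known in advance, for example when the same writing code runs on tables of varying size.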

Row Names

Standard HDF5 Compound Datasets do not have a concept of "row names". However, h5lite preserves them using Dimension Scales.

When you write a data frame with row names, h5lite creates a separate dataset (usually named _rownames) and links it to the main table. When reading, h5lite automatically restores these as the row.names of the data frame.

mtcars_subset <- head(mtcars, 3)

h5_write(mtcars_subset, file, "cars")

h5_str(file)

# Read back
result <- h5_read(file, "cars")
print(row.names(result))
unlink(file)
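The same round trip can be checked on a small data frame with hand-written row names. This is a sketch under assumptions: has_custom_rownames is a hypothetical helper, not part of h5lite, and whether h5lite handles automatic (integer) row names any differently from character ones is not stated above, so the example only uses character row names.

```r
# Hypothetical helper (not part of h5lite): TRUE when the data frame carries
# character row names rather than R's automatic integer ones
has_custom_rownames <- function(x) is.character(attr(x, "row.names"))

scores <- data.frame(
  score = c(10.5, 9.2, 8.4),
  row.names = c("alice", "bob", "carol")
)

print(has_custom_rownames(scores))

# Guarded so the base R part runs even where h5lite is not installed
if (requireNamespace("h5lite", quietly = TRUE)) {
  file <- tempfile(fileext = ".h5")
  h5lite::h5_write(scores, file, "scores")
  back <- h5lite::h5_read(file, "scores")

  # Compare the restored row names with the originals
  print(identical(row.names(back), row.names(scores)))

  unlink(file)
}
```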


h5lite documentation built on May 19, 2026, 1:07 a.m.