writehdf5: Write structured list to an HDF5 file

View source: R/writehdf5.R

writehdf5R Documentation

Write structured list to an HDF5 file

Description

Writes a named list, including its hierarchy of nested sublists, to an HDF5 file with HDF5 groups, subgroups and datasets preserving the hierarchical structure. The routine supports standard HDF5 data types, including double-precision floating point numbers (64 bit), integers (32 bit), characters (8 bit), and booleans, as well as vectors and arrays of these types. It also supports 64-bit integers (H5T_NATIVE_INT64), available in R via the bit64 package. Custom attributes of sublists and data, assigned in R via attr, are automatically transcribed to group and dataset attributes in the HDF5 file. Such attributes can also be provided for empty groups (produced by list()) and datasets (produced by numeric(0)).

Usage

writehdf5(obj, file, inherent.attributes = FALSE, level = 6, overwrite = TRUE)

Arguments

obj

List containing the data to be written. Nested sublists are interpreted as sub-groups within the HDF5 file. If a sublist is empty or if an element inside a list is 'NULL', it creates an empty group/dataset that can hold only attributes.

file

Character string specifying the file name of the output HDF5 file.

inherent.attributes

Logical flag indicating whether to include inherent attributes that some R-objects possess, such as 'dim' for matrices or 'names' for named vectors and arrays.

level

Integer specifying the compression level for datasets, typically between 0 (no compression) and 9 (maximum compression). Not all dataset types support compression.

overwrite

Logical value indicating whether to overwrite an existing file.

Details

The function relies on the hdf5r package and on the bit64 package.

The nested list obj should only contain data types available to HDF5. The only exception are data frames, which, if included, are automatically converted to lists. Normally, the list obj and its nested sublists, should all be named lists, i.e. list(a=1, b=2) rather than list(1,2). Unnamed elements are automatically assigned a name 'unnamed_#' in the HDF5 file.

Some data types in R have inherent attributes, such as 'names' for data frames and 'dim' for arrays. By default, these inherent attributes are not written to the HDF5 file. They are, however, automatically regenerated when the HDF5 file is loaded back via readhdf5. The argument inherent.attributes can be used to force writing all attributes, including the inherent ones, to the HDF5 file.

If a structured R list is saved with writehdf5 and then reloaded using readhdf5, the recovered list is identical to the input list up to the ordering of list elements, which is alphabetic by default in HDF5. The only other difference between the input and recovered data occurs when the input data contain data frames, as these are automatically converted do lists. A workaround is to set inherent.attributes=TRUE when writing the HDF5 file. In this case, the reloaded HDF5 data becomes a data frame again.

Value

None

See Also

readhdf5

Examples


# Create example data
input = list(
  group_empty = list(),
  dataset_empty = numeric(0),
  group_dataframe = data.frame(
    ID = as.integer(1:3),
    Name = c("Alice", "Bob", "Charlie"),
    Salary = c(2341.2, 3534.2, 4541.9),
    Employed = c(TRUE, TRUE, FALSE)
  ),
  dataset_parameters = 1.234,
  group_integers = list(int32 = as.integer(123),
                        int64 = bit64::as.integer64(123),
                        vector = bit64::as.integer64((1:3)+12345678912345)),
  dataset_nonnumeric = c(NA, NaN, Inf, -Inf),
  group_mixed = list(
    header = 'test header',
    subgroup1 = list(
      dataset1 = c("A", "%}~&^", "x1y2z3", "", " "),
      dataset2 = matrix(as.integer(1:10), nrow = 2),
      dataset3 = array(runif(30), dim = c(2, 5, 3))
    ),
    subgroup2 = list(date = as.character(as.Date("2025-01-01")),
                     location = 'Perth')
  )
)

# Add attributes to some datasets
attr(input$dataset_empty,'Comment') = 'This is a test file.'
attr(input$group_mixed$subgroup1$dataset3,'Type') = '3D array'
attr(input$group_integers$vector,'Comment') = 'Vector of long integers'
attr(input$group_integers$vector,'Package') = 'bit64'

# Add attributes to some groups
attr(input$group_dataframe,'Company branch') = 'Sales'
attr(input$group_integers,'Comment') = 'Testing different integers'
attr(input$group_empty,'Timestamp') = date()
attr(input$group_empty,'Working directory') = getwd()

# Write list to HDF5 file
filename = tempfile()
writehdf5(input, filename)

# Read HDF5 file into a new list
output = readhdf5(filename)

# Check if input and output lists are identical
# (up to alphabetic ordering and data frame-to-list conversion)
print(all.equal(sortlist(input, convert.data.frames = TRUE), output))

# Write list to HDF5 file again, this time with inherent attributes, allowing
# to keep track of the data frames
filename = tempfile()
writehdf5(input, filename, inherent.attributes = TRUE)

# Read HDF5 file into a new list
output = readhdf5(filename)

# Check if input and output lists are identical
# (up to alphabetic ordering only)
print(all.equal(sortlist(input), output))


obreschkow/cooltools documentation built on Nov. 16, 2024, 2:46 a.m.