hdf5matrix_options: Set or get HDF5Matrix computation options

hdf5matrix_options    R Documentation

Description

Configure global settings for parallelization, block processing and compression in HDF5Matrix operations. These settings affect all HDF5Matrix computations unless explicitly overridden in individual method calls.

Usage

hdf5matrix_options(
  paral = NULL,
  block_size = NULL,
  threads = NULL,
  compression = NULL
)

Arguments

paral

Logical or NULL. Enable OpenMP parallelization?

  • TRUE: Force parallel execution

  • FALSE: Force serial execution

  • NULL: Let BigDataStatMeth auto-detect (default)

block_size

Integer or NULL. Number of elements per block for block-wise processing.

  • Integer > 0: Use this block size

  • NULL: Auto-calculate based on matrix dimensions (default)

threads

Integer or NULL. Number of OpenMP threads to use.

  • Integer > 0: Use this many threads

  • NULL: Use all available threads (default)

compression

Integer (0-9) or NULL. gzip compression level for created datasets.

  • 0: No compression (fastest, largest files)

  • 1-3: Light compression (fast, moderate savings)

  • 6: Balanced compression (default, reduces disk usage by 60-80%)

  • 7-9: Maximum compression (slowest, best ratio)

  • NULL: Use built-in default of 6

Details

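BigDataStatMeth achieves high performance through two key mechanisms, block-wise processing and OpenMP parallelization. Compression and option priority are described after them.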

Block-wise processing: Large matrices are processed in chunks that fit in memory. The block_size parameter controls chunk size. Smaller blocks use less memory but require more I/O operations. Larger blocks are faster but require more RAM.
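The block-wise idea can be sketched in base R. This is only an illustration, not BigDataStatMeth's implementation: the matrix is held in memory, the helper name blockwise_crossprod is hypothetical, and its rows_per_block argument counts rows rather than elements, so it is not the same unit as block_size.

```r
# Block-wise computation of t(X) %*% X in plain base R.
# Illustration only: X lives in memory here, whereas an HDF5-backed matrix
# would read each block from disk. `blockwise_crossprod` is a hypothetical
# helper, not a BigDataStatMeth function.
blockwise_crossprod <- function(X, rows_per_block) {
  stopifnot(is.matrix(X), rows_per_block >= 1)
  result <- matrix(0, ncol(X), ncol(X))
  starts <- seq(1, nrow(X), by = rows_per_block)
  for (start in starts) {
    end <- min(start + rows_per_block - 1, nrow(X))
    block <- X[start:end, , drop = FALSE]  # only this slice is worked on
    result <- result + crossprod(block)    # accumulate the partial product
  }
  result
}

set.seed(1)
X <- matrix(rnorm(1000 * 5), nrow = 1000, ncol = 5)
small_blocks <- blockwise_crossprod(X, rows_per_block = 10)   # 100 passes
large_blocks <- blockwise_crossprod(X, rows_per_block = 400)  # 3 passes
```

Both calls produce the same matrix. The small-block run touches less data per pass but needs many more passes, which is the memory versus I/O trade-off described above.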

OpenMP parallelization: Operations are distributed across CPU cores. The paral and threads parameters control this. Parallelization can provide near-linear speedup for compute-intensive operations.
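How close to linear the speedup gets depends on how much of an operation actually runs in parallel. Amdahl's law gives a rough upper bound; the sketch below evaluates it for a few thread counts. The parallel fractions used (1.00 and 0.95) are illustrative assumptions, not measurements of BigDataStatMeth.

```r
# Amdahl's law: ideal speedup when a fraction p of the work runs in parallel
# on n threads and the remaining (1 - p) stays serial.
amdahl_speedup <- function(p, n) {
  stopifnot(p >= 0, p <= 1, n >= 1)
  1 / ((1 - p) + p / n)
}

threads <- c(1, 2, 4, 8, 16)
fully_parallel  <- amdahl_speedup(1.00, threads)  # linear: equals thread count
mostly_parallel <- amdahl_speedup(0.95, threads)  # flattens as threads grow
```

Under this model, even a 5% serial portion keeps the speedup on 16 threads well below 16, which is one reason to benchmark before raising threads.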

Compression: Datasets are created with gzip compression (level 6 by default). This reduces disk usage by 60-80%. For benchmarks or workflows where speed is critical, set compression = 0. For long-term storage or large datasets, keep the default.
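The effect of the gzip level on file size can be explored with base R's gzfile() connection. This sketch does not go through HDF5 at all, the helper gz_size is hypothetical, and the test data is deliberately repetitive, so the ratios it shows will differ from those of real matrices.

```r
# Compare gzip levels using base R's gzfile(). This is not the HDF5 gzip
# filter itself, only an illustration of the level versus size trade-off.
# `gz_size` is a hypothetical helper.
gz_size <- function(values, level) {
  path <- tempfile(fileext = ".gz")
  on.exit(unlink(path))                 # clean up the temporary file
  con <- gzfile(path, open = "wb", compression = level)
  writeBin(values, con)
  close(con)
  file.size(path)                       # size in bytes after writing
}

# Highly repetitive numeric data compresses well; real data may not.
values <- as.numeric(rep(1:100, times = 5000))
levels_tried <- c(0L, 1L, 6L, 9L)
sizes <- vapply(levels_tried, function(level) gz_size(values, level), numeric(1))
names(sizes) <- paste0("level", levels_tried)
print(sizes)
```

Timing each call with system.time() on representative data is a reasonable way to decide whether the space saved is worth the extra CPU work.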

Priority: Options set here serve as defaults. Individual method calls can override them: A$multiply(B, paral = TRUE, threads = 4, block_size = 2000)
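The priority rule can be pictured as a three-step lookup: a per-call argument wins, then the global option, then auto-detection. The sketch below is a hypothetical model of that rule, not the package's code; resolve_setting, global_options and the auto_value numbers are illustrative names and values.

```r
# Hypothetical model of "per-call argument > global option > auto-detect".
# `resolve_setting` and `global_options` are illustrative names only.
resolve_setting <- function(call_value, global_value, auto_value) {
  if (!is.null(call_value)) return(call_value)      # explicit per-call override
  if (!is.null(global_value)) return(global_value)  # value set globally
  auto_value                                        # fall back to auto-detection
}

# Pretend the user set threads globally and left the rest on auto (NULL).
global_options <- list(paral = NULL, threads = 8L, block_size = NULL)

threads_default  <- resolve_setting(NULL, global_options$threads, auto_value = 2L)
threads_override <- resolve_setting(4L, global_options$threads, auto_value = 2L)
block_auto       <- resolve_setting(NULL, global_options$block_size, auto_value = 1000L)
```

In this model NULL always means "no preference at this level", which matches how NULL is described for each argument above.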

Recommendations:

  • For interactive analysis: Leave defaults (NULL) - auto-detect works well

  • For scripts/HPC: Set explicitly based on your hardware and data size

  • For huge datasets (>10GB): Reduce block_size to fit in RAM

  • For many-core systems: Set threads explicitly (auto may be too aggressive)

  • For benchmarks: Set compression = 0 to eliminate gzip overhead
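For scripts, explicit values can be derived from the machine rather than hard-coded. The helpers below are a rough sketch under stated assumptions: values are 8-byte doubles, n_buffers blocks are held in memory at once, and one core is left free. The function names and this memory model are hypothetical and are not taken from BigDataStatMeth.

```r
# Hypothetical sizing helpers; the names and the memory model are assumptions.
# Assumes double-precision values (8 bytes each) and that `n_buffers` blocks
# are held in memory at once (for example two inputs and one output).
suggest_block_elements <- function(ram_budget_bytes, n_buffers = 3L) {
  stopifnot(ram_budget_bytes > 0, n_buffers >= 1)
  floor(ram_budget_bytes / (8 * n_buffers))
}

# Leave `reserve` cores free; fall back to 1 if the core count is unknown.
suggest_threads <- function(reserve = 1L) {
  cores <- parallel::detectCores()
  if (is.na(cores)) return(1L)
  max(1L, as.integer(cores) - reserve)
}

block_elems <- suggest_block_elements(240 * 1024^2)  # 240 MiB budget
n_threads <- suggest_threads()

# These values could then be passed on, for example:
# hdf5matrix_options(block_size = block_elems, threads = n_threads)
```

On shared HPC nodes the scheduler's allocated core count is usually a better input than detectCores(), which reports the cores of the whole machine.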

Value

When called with arguments: invisibly returns a list of all current options. When called without arguments: returns a list of all current options.
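The visible versus invisible return can be demonstrated with a small stand-in. The closure below is a toy that mimics the documented return behaviour. It is not the package's implementation, and make_toy_options is a hypothetical name.

```r
# Toy stand-in that mimics the documented return contract.
# `make_toy_options` is hypothetical and is not part of BigDataStatMeth.
make_toy_options <- function() {
  state <- list(paral = NULL, block_size = NULL, threads = NULL, compression = NULL)
  function(...) {
    updates <- list(...)
    if (length(updates) == 0) return(state)  # getter: result is visible
    for (name in names(updates)) {
      state[name] <<- list(updates[[name]])  # list() keeps NULL entries intact
    }
    invisible(state)                         # setter: result is invisible
  }
}

toy_options <- make_toy_options()
getter <- withVisible(toy_options())              # would auto-print at the prompt
setter <- withVisible(toy_options(threads = 4L))  # would not auto-print
current <- toy_options(compression = 0L)          # invisible value can be captured
```

An invisible result is suppressed only at the console. Assigning it, as in the last line, still gives the full list of options.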

Examples

# View current options
hdf5matrix_options()

# Enable parallelization with 8 threads
hdf5matrix_options(paral = TRUE, threads = 8)

# Set block size to 1000 elements
hdf5matrix_options(block_size = 1000)

# Disable compression for benchmarking
hdf5matrix_options(compression = 0)

# Reset to defaults
hdf5matrix_options(paral = NULL, threads = NULL, block_size = NULL, compression = NULL)



BigDataStatMeth documentation built on May 15, 2026, 1:07 a.m.