hdf5_import: Import data from file or URL into HDF5 format

View source: R/HDF5Matrix_import.R

hdf5_import    R Documentation

Description

Imports CSV, TSV, or other delimited text files into an HDF5 file and returns an HDF5Matrix object ready for use.

Usage

hdf5_import(
  source,
  filename,
  dataset,
  sep = NULL,
  header = TRUE,
  rownames = FALSE,
  overwrite = FALSE,
  parallel = TRUE,
  threads = NULL
)

Arguments

source

Character. Path to local file or URL to import. Supports compressed files (.gz, .tar.gz, .zip, .bz2).

filename

Character. Path to the HDF5 output file (created if it does not exist).

dataset

Character. Full dataset path (e.g., "data/imported" or "group/dataset").

sep

Character. Field separator. Default NULL (auto-detect from the file extension: "," for .csv; "\t" for .tsv and for any other extension).

header

Logical or character vector. If TRUE, first row contains column names. If character vector, use these as column names. Default TRUE.

rownames

Logical or character vector. If TRUE, first column contains row names. If character vector, use these as row names. Default FALSE.

overwrite

Logical. If TRUE, overwrite the dataset if it already exists. Default FALSE.

parallel

Logical. Use parallel processing for import. Default TRUE.

threads

Integer. Number of threads for parallel processing. Default NULL (uses all available cores).

Details

This function is a modern, user-friendly wrapper around bdImportData_hdf5 and bdImportTextFile_hdf5. It:

  • Automatically detects file format from extension

  • Handles compressed files (.gz, .tar.gz, .zip, .bz2)

  • Downloads from URLs automatically

  • Returns ready-to-use HDF5Matrix object

  • Uses sensible defaults for most use cases

Supported formats:

  • CSV files (.csv) - comma-separated

  • TSV files (.tsv, .txt) - tab-separated

  • Compressed files (.gz, .tar.gz, .zip, .bz2)

  • Remote files (http://, https://, ftp://)
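The extension-based separator rule documented for sep = NULL can be sketched in base R. This is only an illustration: guess_sep is a hypothetical helper, not a function exported by the package, and stripping a trailing compression suffix before looking at the extension is an assumption about how compressed inputs might be handled.

```r
# Hypothetical helper (not part of the package): mirrors the documented
# default of "," for .csv and "\t" for everything else.
guess_sep <- function(path) {
  # Assumption: drop a trailing compression suffix so that
  # "counts.csv.gz" is judged by its inner ".csv" extension.
  base <- sub("\\.(gz|bz2|zip)$", "", basename(path), ignore.case = TRUE)
  ext  <- tolower(tools::file_ext(base))
  if (ext == "csv") "," else "\t"
}

guess_sep("counts.csv")     # comma
guess_sep("counts.tsv")     # tab
guess_sep("counts.txt")     # tab
guess_sep("counts.csv.gz")  # comma, under the suffix-stripping assumption
```

Passing sep explicitly, as in the Examples section, bypasses any such detection and is the safer choice when a file's extension does not match its contents.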

Memory efficiency: Import is done in a streaming fashion, so very large files can be imported without loading them entirely into memory.
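To make the idea of streaming concrete, the sketch below reads a small CSV in fixed-size chunks with base R and accumulates per-column sums, so only one chunk is held in memory at a time. It is not the package's internal implementation; the chunk size and variable names are chosen purely for illustration.

```r
# Illustrative only: chunked reading with base R, not the package's code.
csv_file <- tempfile(fileext = ".csv")
write.table(matrix(1:50, nrow = 10, ncol = 5),
            csv_file, sep = ",", row.names = FALSE, col.names = TRUE)

con    <- file(csv_file, open = "r")
header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]

chunk_rows <- 4L                          # hypothetical chunk size
n_rows     <- 0L
col_sums   <- numeric(length(header))

repeat {
  lines <- readLines(con, n = chunk_rows)
  if (length(lines) == 0L) break          # end of file reached
  # Parse only the current chunk into a numeric matrix
  chunk <- do.call(rbind,
                   lapply(strsplit(lines, ",", fixed = TRUE), as.numeric))
  n_rows   <- n_rows + nrow(chunk)
  col_sums <- col_sums + colSums(chunk)
}

close(con)
unlink(csv_file)

n_rows    # total number of data rows seen across all chunks
col_sums  # per-column totals accumulated chunk by chunk
```

A real importer would write each parsed chunk to the output dataset instead of summing it, but the memory profile is the same: peak usage depends on the chunk size rather than on the size of the file.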

Value

HDF5Matrix object pointing to the imported data.

See Also

bdImportData_hdf5 for the underlying implementation; hdf5_create_matrix for creating matrices from R objects.

Examples


library(BigDataStatMeth)

# Temporary paths for the input CSV and the output HDF5 file
csv_file  <- tempfile(fileext = ".csv")
hdf5_file <- tempfile(fileext = ".h5")

# Write sample numeric data
write.table(matrix(rnorm(50), nrow = 10, ncol = 5),
            csv_file, sep = ",", row.names = FALSE, col.names = TRUE)

# Import CSV to HDF5
mat <- hdf5_import(
  source   = csv_file,
  filename = hdf5_file,
  dataset  = "raw/data",
  sep      = ","
)
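# Inspect the dimensions of the imported matrix
dim(mat)

# Close open HDF5 handles, then remove the temporary files
hdf5_close_all()
unlink(c(csv_file, hdf5_file))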



BigDataStatMeth documentation built on May 15, 2026, 1:07 a.m.