H5File-class: H5File objects

H5File-classR Documentation

H5File objects

Description

The H5File class provides a formal representation of an HDF5 file (local or remote).

Usage

## Constructor function:
H5File(filepath, s3=FALSE, s3credentials=NULL, .no_rhdf5_h5id=FALSE)

Arguments

filepath

A single string specifying the path or URL to an HDF5 file.

s3

TRUE or FALSE. Should the filepath argument be treated as the URL to a file stored in an Amazon S3 bucket, rather than the path to a local file?

s3credentials

A list of length 3, providing the credentials for accessing files stored in a private Amazon S3 bucket. See ?H5Pset_fapl_ros3 in the rhdf5 package for more information.

.no_rhdf5_h5id

For internal use only. Don't use.

Details

IMPORTANT NOTE ABOUT H5File OBJECTS AND PARALLEL EVALUATION

The short story is that H5File objects cannot be used in the context of parallel evaluation at the moment.

Here is why:

H5File objects contain an identifier to an open connection to the HDF5 file. This identifier becomes invalid in the 2 following situations:

  • After serialization/deserialization, that is, after loading a serialized H5File object with readRDS() or load().

  • In the context of parallel evaluation, when using the SnowParam parallelization backend. This is because, unlike the MulticoreParam backend which used a system fork, the SnowParam backend uses serialization/deserialization to transmit the object to the workers.

In both cases, the connection to the file is lost and any attempt to read data from the H5File object will fail. Note that the above also happens to any H5File object that got serialized indirectly i.e. as part of a bigger object. For example, if an HDF5Array object was constructed from an H5File object, then it contains the H5File object and therefore blockApply(..., BPPARAM=SnowParam(4)) cannot be used on it.

Furthermore, even if sometimes an H5File object seems to work fine with the MulticoreParam parallelization backend, this is highly unreliable and must be avoided.

Value

An H5File object.

See Also

  • H5Pset_fapl_ros3 in the rhdf5 package for detailed information about how to pass your S3 credentials to the s3credentials argument.

  • The HDF5Array class for representing and operating on a conventional (a.k.a. dense) HDF5 dataset.

  • The H5SparseMatrix class for representing and operating on an HDF5 sparse matrix.

  • The H5ADMatrix class for representing and operating on the central matrix of an h5ad file, or any matrix in its /layers group.

  • The TENxMatrix class for representing and operating on a 10x Genomics dataset.

  • The h5mread function in this package (HDF5Array) that is used internally by HDF5Array, TENxMatrix, and H5ADMatrix objects, for (almost) all their data reading needs.

  • h5ls to list the content of an HDF5 file.

  • bplapply, MulticoreParam, and SnowParam, in the BiocParallel package.

Examples

## ---------------------------------------------------------------------
## A. BASIC USAGE
## ---------------------------------------------------------------------

## With a local file:
toy_h5 <- system.file("extdata", "toy.h5", package="HDF5Array")
h5file1 <- H5File(toy_h5)
h5ls(h5file1)
path(h5file1)

h5mread(h5file1, "M2", list(1:10, 1:6))
get_h5mread_returned_type(h5file1, "M2")

## With a file stored in an Amazon S3 bucket:
if (Sys.info()[["sysname"]] != "Darwin") {
  public_S3_url <-
   "https://rhdf5-public.s3.eu-central-1.amazonaws.com/rhdf5ex_t_float_3d.h5"
  h5file2 <- H5File(public_S3_url, s3=TRUE)
  h5ls(h5file2)

  h5mread(h5file2, "a1")
  get_h5mread_returned_type(h5file2, "a1")
}

## ---------------------------------------------------------------------
## B. H5File OBJECTS AND PARALLEL EVALUATION
## ---------------------------------------------------------------------
## H5File objects cannot be used in the context of parallel evaluation
## at the moment!

library(BiocParallel)

FUN1 <- function(i, h5file, name)
    sum(HDF5Array::h5mread(h5file, name, list(i, NULL)))

FUN2 <- function(i, h5file, name)
    sum(HDF5Array::h5mread(h5file, name, list(i, NULL, NULL)))

## With the SnowParam parallelization backend, the H5File object
## does NOT work on the workers:
## Not run: 
## ERROR!
res1 <- bplapply(1:150, FUN1, h5file1, "M2", BPPARAM=SnowParam(3))
## ERROR!
res2 <- bplapply(1:5, FUN2, h5file2, "a1", BPPARAM=SnowParam(3))

## End(Not run)

## With the MulticoreParam parallelization backend, the H5File object
## might seem to work on the workers. However this is highly unreliable
## and must be avoided:
## Not run: 
if (.Platform$OS.type != "windows") {
  ## UNRELIABLE!
  res1 <- bplapply(1:150, FUN1, h5file1, "M2", BPPARAM=MulticoreParam(3))
  ## UNRELIABLE!
  res2 <- bplapply(1:5, FUN2, h5file2, "a1", BPPARAM=MulticoreParam(3))
}

## End(Not run)

Bioconductor/HDF5Array documentation built on Oct. 31, 2024, 9:16 a.m.