H5File-class | R Documentation
The H5File class provides a formal representation of an HDF5 file (local or remote).
## Constructor function:
H5File(filepath, s3=FALSE, s3credentials=NULL, .no_rhdf5_h5id=FALSE)
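## Arguments
| Argument | Description |
|---|---|
| filepath | A single string specifying the path or URL to an HDF5 file. |
| s3 | TRUE or FALSE. Should the filepath argument be treated as the URL to a file stored in an Amazon S3 bucket, rather than the path to a local file? |
| s3credentials | A list of length 3, providing the credentials for accessing files stored in a private Amazon S3 bucket. See H5Pset_fapl_ros3 in the rhdf5 package for more information. |
| .no_rhdf5_h5id | For internal use only. Don't use. |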
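The s3 argument is what switches H5File() between treating filepath as a local path and as a URL in an Amazon S3 bucket. As a minimal sketch, assuming that any filepath starting with http:// or https:// should be treated as an S3 URL, a small wrapper can choose the flag automatically. The helper name open_h5() is hypothetical and not part of HDF5Array; only H5File() and path() come from this page.

```r
library(HDF5Array)

## open_h5() is a HYPOTHETICAL convenience wrapper, not part of HDF5Array.
## Assumption: a filepath starting with "http://" or "https://" is an
## S3 URL; anything else is treated as a local file.
open_h5 <- function(filepath, s3credentials = NULL) {
    stopifnot(is.character(filepath), length(filepath) == 1L)
    is_url <- grepl("^https?://", filepath)
    H5File(filepath, s3 = is_url, s3credentials = s3credentials)
}

## Local file shipped with HDF5Array, so open_h5() passes s3=FALSE.
toy_h5 <- system.file("extdata", "toy.h5", package = "HDF5Array")
h5file <- open_h5(toy_h5)
path(h5file)
```

Passing s3=TRUE explicitly remains the safer choice when filepath comes from an untrusted source, since not every https:// URL points to an S3 bucket.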
IMPORTANT NOTE ABOUT H5File OBJECTS AND PARALLEL EVALUATION
The short story is that H5File objects cannot be used in the context of parallel evaluation at the moment.
Here is why:
H5File objects contain an identifier to an open connection to the HDF5 file. This identifier becomes invalid in the following two situations:
- After serialization/deserialization, that is, after loading a serialized H5File object with readRDS() or load().
- In the context of parallel evaluation, when using the SnowParam parallelization backend. This is because, unlike the MulticoreParam backend, which uses a system fork, the SnowParam backend uses serialization/deserialization to transmit the object to the workers.
In both cases, the connection to the file is lost and any attempt to read data from the H5File object will fail. Note that this also applies to any H5File object that was serialized indirectly, i.e., as part of a bigger object. For example, if an HDF5Array object was constructed from an H5File object, then it contains the H5File object, and therefore blockApply(..., BPPARAM=SnowParam(4)) cannot be used on it.
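By contrast, an HDF5Array object built directly from the file path (a plain character string) does not contain an H5File object, so by the reasoning above it should remain usable with SnowParam. The sketch below assumes the toy.h5 file and its M2 dataset used in the examples, and uses blockApply() from the DelayedArray package to compute a total sum block by block on SnowParam workers.

```r
library(HDF5Array)
library(DelayedArray)   # blockApply()
library(BiocParallel)   # SnowParam()

toy_h5 <- system.file("extdata", "toy.h5", package = "HDF5Array")

## Built from the path string, NOT from an H5File object, so (by assumption)
## no open-file identifier needs to survive serialization to the workers.
M2 <- HDF5Array(toy_h5, "M2")

## Each block is summed on a SnowParam worker; the partial sums are then
## combined in the main session.
block_sums <- blockApply(M2, function(block) sum(as.numeric(block)),
                         BPPARAM = SnowParam(2))
total <- sum(unlist(block_sums))
total
```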
Furthermore, even if an H5File object sometimes seems to work fine with the MulticoreParam parallelization backend, this is highly unreliable and must be avoided.
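One possible workaround, sketched here under the assumption that each worker can simply reopen the file itself: pass the file path (a plain string, which serializes without problems) instead of the H5File object, and call h5mread() on that path inside the worker. This mirrors FUN1 from the examples; the function name row_sum_from_path() is hypothetical.

```r
library(HDF5Array)
library(BiocParallel)

## Hypothetical variant of FUN1 that receives a file PATH rather than an
## H5File object; h5mread() opens the file itself on the worker.
row_sum_from_path <- function(i, filepath, name)
    sum(HDF5Array::h5mread(filepath, name, list(i, NULL)))

toy_h5 <- system.file("extdata", "toy.h5", package = "HDF5Array")

## Only a character string and integers are sent to the SnowParam workers.
res <- bplapply(1:10, row_sum_from_path, toy_h5, "M2",
                BPPARAM = SnowParam(2))
unlist(res)
```

The trade-off is that every task reopens the file, so for many small reads it can pay to give each task a range of rows rather than a single row.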
## Value
An H5File object.
## See Also
- H5Pset_fapl_ros3 in the rhdf5 package for detailed information about how to pass your S3 credentials to the s3credentials argument.
- The HDF5Array class for representing and operating on a conventional (a.k.a. dense) HDF5 dataset.
- The H5SparseMatrix class for representing and operating on an HDF5 sparse matrix.
- The H5ADMatrix class for representing and operating on the central matrix of an h5ad file, or any matrix in its /layers group.
- The TENxMatrix class for representing and operating on a 10x Genomics dataset.
- The h5mread function in this package (HDF5Array), which is used internally by HDF5Array, TENxMatrix, and H5ADMatrix objects for (almost) all their data reading needs.
- h5ls to list the content of an HDF5 file.
- bplapply, MulticoreParam, and SnowParam in the BiocParallel package.
## Examples
## ---------------------------------------------------------------------
## A. BASIC USAGE
## ---------------------------------------------------------------------
library(HDF5Array)

## With a local file:
toy_h5 <- system.file("extdata", "toy.h5", package="HDF5Array")
h5file1 <- H5File(toy_h5)
h5ls(h5file1)
path(h5file1)
h5mread(h5file1, "M2", list(1:10, 1:6))
get_h5mread_returned_type(h5file1, "M2")
## With a file stored in an Amazon S3 bucket:
if (Sys.info()[["sysname"]] != "Darwin") {
    public_S3_url <-
        "https://rhdf5-public.s3.eu-central-1.amazonaws.com/rhdf5ex_t_float_3d.h5"
    h5file2 <- H5File(public_S3_url, s3=TRUE)
    h5ls(h5file2)
    h5mread(h5file2, "a1")
    get_h5mread_returned_type(h5file2, "a1")
}
## ---------------------------------------------------------------------
## B. H5File OBJECTS AND PARALLEL EVALUATION
## ---------------------------------------------------------------------
## H5File objects cannot be used in the context of parallel evaluation
## at the moment!
library(BiocParallel)
FUN1 <- function(i, h5file, name)
    sum(HDF5Array::h5mread(h5file, name, list(i, NULL)))
FUN2 <- function(i, h5file, name)
    sum(HDF5Array::h5mread(h5file, name, list(i, NULL, NULL)))
## With the SnowParam parallelization backend, the H5File object
## does NOT work on the workers:
## Not run:
## ERROR!
res1 <- bplapply(1:150, FUN1, h5file1, "M2", BPPARAM=SnowParam(3))
## ERROR!
res2 <- bplapply(1:5, FUN2, h5file2, "a1", BPPARAM=SnowParam(3))
## End(Not run)
## With the MulticoreParam parallelization backend, the H5File object
## might seem to work on the workers. However this is highly unreliable
## and must be avoided:
## Not run:
if (.Platform$OS.type != "windows") {
    ## UNRELIABLE!
    res1 <- bplapply(1:150, FUN1, h5file1, "M2", BPPARAM=MulticoreParam(3))
    ## UNRELIABLE!
    res2 <- bplapply(1:5, FUN2, h5file2, "a1", BPPARAM=MulticoreParam(3))
}
## End(Not run)