knitr::opts_chunk$set(echo = TRUE)
library(rhdf5client2)

HSDSSource

An object of type HSDSSource is a HDFGroup server running on a machine. The constructor requires the endpoint and server type, either h5serv (for the Tornado-based HDF Server) or hsds (for the HDF Scalable Data Service). If the type is not specified, the server will be assumed to be hsds

src.hsds <- HSDSSource('http://hsdshdflab.hdfgroup.org')
src.chan <- HSDSSource('http://h5s.channingremotedata.org:5000', 'h5serv')

The routine domains is provided for inspection of the server hierarchy. This is the hierarchy that maps approximately to the directory structure of the server file system. The purpose of this routine is to assist the user in locating HDF5 files.

The user needs to know the root domain of the server. This information should be published with the endpoint. For an h5serv server, the default hdfgroup.org is a good guess, because this value is preconfigured and there is not often a reason to wish to change it.

domains(src.chan)
domains(src.chan, 'public/hdfgroup/org')

For an hsds server, the root domain could be anything, so the administrator has to publish it.

domains(src.hsds, '/home/jreadey')
domains(src.hsds, '/home/jreadey/HDFLabTutorial')

HSDSFile

An object of class HSDSFile represents a HDF5 file. The object is constructed by providing a source and a file domain. (TODO: if the domain is not a real file domain, this crashes with an unhelpful error message.)

f0 <- HSDSFile(src.hsds, '/home/spollack/testzero.h5')
f1 <- HSDSFile(src.chan, 'tenx_100k_sorted.h5s.channingremotedata.org')

The function listDatasets lists the datasets in a file.

listDatasets(f0)
listDatasets(f1)

HSDSDataset

Construct a HSDSDataset object from a HSDSFile and a dataset path.

d0 <- HSDSDataset(f0, '/grpA/grpAB/dsetX')
d1 <- HSDSDataset(f1, '/assay001')

Data Fetch (1)

The fundamental data retrieval method is getData. Its argument is a vector of slices of type character. Valid slices are : (all indices), 1:10 (indices 1 through 10 inclusive), :10 (same as 1:10), 5: (from 5 to the maximum value of the index) and 2:14:4 (from 2 to 14 inclusive in increments of 4.)

Note that the slice should be passed in R semantics: 1 signifies the first element, and the last element is included in the slice. (Internally, rhdf5client2 converts to Python semantics, in which the first index is 0 and the last element is excluded. But here, as everywhere in the package, all Python details should be hidden from the user.)

apply(getData(d1, c('1:4', '1:27998'), transfermode='JSON'), 1, sum)
apply(getData(d1, c('1:4', '1:27998'), transfermode='binary'), 1, sum)

Data Fetch (2)

getData is generic. It can also be passed a list of vectors for the index argument, one vector in each dimension. At present, it only works if each of the vectors can be expressed as a single slice. Eventually, this functionality will be expanded to the general multi-dimensional case of multiple slices. In the general case, multiple array blocks will be fetched and bound back together into a single array.

apply(getData(d1, list(1:4, 1:27998), transfermode='JSON'), 1, sum)
apply(getData(d1, list(1:4, 1:27998), transfermode='binary'), 1, sum)

Data Fetch (3)

The [ operator is provided for the two most typical cases (one-dimensional and two-dimensional numeric data.)

apply(d1[1:4, 1:27998], 1, sum)


sampoll/rhdf5client2 documentation built on May 24, 2019, 2:47 p.m.