knitr::opts_chunk$set(echo = TRUE) library(rhdf5client2)
An object of type HSDSSource is a HDFGroup server running on a machine. The
constructor requires the endpoint and server type, either h5serv
(for
the Tornado-based HDF Server) or hsds
(for the HDF Scalable Data Service).
If the type is not specified, the server will be assumed to be hsds
src.hsds <- HSDSSource('http://hsdshdflab.hdfgroup.org') src.chan <- HSDSSource('http://h5s.channingremotedata.org:5000', 'h5serv')
The routine domains
is provided for inspection of the server hierarchy.
This is the hierarchy that maps approximately to the directory structure of
the server file system. The purpose of this routine is to assist the user
in locating HDF5 files.
The user needs to know the root domain of the server. This information
should be published with the endpoint. For an h5serv server, the default
hdfgroup.org
is a good guess, because this value is preconfigured and
there is not often a reason to wish to change it.
domains(src.chan) domains(src.chan, 'public/hdfgroup/org')
For an hsds server, the root domain could be anything, so the administrator has to publish it.
domains(src.hsds, '/home/jreadey') domains(src.hsds, '/home/jreadey/HDFLabTutorial')
An object of class HSDSFile represents a HDF5 file. The object is constructed by providing a source and a file domain. (TODO: if the domain is not a real file domain, this crashes with an unhelpful error message.)
f0 <- HSDSFile(src.hsds, '/home/spollack/testzero.h5') f1 <- HSDSFile(src.chan, 'tenx_100k_sorted.h5s.channingremotedata.org')
The function listDatasets
lists the datasets in a file.
listDatasets(f0) listDatasets(f1)
Construct a HSDSDataset object from a HSDSFile and a dataset path.
d0 <- HSDSDataset(f0, '/grpA/grpAB/dsetX') d1 <- HSDSDataset(f1, '/assay001')
The fundamental data retrieval method is getData
. Its argument is a
vector of slices of type character
. Valid slices are :
(all indices),
1:10
(indices 1 through 10 inclusive), :10
(same as 1:10
), 5:
(from 5 to the maximum value of the index) and 2:14:4
(from 2 to 14
inclusive in increments of 4.)
Note that the slice should be passed in R semantics: 1 signifies the first element, and the last element is included in the slice. (Internally, rhdf5client2 converts to Python semantics, in which the first index is 0 and the last element is excluded. But here, as everywhere in the package, all Python details should be hidden from the user.)
apply(getData(d1, c('1:4', '1:27998'), transfermode='JSON'), 1, sum) apply(getData(d1, c('1:4', '1:27998'), transfermode='binary'), 1, sum)
getData
is generic. It can also be passed a list of vectors for the index
argument, one vector in each dimension. At present, it only works if
each of the vectors can be expressed as a single slice. Eventually, this
functionality will be expanded to the general multi-dimensional case of
multiple slices. In the general case, multiple array blocks will be
fetched and bound back together into a single array.
apply(getData(d1, list(1:4, 1:27998), transfermode='JSON'), 1, sum) apply(getData(d1, list(1:4, 1:27998), transfermode='binary'), 1, sum)
The [
operator is provided for the two most typical cases (one-dimensional and two-dimensional numeric data.)
apply(d1[1:4, 1:27998], 1, sum)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.