HDF5 server

HDF Server "extends the HDF5 data model to efficiently store large data objects (e.g. up to multi-TB data arrays) and access them over the web using a RESTful API." This package introduces several data structures for working with such a server from R.

We maintain, thanks to a grant from the National Cancer Institute, the server http://h5s.channingremotedata.org:5000/. Visit this URL to get a flavor of the server structure: datasets, groups, and datatypes are high-level elements to be manipulated to work with data values from the server.

A key application of the rhdf5client package is support for the restfulSE package, which defines an interface between the SummarizedExperiment class and the HDF5 Server. The server provides content for assay() requests to RESTfulSummarizedExperiment instances.

Some details

Motivation

Extensive human and computational effort is expended on downloading and managing large genomic data at the site of analysis. Interoperable formats that are accessible through generic operations, such as those of RESTful APIs, may help improve the cost-effectiveness of genome-scale analyses.

In this report we examine the use of HDF5 server as a back end for assay data.

A modest server configured to deliver HDF5 content via a RESTful API has been prepared and is used in this vignette.

Executive summary

We want to provide rapid read-only access to array-like data. To do this, the hierarchy and additional formalities of the HDF5 server data architecture are exposed through R functions and related classes. Full details on the HDF5 server are available at the HDF Group site.

suppressPackageStartupMessages({
library(rhdf5client)
})

The dsmeta function returns top-level groups and datasets.

library(rhdf5client)
bigec2 = H5S_source(URL_h5serv())
bigec2
dsmeta(bigec2)[1:2,]      # two groups
dsmeta(bigec2)[1,2][[1]]  # all dataset candidates in group 1

Hierarchy of server resources

Server

Given the URL of a server running HDF5 server, we create an instance of H5S_source:

mys = H5S_source(serverURL=URL_h5serv())
mys

Groups

The server identifies a collection of 'groups'. For the server we are working with, only one group, at the root, is of interest.

groups(mys)

Links for a group

There is a class to hold the link set for any group:

lks = links(mys,1)
lks

Dataset access

We use double-bracket subscripting to grab a reference to a dataset from an H5S source. A dataset must be two-dimensional, i.e., accessible with two subscripts.

dta = bigec2[["tenx_100k_sorted"]] 
dta
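
The division of labor between the two subscript operators can be illustrated without a server. The following is a minimal sketch using made-up classes (ToySource and ToyDataset are hypothetical and are not part of rhdf5client): double brackets on the source return a reference to a named dataset, and single brackets on that reference return a block of values. An in-memory matrix stands in for the remote data.

```r
library(methods)

# Hypothetical stand-in classes for illustration; not the rhdf5client classes.
setClass("ToySource", representation(datasets = "list"))
setClass("ToyDataset", representation(name = "character", values = "matrix"))

# `[[` on the source returns a reference to one named dataset.
setMethod("[[", "ToySource", function(x, i, j, ...) {
  stopifnot(is.character(i), length(i) == 1, i %in% names(x@datasets))
  new("ToyDataset", name = i, values = x@datasets[[i]])
})

# `[` on the dataset reference returns the requested block of values.
setMethod("[", "ToyDataset", function(x, i, j, ..., drop = FALSE) {
  x@values[i, j, drop = drop]
})

src <- new("ToySource", datasets = list(demo = matrix(1:20, nrow = 4)))
ref <- src[["demo"]]     # a reference, analogous to dta above
block <- ref[2:3, 4:5]   # a 2 x 2 block of values
```

In the real package the values would be fetched from the server when the single-bracket call is made; the toy version only mimics the calling pattern.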

Data

Data are accessed by subscripting. Each subscript is a vector of increasing, positive integers.

x = dta[ 15:20, 1905:1906 ]
x
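
The constraint on subscripts can be made concrete with a small check. The sketch below defines a hypothetical helper, index_to_slice (not part of rhdf5client), that verifies a subscript is a strictly increasing vector of positive integers and, when the spacing is constant, summarizes it as initial:final:stride. The result keeps R's 1-based, inclusive endpoints; whether and how the package translates these to the server's own indexing convention is not assumed here.

```r
# Hypothetical helper for illustration; not an rhdf5client function.
# Checks that idx is strictly increasing, positive and integer-valued, then
# summarizes an evenly spaced subscript as "initial:final:stride".
index_to_slice <- function(idx) {
  stopifnot(is.numeric(idx), length(idx) >= 1,
            all(idx == round(idx)), all(idx > 0))
  if (length(idx) == 1) {
    return(sprintf("%d:%d:1", idx, idx))
  }
  steps <- diff(idx)
  if (any(steps <= 0)) stop("subscripts must be strictly increasing")
  if (length(unique(steps)) != 1) stop("subscripts are not evenly spaced")
  sprintf("%d:%d:%d", idx[1], idx[length(idx)], steps[1])
}

rows <- index_to_slice(15:20)
cols <- index_to_slice(c(1, 3, 5))
```

Here rows describes the contiguous range used in the example above, and cols describes a subscript with a stride of 2.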

(Obsolete) The subscripts are colon-delimited character strings giving the initial index, the final index and an optional stride.

x = dta["15:20", "1904:1906"]
x
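
For readers unfamiliar with the colon-delimited form, the sketch below splits such a string into its components. The function parse_slice is a hypothetical helper written for this illustration, not an rhdf5client function, and it deliberately does not expand the result into row or column numbers, since that depends on the endpoint convention of the obsolete interface.

```r
# Hypothetical helper for illustration; not an rhdf5client function.
# Splits "initial:final" or "initial:final:stride" into integer components,
# defaulting the stride to 1 when it is omitted.
parse_slice <- function(spec) {
  stopifnot(is.character(spec), length(spec) == 1)
  parts <- as.integer(strsplit(spec, ":", fixed = TRUE)[[1]])
  stopifnot(length(parts) %in% c(2, 3), !anyNA(parts))
  stride <- if (length(parts) == 3) parts[3] else 1L
  list(initial = parts[1], final = parts[2], stride = stride)
}

rowspec <- parse_slice("15:20")
colspec <- parse_slice("1904:1906:2")
```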


sampoll/rhdf5client documentation built on June 4, 2019, 6:56 p.m.