suppressPackageStartupMessages({
    library(MultiAssayExperiment)
    library(HDF5Array)
    library(SummarizedExperiment)
})

Integrating an HDF5 backend for MultiAssayExperiment

Dependencies

library(MultiAssayExperiment)
library(HDF5Array)
library(SummarizedExperiment)

HDF5Array::DelayedArray Constructor

The HDF5Array package provides an on-disk representation of large datasets without the need to load them into memory. Convenient lazy evaluation operations allow the user to manipulate such large data files based on metadata. The DelayedMatrix class in the DelayedArray package provides a way to connect to a large matrix that is stored on disk.

First, we create a small matrix for constructing the DelayedMatrix class.

smallMatrix <- matrix(rnorm(10e5), ncol = 20)

We add rownames and column names to the matrix object for compatibility with the MultiAssayExperiment representation.

rownames(smallMatrix) <- paste0("GENE", seq_len(nrow(smallMatrix)))
colnames(smallMatrix) <- paste0("SampleID", seq_len(ncol(smallMatrix)))

Here we use the DelayedArray constructor function to create a DelayedMatrix object.

smallMatrix <- DelayedArray(smallMatrix)
class(smallMatrix)
head(smallMatrix)
dim(smallMatrix)

Importing HDF5 files

Note that a large matrix from an HDF5 file can also be loaded using the HDF5Array function.

For example:

dataLocation <- system.file("extdata", "exMatrix.h5", package =
                              "MultiAssayExperiment", mustWork = TRUE)
h5ls(dataLocation)
hdf5Data <- HDF5ArraySeed(file = dataLocation, name = "exMatrix")
newDelayedMatrix <- DelayedArray(hdf5Data)
class(newDelayedMatrix)
head(newDelayedMatrix)

Dimnames from HDF5 file

Currently, the rhdf5 package does not store dimnames in the h5 file by default. A request for this feature has been sent to the maintainer of the rhdf5 package and any further development to the HDF5Array package is contingent on such lower level dimension name storage.

Using a DelayedMatrix with MultiAssayExperiment

A DelayedMatrix alone conforms to the MultiAssayExperiment API requirements. Shown below, the DelayedMatrix can be put into a named list and passed into the MultiAssayExperiment constructor function.

HDF5MAE <- MultiAssayExperiment(experiments = list(smallMatrix = smallMatrix))
sampleMap(HDF5MAE)
colData(HDF5MAE)

SummarizedExperiment with DelayedMatrix backend

A more information rich DelayedMatrix can be created when used in conjunction with the SummarizedExperiment class and it can even include rowRanges. The flexibility of the MultiAssayExperiment API supports classes with minimal requirements. Additionally, this SummarizedExperiment with the DelayedMatrix backend can be part of a bigger MultiAssayExperiment object. Below is a minimal example of how this would work:

HDF5SE <- SummarizedExperiment(assays = smallMatrix)
assay(HDF5SE)
MultiAssayExperiment(list(HDF5SE = HDF5SE))

Additional scenarios are currently in development where an HDF5Matrix is hosted remotely. Many opportunities exist when considering on-disk and off-disk representations of data with MultiAssayExperiment.



vjcitn/MultiAssayExperiment documentation built on May 3, 2019, 6:13 p.m.