Introduction

We want to get some background on using the HDF5 and DelayedArray infrastructure with modest-size assay sets.

We'll work with some 450k data.

suppressPackageStartupMessages({
library(SummarizedExperiment)
library(HDF5Array)
library(benchOOM)
})
bano_inmem = get_bano_inmem()
bano_inmem

Export a standard RangedSummarizedExperiment

Let's use saveHDF5SummarizedExperiment, which writes the assay component of a SummarizedExperiment to HDF5 on disk and returns an object whose assay is backed by that file:

tf = tempfile()
dir.create(tf)
banoSE = saveHDF5SummarizedExperiment(bano_inmem, tf, replace=TRUE)
banoSE
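To see what the export produces without loading the 450k data, the round trip can be sketched on a toy object. Everything here (the simulated matrix, the probe and sample names, the object names toy and toyH5) is a hypothetical stand-in for bano_inmem; only saveHDF5SummarizedExperiment and loadHDF5SummarizedExperiment come from HDF5Array.

```r
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(HDF5Array)
})

# Toy assay standing in for the real data (values are simulated)
set.seed(1)
m <- matrix(rnorm(200), nrow = 20, ncol = 10,
            dimnames = list(paste0("probe", 1:20), paste0("s", 1:10)))
rr <- GRanges("chr1",
              IRanges(start = seq(1, by = 100, length.out = 20), width = 50))
names(rr) <- rownames(m)
toy <- SummarizedExperiment(assays = list(beta = m), rowRanges = rr)

# Export: assay data go to an HDF5 file, the rest of the object to an RDS file
d <- tempfile()
toyH5 <- saveHDF5SummarizedExperiment(toy, dir = d, replace = TRUE)

class(assay(toyH5))   # the assay is now a DelayedMatrix, not an ordinary matrix
list.files(d)         # files written to the export directory

# Reload from disk and confirm the values survived the round trip
reloaded <- loadHDF5SummarizedExperiment(d)
roundtrip_ok <- isTRUE(all.equal(as.matrix(assay(reloaded)), m,
                                 check.attributes = FALSE))
roundtrip_ok
```

The reloaded object can be subset with the usual SummarizedExperiment idioms; data are only read from the HDF5 file when the assay is realized, for example with as.matrix or as.array.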

Access capabilities:

assay(banoSE[1,])
subsetByOverlaps(banoSE, rowRanges(banoSE)[1001:1010])

Import a packaged HDF5RangedSummarizedExperiment

litg = getLitGeu()
litg

Benchmarking exercise

We want to create a spectrum of biologically meaningful queries. To this end we introduced the gobrowse and go2ranges functions.

r1 = go2ranges("oxidoreductase activity", "v75") # length 2302
length(r1)
library(microbenchmark)
# wrap in some kind of harness structure -- outputs that move towards visualization
# shiny interface -- user can pick pathway size or sort or not ... statistics
# would need to be able to interrupt -- that does kill the whole session?
microbenchmark(as.array(assay(subsetByOverlaps(bano_inmem, r1[1:20]))), times=5)
microbenchmark(as.array(assay(subsetByOverlaps(banoSE, r1[1:20]))), times=5)
# also include an exported, sorted GenomicRanges ... metadata work may be needed

This shows that the speed of getting a delayed array corresponding to a large pathway is similar whether the base data are on disk in HDF5 or in memory.
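The microbenchmark calls above wrap the query in as.array, so they time the delayed subset and its realization together. One way to separate the two costs is to time them individually. The sketch below does this on a small simulated matrix rather than on the 450k data; the matrix dimensions, the index vector idx, and the object names are illustrative assumptions, and no timings are quoted because they depend on hardware and on the chunk layout of the file.

```r
suppressPackageStartupMessages({
  library(HDF5Array)
  library(microbenchmark)
})

# Simulated assay: 50,000 rows by 10 columns, held in memory and on disk
set.seed(3)
m <- matrix(runif(5e4 * 10), ncol = 10)
h5 <- writeHDF5Array(m, filepath = tempfile(fileext = ".h5"), name = "assay")

# A scattered set of rows, loosely mimicking a pathway-based query
idx <- sample(nrow(m), 500)

microbenchmark(
  inmem_subset = m[idx, ],               # ordinary matrix subsetting
  h5_delayed   = h5[idx, ],              # delayed: no data are read yet
  h5_realized  = as.array(h5[idx, ]),    # forces the read from the HDF5 file
  times = 5
)
```

If h5_delayed is comparable to inmem_subset while h5_realized is much slower, the cost lies in reading from disk rather than in the subsetting machinery.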

However, applying as.array to bring the results into memory seems slow for this particular request. This may be related to the order of the rows specified in the request: they may be scattered throughout the HDF5 store, and costly to recover together in the order returned by the in-memory request. This needs more investigation.

There is some indication that as.array runs quickly when the requested records are contiguous rows.
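The row-order hypothesis can be probed directly by realizing the same number of rows under three access patterns: contiguous, scattered but sorted, and scattered in arbitrary order. This is a minimal sketch on simulated data, not on banoSE; the matrix size, the block size n, and all object names are assumptions made for illustration. It also shows one possible workaround, reading rows in sorted order and restoring the requested order in memory; whether that helps on the data above is untested here.

```r
suppressPackageStartupMessages({
  library(HDF5Array)
  library(microbenchmark)
})

# Simulated assay written to a temporary HDF5 file
set.seed(2)
m <- matrix(rnorm(1e5 * 20), nrow = 1e5, ncol = 20)
h5 <- writeHDF5Array(m, filepath = tempfile(fileext = ".h5"), name = "assay")

n <- 2000
contig    <- 50001:(50000 + n)          # one contiguous block of rows
scattered <- sort(sample(nrow(h5), n))  # scattered rows, in increasing order
shuffled  <- sample(scattered)          # same rows, arbitrary order

# Possible workaround: read in sorted order, then permute back in memory
read_sorted_then_reorder <- function(x, i) {
  ord <- order(i)
  res <- as.array(x[i[ord], ])
  res[order(ord), , drop = FALSE]       # order(ord) inverts the permutation
}

microbenchmark(
  contiguous         = as.array(h5[contig, ]),
  scattered_sorted   = as.array(h5[scattered, ]),
  scattered_unsorted = as.array(h5[shuffled, ]),
  sorted_reordered   = read_sorted_then_reorder(h5, shuffled),
  times = 5
)
```

The chunk dimensions chosen when the file is written also affect how many chunks a scattered request touches, so results from this sketch may not transfer to a file written with different settings.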



vjcitn/benchOOM documentation built on April 19, 2021, 11:20 p.m.