Filematrix vs. bigmemory (packages)

Motivation for creation of filematrix package

The filematrix package was originally conceived as an alternative to the bigmemory package, for three reasons.

First, matrices created with bigmemory on NFS (network file system) were often corrupted (they contained all zeros). This is most likely caused by memory-mapped files behaving unreliably on NFS.

Second, bigmemory was not available for Windows for a long period of time. It is now fully cross-platform.

Finally, the bigmemory package uses a memory-mapped file interface to work with data files. This delivers great performance for matrices smaller than the amount of computer memory, but we experienced major slowdowns with larger matrices.

Differences between filematrix and bigmemory packages

The packages use different mechanisms to read from and write to their big files. The filematrix package uses the R functions readBin and writeBin. The bigmemory package uses memory-mapped file access via the interface provided by the BH R package (Boost C++).
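To make the readBin/writeBin approach concrete, here is a minimal sketch in base R of reading a single column from a binary file by seeking to its byte offset. It is an illustration only, not filematrix's internal code; the helper read_column and the file layout (raw column-major doubles, no header) are assumptions made for this example.

```r
# Hypothetical illustration in base R; not the filematrix implementation.
path <- tempfile(fileext = ".bin")
nr <- 4L
nc <- 3L
m <- matrix(as.numeric(1:12), nrow = nr, ncol = nc)

# Write the values as 8-byte doubles. R stores matrices column-major,
# so each column occupies one contiguous block of the file.
con <- file(path, open = "wb")
writeBin(as.vector(m), con, size = 8L)
close(con)

# Read only column j: jump to its byte offset, then read nr values.
read_column <- function(path, j, nr, size = 8L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = (j - 1) * nr * size, origin = "start")
  readBin(con, what = "double", n = nr, size = size)
}

col2 <- read_column(path, 2L, nr)
print(col2)
unlink(path)
```

Only the requested block is read explicitly into memory, in contrast to memory mapping, where the operating system decides which pages of the file stay resident.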

Note that filematrix can store real values in a compact 4-byte (single-precision) format. This feature is not available in bigmemory.
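The trade-off of 4-byte storage can be seen with base R alone: the file is half the size, at the cost of roughly seven significant digits of precision. The sketch below uses only writeBin and readBin with size = 4 and does not call the filematrix API; the sample values are arbitrary.

```r
# Base R illustration of 4-byte versus 8-byte storage of real values.
x <- c(pi, 1 / 3, 123456789.123)

path8 <- tempfile()
path4 <- tempfile()
writeBin(x, path8, size = 8L)  # double precision, 8 bytes per value
writeBin(x, path4, size = 4L)  # single precision, 4 bytes per value

size8 <- file.size(path8)
size4 <- file.size(path4)

# Read the 4-byte values back; R converts them to doubles in memory.
y <- readBin(path4, what = "double", n = length(x), size = 4L)

# Largest relative rounding error introduced by the shorter format.
rel_err <- max(abs(y - x) / abs(x))
print(c(size8 = size8, size4 = size4))
print(rel_err)

unlink(c(path8, path4))
```

Whether the lost precision matters depends on the data; for many measurement-type values single precision is sufficient.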

Differences in tests

Due to different file access approach:

Consequently:

Example when filematrix is much more efficient than bigmemory

library(knitr)
# opts_knit$set(root.dir=tempdir())

Let us consider the simple task of filling in a large matrix (twice the size of the computer's memory). Below is the code using filematrix. It finishes in 10 minutes and does not interfere with other programs.

library(filematrix)
fm = fm.create(
        filenamebase = "big_fm",
        nrow = 1e5,
        ncol = 1e5)

tic = proc.time()
for( i in seq_len(ncol(fm)) ) {
    message(i, " of ", ncol(fm))
    fm[,i] = i + 1:nrow(fm)
}
toc = proc.time()
show(toc-tic)

# Cleanup

closeAndDeleteFiles(fm)
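For a sense of scale, the size of the data file in the example above follows directly from the matrix dimensions. The quick calculation below assumes 8 bytes per value, the usual size of an R double.

```r
# Size of a 1e5 x 1e5 matrix stored as 8-byte doubles.
n_rows <- 1e5
n_cols <- 1e5
bytes_per_value <- 8

total_bytes <- n_rows * n_cols * bytes_per_value
total_gb <- total_bytes / 1e9  # decimal gigabytes
print(total_gb)
```

Make sure the working directory has this much free disk space before running either example at full size.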

Filling a big matrix of the same size with bigmemory can be very slow (2.5 times slower in this experiment). The bigmemory package uses memory-mapped files to access the data. As the matrix is written to, the memory-mapped file occupies all available RAM and the computer slows to a halt. Task Manager shows the memory-mapped file occupying all available RAM while a large matrix is being filled with the bigmemory package.

Please exercise caution when running the code below.

library(bigmemory)
fm = filebacked.big.matrix(
        nrow = 1e5,
        ncol = 1e5,  
        type = "double",
        backingfile = "big_bm.bmat",
        backingpath = "./",
        descriptorfile = "big_bm.desc.txt")

tic = proc.time()
for( i in seq_len(ncol(fm)) ) {
    message(i, " of ", ncol(fm))
    fm[,i] = i + 1:nrow(fm)
}
flush(fm)
toc = proc.time()
show(toc-tic)

# Cleanup

rm(fm)
gc()
unlink("big_bm.bmat")
unlink("big_bm.desc.txt")



filematrix documentation built on May 2, 2019, 7:23 a.m.