filematrix
packageThe filematrix
package was originally conceived as an alternative
to bigmemory
package for three reasons.
First, matrices created with bigmemory
on NFS (network file system)
have often been corrupted (contained all zeros).
This is most likely a fault of memory-mapped files on NFS.
Second, bigmemory
was not available for Windows for a long periof of time.
It is now fully cross platform.
Finally, bigmemory
package uses memory mapped file interface to
work with data files. This delivers great performance for
matrices smaller than the amount of computer memory,
but were experiencing major slowdown for larger matrices.
filematrix
and bigmemory
packagesThe packages use different libraries to read from and write to their big files.
The filematrix
package uses readBin
and writeBin
R functions.
The bigmemory
package memory-mapped file access via
BH
R package interface (Boost C++).
Note that filematrix
can store real values in short
4 byte format.
This feature is not available in bigmemory
.
Due to different file access approach:
bigmemory
accumulates changes to the matrix in memory and
writes them to the file upon call of flush
or file closure.filematrix
writes the changes to the file upon the request without delay.Consequently:
bigmemory
works well for matrices smaller than the system memory.
Writing to larger matrices is much slower due to system trying to
keep as much of the matrix in the system memory (cache) as possible.filematrix
's performance does not deteriorate on matrices many times larger than the system memory.
bigmemory
is better for random access of small file matrices.
filematrix
is equally good or better for block
and column-wise access of the file matrices.filematrix
is much more efficient than bigmemory
library(knitr) # opts_knit$set(root.dir=tempdir())
Let us consider a simple task of filling in a large matrix (twice memory size).
Below is the code using filematrix
.
It finishes in 10 minutes and does not interfere with other programs.
library(filematrix) fm = fm.create( filenamebase = "big_fm", nrow = 1e5, ncol = 1e5) tic = proc.time() for( i in seq_len(ncol(fm)) ) { message(i, " of ", ncol(fm)) fm[,i] = i + 1:nrow(fm) } toc = proc.time() show(toc-tic) # Cleanup closeAndDeleteFiles(fm)
Filling the same sized big matrix with bigmemory
can be very slow (2.5 times slower in this experiment).
The bigmemory
package uses memory mapped file technique to access the file.
When the matrix is written to, the memory mapped file occupies
all available RAM and the computer slows to a halt.
Please excercise caution when running the code below.
library(bigmemory) fm = filebacked.big.matrix( nrow = 1e5, ncol = 1e5, type = "double", backingfile = "big_bm.bmat", backingpath = "./", descriptorfile = "big_bm.desc.txt") tic = proc.time() for( i in seq_len(ncol(fm)) ) { message(i, " of ", ncol(fm)) fm[,i] = i + 1:nrow(fm) } flush(fm) toc = proc.time() show(toc-tic) # Cleanup rm(fm) gc() unlink("big_bm.bmat") unlink("big_bm.desc.txt")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.