Best practices"

Working with large matrices

The filematrix package can be used for matrices of any size. The most convenient way of working with small and moderately sized matrices is to quickly save and load them via fm.create.from.matrix and fm.load respectively.

However, the main purpose of the filematrix package is to allow users to work with matrices many times larger than the amount of computer memory. Such matrices can only be accessed by parts.

Let us setup a sufficiently large matrix for code examples below:

library(knitr)
# opts_knit$set(root.dir=tempdir())
library(filematrix)
fm = fm.create(
        filenamebase = tempfile(),
        nrow = 5000,
        ncol = 10000,
        type = "double")

Accessing all matrix elements

The fastest way to read or write all elements of a filematrix is to work with columns sequentially, multiple columns at a time. It is much faster than accessing a filematrix by rows.

The three examples below illustrate how this can be done for such tasks as - filling filematrix with values, - calculating column sums, and - calculating row sums.

Filling in matrix values

Let us fill in the matrix with random values 512 columns at a time.

step1 = 512
runto = ncol(fm)
nsteps = ceiling(runto/step1)
for( part in seq_len(nsteps) ) { # part = 1
    fr = (part-1)*step1 + 1
    to = min(part*step1, runto)
    message( "Filling in columns ", fr, " to ", to)
    fm[,fr:to] = runif(nrow(fm) * (to-fr+1))
}
rm(part, step1, runto, nsteps, fr, to)

Calculating column sums

Let us calculate column sums of the filematrix, 256 columns at a time.

fmcolsums = double(ncol(fm))

step1 = 512
runto = ncol(fm)
nsteps = ceiling(runto/step1)
for( part in seq_len(nsteps) ) { # part = 1
    fr = (part-1)*step1 + 1
    to = min(part*step1, runto)
    message("Calculating column sums, processing columns ", fr, " to ", to)
    fmcolsums[fr:to] = colSums(fm[,fr:to])
}
rm(part, step1, runto, nsteps, fr, to)

message("Sums of first and last columns are ", 
        fmcolsums[1], " and ", tail(fmcolsums,1))

Calculating row sums

Let us calculate column sums of the filematrix, 256 columns at a time.

fmrowsums = double(nrow(fm))

step1 = 512
runto = ncol(fm)
nsteps = ceiling(runto/step1)
for( part in seq_len(nsteps) ) { # part = 1
    fr = (part-1)*step1 + 1
    to = min(part*step1, runto)
    message("Calculating row sums, processing columns ", fr, " to ", to)
    fmrowsums = fmrowsums + rowSums(fm[,fr:to])
}
rm(part, step1, runto, nsteps, fr, to)

message("Sums of first and last rows are ", 
        fmrowsums[1], " and ", tail(fmrowsums,1))

Removing filematrix after use

closeAndDeleteFiles(fm)

Version information

sessionInfo()


Try the filematrix package in your browser

Any scripts or data that you put into this service are public.

filematrix documentation built on May 2, 2019, 7:23 a.m.