LimWarn: ff Limitations and Warnings

LimWarn    R Documentation

Description

This help page lists the currently known limitations of package ff, as well as differences between ff and ram methods.

Automatic file removal

Remember that omitting the parameter ff(filename=) results in a temporary file in fftempdir with a 'delete' finalizer, while supplying ff(filename=) results in a permanent file with a 'close' finalizer. Never call setwd(getOption("fftempdir"))! Make sure you really understand the implications of the automatic unlinking of getOption("fftempdir") in .onUnload, of the finalizer choice, and of the finalizing behaviour at the end of R sessions as defaulted in getOption("fffinonexit"). Otherwise you might experience 'unexpected' losses of files and data.
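
The two cases can be compared directly. The following is a minimal sketch, not a definitive recipe: the file name limwarn_permanent_example.ff and its location in tempdir() are hypothetical choices made for illustration, and the finalizer names printed are the ones this page describes.

```r
library(ff)

# No filename given: per the text above, a temporary file in fftempdir
# with a 'delete' finalizer.
tmp <- ff(vmode = "integer", length = 10)
fin_tmp <- finalizer(tmp)
dirname(filename(tmp))        # compare with getOption("fftempdir")

# Filename given (hypothetical path): per the text above, a permanent file
# with a 'close' finalizer.
perm_path <- file.path(tempdir(), "limwarn_permanent_example.ff")
perm <- ff(vmode = "integer", length = 10, filename = perm_path)
fin_perm <- finalizer(perm)

print(c(temporary = fin_tmp, permanent = fin_perm))
getOption("fffinonexit")      # finalizing behaviour at the end of the session

# Explicit clean-up so that the example leaves no files behind.
delete(tmp)
delete(perm)
rm(tmp, perm)
```

Checking finalizer() right after creation is a cheap way to confirm which removal behaviour a given object will have before relying on it.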

Size of objects

ff objects can have length zero and are limited to .Machine$integer.max elements. We have not yet ported the R code to support 64-bit double indices (in essence 52-bit integers), although the C++ back-end has been prepared for this. Furthermore, the file size limitations of the OS apply; see ff.

Side effects

In contrast to standard R expressions, ff expressions violate the functional programming logic and are called for their side effects. This is also true for the ram compatibility functions swap.default and add.default.

Hybrid copying semantics

If you modify a copy of an ff object, changes of data ([<-) and of physical attributes will be shared, but changes in virtual and class attributes will not.
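
A minimal sketch of what this means in practice, assuming a small integer ff vector in a temporary file (the variable names are illustrative only):

```r
library(ff)

a <- ff(1:6)            # integer ff vector backed by a temporary file
b <- a                  # copies the R-side object, not the file

b[1] <- 100L            # data change made through the copy ...
shared_value <- a[1]    # ... is visible through the original

dim(b) <- c(2, 3)       # virtual attribute changed on the copy only
dim_a <- dim(a)         # the original is still a plain vector

print(shared_value)
print(dim_a)
print(class(a))
print(class(b))

# Both names refer to the same file, so deleting once is enough.
delete(a)
rm(a, b)
```

This makes it possible to keep a second set of virtual attributes on the same data, but it also means that an assignment through any copy changes the data for all of them.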

Limits of compatibility between ff and ram objects

If it's not too big, you can move an ff object completely into R's RAM through as.ram. However, you should watch out for three limitations:

  1. Ram objects don't have hybrid copying semantics; changes to a copy of a ram object will never change the original ram object.

  2. Assigning values to a ram object can easily upgrade it to a higher storage.mode. This will create conflicts with the vmode of the ram object, which go undetected until you try to write back to disk through as.ff.

  3. Writing back to disk with as.ff under the same filename requires that the original ff object has been deleted (or at least closed if you specify parameter overwrite=TRUE).
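
The second limitation can be seen with a short sketch. It assumes a small integer ff vector and only inspects the storage mode of the RAM copy; writing back with as.ff is deliberately left out.

```r
library(ff)

x <- ff(1:5)                 # vmode "integer" on disk
vmode_x <- vmode(x)

r <- as.ram(x)               # full copy in R's RAM
r[2] <- 2.5                  # ordinary R assignment silently upgrades to double
mode_r <- storage.mode(r)

print(c(ff_vmode = vmode_x, ram_storage_mode = mode_r))

delete(x)
rm(x)
```

Comparing storage.mode() of the ram object with the vmode of the original ff object before calling as.ff is one way to catch such an upgrade early.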

Index expressions

ff index expressions do not allow zeros and NAs; see [.ff and as.hi.

Availability of bydim parameter

Parameter bydim is only available in ff access methods; see [.ff.

Availability of add parameter

Parameter add is only available in ff access methods; see [.ff.

Compatibility of swap and add

If index expressions contain duplicated positions, the ff and ram methods for swap and add will behave differently, see swap.

Definition of [[ and [[<-

You should consider the behaviour of [[.ff and [[<-.ff as undefined and not use them in programming. Currently they are shortcuts to get.ff and set.ff, which, unlike [.ff and [<-.ff, support neither factor and POSIXct nor dimorder or virtual windows vw. In contrast to the standard methods, [[.ff and [[<-.ff only accept positive integer index positions. The definition of [[.ff and [[<-.ff may be changed in the future.

Multiple vector interpretation in arrays

R objects always have the standard dimorder seq_along(dim). In the case of a non-standard dimorder (see dimorderStandard), the vector sequence of array elements in R differs from that in the ff file. To access array elements in file order, you can use getset.ff or readwrite.ff, or copy the ff object and set dim(ff)<-NULL to get a vector view into the ff object (using [ then dispatches to the vector method [.ff). To access the array elements in R standard dimorder, simply use [, which dispatches to [.ff_array. Note that in this case as.hi will unpack the complete index; see the next section.
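
The two views can be put side by side. This is a minimal sketch, assuming a small 3 x 4 ff matrix created with the non-standard dimorder c(2, 1); the variable names are illustrative only.

```r
library(ff)

# 3 x 4 ff matrix stored with non-standard dimorder c(2, 1)
x <- ff(1:12, dim = c(3, 4), dimorder = c(2, 1))
print(dimorder(x))

# Array access via [ returns an ordinary R matrix in standard dimorder.
std_view <- x[]

# Vector view: copy the ff object and drop dim, then read in file order.
y <- x
dim(y) <- NULL
file_view <- y[]

print(std_view)
print(file_view)

# x and y share one file, so deleting once is enough.
delete(x)
rm(x, y)
```

Comparing as.vector(std_view) with file_view shows whether the element sequence in R and in the file differs for a given dimorder.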

RAM expansion of index expressions

Some index expressions do not consume RAM, thanks to the hi representation. For example, 1:n will consume almost no RAM, however large n is. However, some index expressions are expanded and require up to maxindex(i) * .rambytes["integer"] bytes, either because the sorted sequence of index positions cannot be rle-packed efficiently or because hiparse cannot yet parse such an expression and falls back to evaluating and expanding the index expression. If the index positions are not sorted, the index will be expanded and a second vector is needed to store the information for re-ordering, so the index requires 2 * maxindex(i) * .rambytes["integer"] bytes.
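
To get a feeling for the magnitudes, the two formulas can be evaluated for a concrete size. This is plain arithmetic in base R, not a measurement: the value n = 1e8 is a hypothetical maxindex(i), and 4 bytes per integer is assumed as the value of .rambytes["integer"].

```r
# Assumption: 4 bytes per integer, standing in for .rambytes["integer"].
bytes_per_integer <- 4

# Hypothetical maxindex(i) of an index that cannot be packed.
n <- 1e8

# Sorted but not rle-packable (or not parsable by hiparse): one integer vector.
expanded_bytes <- n * bytes_per_integer

# Unsorted: the expanded index plus a second vector for re-ordering.
unsorted_bytes <- 2 * n * bytes_per_integer

cat("expanded index:", expanded_bytes / 1e6, "MB\n")
cat("unsorted index:", unsorted_bytes / 1e6, "MB\n")
```

Under these assumptions, an unsorted index of this size needs twice as much RAM as a sorted, unpacked one, while a packed index such as 1:n needs almost none.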

RAM expansion when recycling assignment values

Some assignment expressions do not consume RAM for recycling. For example, x[1:n] <- 1:k will not consume RAM, however large n is compared to k, when x has standard dimorder. However, if length(value)>1, assignment expressions with non-ascending index positions trigger recycling of the value on the R side to the full index length. This will happen if dimorder does not match parameter bydim or if the index is not sorted in ascending order.

Byteorder incompatibility

Note that ff files cannot be transferred between systems with different byteorder.


ff documentation built on Sept. 30, 2024, 9:38 a.m.
