ffdf: ff class for data.frames

ffdf R Documentation

Description

Function 'ffdf' creates ff data.frames, which are stored on disk and behave very similarly to 'data.frame'.

Usage

ffdf(...
, row.names = NULL
, ff_split = NULL
, ff_join = NULL
, ff_args = NULL
, update = TRUE
, BATCHSIZE = .Machine$integer.max
, BATCHBYTES = getOption("ffbatchbytes")
, VERBOSE = FALSE)

Arguments

...

ff vectors or matrices (optionally wrapped in I()) that shall be bound together to an ffdf object

row.names

A character vector. Not recommended for large objects with many rows.

ff_split

A vector of character names or integer positions identifying input components to physically split into single ff_vectors. If vector elements have names, these are used as root name for the new ff files.

ff_join

A list of vectors with character names or integer positions identifying input components to physically join in the same ff matrix. If list elements have names, these are used to name the new ff files.

update

By default (TRUE) new ff files are updated with content of input ff objects. Setting to FALSE prevents this update.

ff_args

A list of further arguments passed to ff when new ff objects are created via 'ff_split' or 'ff_join'

BATCHSIZE

passed to update.ff

BATCHBYTES

passed to update.ff

VERBOSE

passed to update.ff

Details

By default, creating an 'ffdf' object will NOT create new ff files; instead, existing files are referenced. This differs from data.frame, which always creates copies of the input objects, most notably in data.frame(matrix()), where an input matrix is converted to single columns. ffdf, by contrast, will store an input matrix physically as the same matrix and virtually map it to columns. Physically copying a large ff matrix to single ff vectors can be expensive.

More generally, ffdf objects have a physical and a virtual component, which allows very flexible data frame designs: a physically stored matrix can be virtually mapped to single columns, and a couple of physically stored vectors can be virtually mapped to a single matrix. The means to configure these are I() for the virtual representation and the 'ff_split' and 'ff_join' arguments for the physical representation. An ff matrix wrapped into 'I()' will return the input matrix as a single object; using 'ff_split' will store this matrix as single vectors and thus create new ff files. 'ff_join' will copy a couple of input vectors into a unified new ff matrix with dimorder=c(2,1), but virtually they will remain single columns.

The returned ffdf object also has a dimorder attribute, which indicates whether the ffdf object contains a matrix with non-standard dimorder c(2,1), see dimorderStandard.
Currently, virtual windows are not supported for ffdf.
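
The difference between the physical and the virtual layout can be inspected directly. The following is a minimal sketch, assuming the ff package is installed; the object names (ffm, ffv, ffd, ffd_split) are illustrative only, and the input matrix is given column names but no row names to keep the example small.

```r
library(ff)

# A 3 x 4 matrix with column names only, and a vector of matching length.
m   <- matrix(1:12, 3, 4, dimnames = list(NULL, c("m1", "m2", "m3", "m4")))
ffm <- as.ff(m)
ffv <- as.ff(1:3)

# Default: the existing ff objects are referenced, not copied.
# Physically there are two components (one matrix, one vector) ...
ffd <- ffdf(ffm, v = ffv)
length(physical(ffd))
# ... while virtually the matrix is mapped to four columns, plus 'v'.
ncol(ffd)

# The vector component should still point at the file backing 'ffv'.
identical(filename(physical(ffd)[[2]]), filename(ffv))

# ff_split = 1 stores the first input (the matrix) as single ff vectors,
# which creates new ff files: one physical component per column.
ffd_split <- ffdf(ffm, v = ffv, ff_split = 1)
length(physical(ffd_split))
ncol(ffd_split)
```

Both objects present the same columns to the user; only the number of physical components (and hence of ff files) differs.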

Value

A list with components

physical

the underlying ff vectors and matrices, to be accessed via physical

virtual

the virtual features of the ffdf including the virtual-to-physical mapping, to be accessed via virtual

row.names

the optional row.names, see argument row.names

and class 'ffdf' (NOTE that ffdf does not inherit from ff)

Methods

The following methods and functions are available for ffdf objects:

Type      Name           Assign  Comment
Basic functions
function  ffdf                   constructor for ffdf objects
generic   update                 updates one ffdf object with the content of another
generic   clone                  clones an ffdf object
method    print                  print ffdf
method    str                    ffdf object structure
Class test and coercion
function  is.ffdf                check if inherits from ffdf
generic   as.ffdf                coerce to ffdf, if not yet
generic   as.data.frame          coerce to ram data.frame
Virtual storage mode
generic   vmode                  get virtual modes for all (virtual) columns
Physical attributes
function  physical               get physical attributes
Virtual attributes
function  virtual                get virtual attributes
method    length                 get length
method    dim            <-      get dim and set nrow
generic   dimorder               get the dimorder (non-standard if any component is non-standard)
method    names          <-      set and get names
method    row.names      <-      set and get row.names
method    dimnames       <-      set and get dimnames
method    pattern        <-      set pattern (rename/move files)
Access functions
method    [              <-      set and get data.frame content ([,]) or get ffdf with fewer columns ([])
method    [[             <-      set and get single column ff object
method    $              <-      set and get single column ff object
Opening/Closing/Deleting
generic   is.open                tri-bool is.open status of the physical ff components
method    open                   open all physical ff objects (is done automatically on access)
method    close                  close all physical ff objects
method    delete                 deletes all physical ff files
method    finalize               call finalizer
Processing
method    chunk                  create chunked index
method    sortLevels             sort and recode levels
Other
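
A few of the methods listed above can be tried as follows. This is a minimal sketch, assuming the ff package is installed; the data and the names ffd and total are illustrative only.

```r
library(ff)

# Coerce a small in-RAM data.frame to an ffdf.
ffd <- as.ffdf(data.frame(x = 1:10, y = (1:10) / 2))

is.ffdf(ffd)   # class test
dim(ffd)       # rows and (virtual) columns
names(ffd)     # column names

ffd$x          # a single column as an ff vector
ffd[1:3, ]     # matrix-style indexing returns an in-RAM data.frame
ffd["x"]       # list-style indexing returns an ffdf with fewer columns

# Process the rows chunk by chunk; chunk() returns a list of range indices.
total <- 0
for (i in chunk(ffd)) {
  total <- total + sum(ffd$x[i])
}
total

# Coerce the whole object back to an in-RAM data.frame.
as.data.frame(ffd)
```

With only ten rows the loop runs over a single chunk; the same pattern is meant for objects that are too large to read into RAM at once.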

Note

Note that in theory, accessing a chunk of rows from a matrix with dimorder=c(2,1) should be faster than accessing across a bunch of vectors. However, at least under Windows, the OS has difficulties file-caching parts of very large files. Therefore, until we have partitioning, the recommended physical storage is in single vectors.

Author(s)

Jens Oehlschlägel

See Also

data.frame, ff; for more examples see physical

Examples

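 library(ff)

 # in-RAM inputs and their ff counterparts
 m <- matrix(1:12, 3, 4, dimnames=list(c("r1","r2","r3"), c("m1","m2","m3","m4")))
 v <- 1:3
 ffm <- as.ff(m)
 ffv <- as.ff(v)

 # default: existing ff objects are referenced, the matrix is virtually mapped to columns
 d <- data.frame(m, v)
 ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm))
 all.equal(d, ffd[,])
 ffd
 physical(ffd)

 # ff_split: physically store the matrix (input 1) as single ff vectors
 d <- data.frame(m, v)
 ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_split=1)
 all.equal(d, ffd[,])
 ffd
 physical(ffd)

 # ff_join: physically join inputs 1 and 2 into one new ff matrix
 d <- data.frame(m, v)
 ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_join=list(newff=c(1,2)))
 all.equal(d, ffd[,])
 ffd
 physical(ffd)

 # I(): keep each input as a single virtual object
 d <- data.frame(I(m), I(v))
 ffd <- ffdf(m=I(ffm), v=I(ffv), row.names=row.names(ffm))
 all.equal(d, ffd[,])
 ffd
 physical(ffd)

 # clean up
 rm(ffm,ffv,ffd); gc()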

ff documentation built on Sept. 30, 2024, 9:38 a.m.
