chunk.ffdf: Chunk ff_vector and ffdf
In ff: Memory-Efficient Storage of Large Data on Disk and Fast Access Functions

chunk.ffdf

R Documentation

Chunk ff_vector and ffdf

Description

Chunking method for ff_vector and ffdf objects (row-wise) automatically considering RAM requirements from recordsize as calculated from sum(.rambytes[vmode])

Usage

## S3 method for class 'ff_vector'
chunk(x
, RECORDBYTES = .rambytes[vmode(x)], BATCHBYTES = getOption("ffbatchbytes"), ...)
## S3 method for class 'ffdf'
chunk(x
, RECORDBYTES = sum(.rambytes[vmode(x)]), BATCHBYTES = getOption("ffbatchbytes"), ...)

Arguments

`x`	`ff` or `ffdf`
`RECORDBYTES`	optional integer scalar representing the bytes needed to process an element of the `ff_vector` a single row of the `ffdf`
`BATCHBYTES`	integer scalar limiting the number of bytes to be processed in one chunk, default from `getOption("ffbatchbytes")`, see also `.rambytes`
`...`	further arguments passed to `chunk`

Value

A list with ri indexes each representing one chunk

Author(s)

Jens Oehlschlägel

Examples

  x <- data.frame(x=as.double(1:26), y=factor(letters), z=ordered(LETTERS), stringsAsFactors = TRUE)
  a <- as.ffdf(x)
  ceiling(26 / (300 %/% sum(.rambytes[vmode(a)])))
  chunk(a, BATCHBYTES=300)
  ceiling(13 / (100 %/% sum(.rambytes[vmode(a)])))
  chunk(a, from=1, to = 13, BATCHBYTES=100)
  rm(a); gc()

  message("dummy example for linear regression with biglm on ffdf")
  library(biglm)

  message("NOTE that . in formula requires calculating terms manually
    because . as a data-dependant term is not allowed in biglm")
  form <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species

  lmfit <- lm(form, data=iris)

  firis <- as.ffdf(iris)
  for (i in chunk(firis, by=50)){
    if (i[1]==1){
      message("first chunk is: ", i[[1]],":",i[[2]])
      biglmfit <- biglm(form, data=firis[i,,drop=FALSE])
    }else{
      message("next chunk is: ", i[[1]],":",i[[2]])
      biglmfit <- update(biglmfit, firis[i,,drop=FALSE])
    }
  }

  summary(lmfit)
  summary(biglmfit)
  stopifnot(all.equal(coef(lmfit), coef(biglmfit)))

ff documentation built on April 4, 2025, 6:05 a.m.