chunk.ffdf | R Documentation |
Row-wise chunking method for ff_vector and ffdf objects. It automatically accounts for RAM requirements using the record size, calculated as sum(.rambytes[vmode]).
## S3 method for class 'ff_vector'
chunk(x
, RECORDBYTES = .rambytes[vmode(x)], BATCHBYTES = getOption("ffbatchbytes"), ...)
## S3 method for class 'ffdf'
chunk(x
, RECORDBYTES = sum(.rambytes[vmode(x)]), BATCHBYTES = getOption("ffbatchbytes"), ...)
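| Argument | Description |
|---|---|
| x | an ff_vector or ffdf object |
| RECORDBYTES | optional integer scalar giving the bytes needed to process one element of the ff_vector or one row of the ffdf |
| BATCHBYTES | integer scalar limiting the number of bytes processed in one chunk; defaults to getOption("ffbatchbytes") |
| ... | further arguments passed to chunk (for example from, to, by) |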
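To make the interplay of RECORDBYTES and BATCHBYTES concrete, here is a minimal base-R sketch of the rows-per-chunk arithmetic that the examples check with ceiling(n / (BATCHBYTES %/% RECORDBYTES)). It does not call ff: the per-vmode byte sizes are assumed values standing in for the .rambytes lookup, and rows_per_chunk is a hypothetical helper, not part of the package.

```r
# Assumed RAM bytes per element for the two vmodes used in the examples;
# a stand-in for ff's .rambytes lookup, not read from it.
ram_bytes <- c(double = 8, integer = 4)

# vmodes of a data frame with one double column and two factor columns
# (factors are assumed to be stored as integer codes).
col_vmodes <- c(x = "double", y = "integer", z = "integer")

# Bytes needed per row, analogous to the default RECORDBYTES for an ffdf.
record_bytes <- sum(ram_bytes[col_vmodes])

# Hypothetical helper: how many whole rows fit into one batch.
# This sketch guards against a zero result by returning at least one row.
rows_per_chunk <- function(record_bytes, batch_bytes) {
  max(1L, as.integer(batch_bytes %/% record_bytes))
}

rows_per_chunk(record_bytes, 300)  # rows per chunk for a 300-byte batch
ceiling(26 / rows_per_chunk(record_bytes, 300))  # chunks needed for 26 rows
```

Under these assumptions a row costs 16 bytes, so a 300-byte batch holds 18 rows and 26 rows need two chunks.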
A list of ri indexes, each representing one chunk.
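The shape of the returned list can be pictured with a simplified base-R sketch that splits a row range into consecutive pieces of at most `by` rows. chunk_ranges is a hypothetical helper, and each element is a plain c(from, to) vector rather than a real ri object; the actual chunk method may also distribute rows differently, for example by balancing chunk sizes.

```r
# Hypothetical helper: split rows from..to into consecutive chunks of at
# most `by` rows. Each element is c(from, to), a simplified stand-in for
# the ri objects that chunk() returns.
chunk_ranges <- function(from, to, by) {
  starts <- seq(from, to, by = by)
  lapply(starts, function(s) c(from = s, to = min(s + by - 1L, to)))
}

# 26 rows with 18 rows per chunk: one chunk of 18 rows, one of 8
str(chunk_ranges(1L, 26L, 18L))
```

A loop such as `for (i in chunk_ranges(1L, 150L, 50L))` then visits rows 1..50, 51..100 and 101..150, mirroring how the biglm example below walks over firis.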
Jens Oehlschlägel
chunk, ffdf
library(ff)
x <- data.frame(x=as.double(1:26), y=factor(letters), z=ordered(LETTERS), stringsAsFactors = TRUE)
a <- as.ffdf(x)
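# expected number of chunks: rows / (rows that fit into 300 bytes)
ceiling(26 / (300 %/% sum(.rambytes[vmode(a)])))
chunk(a, BATCHBYTES=300)
# same check restricted to rows 1..13 with a 100-byte batch
ceiling(13 / (100 %/% sum(.rambytes[vmode(a)])))
chunk(a, from=1, to = 13, BATCHBYTES=100)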
rm(a); gc()
message("dummy example for linear regression with biglm on ffdf")
library(biglm)
message("NOTE: '.' in a formula must be expanded into explicit terms manually
because biglm does not allow the data-dependent '.' term")
form <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species
lmfit <- lm(form, data=iris)
firis <- as.ffdf(iris)
for (i in chunk(firis, by=50)){
  if (i[1]==1){
    message("first chunk is: ", i[[1]],":",i[[2]])
    biglmfit <- biglm(form, data=firis[i,,drop=FALSE])
  }else{
    message("next chunk is: ", i[[1]],":",i[[2]])
    biglmfit <- update(biglmfit, firis[i,,drop=FALSE])
  }
}
summary(lmfit)
summary(biglmfit)
stopifnot(all.equal(coef(lmfit), coef(biglmfit)))