make.readchunk: Fast and friendly chunk file finagler

View source: R/make.chunk.R

Description

Read a file chunk by chunk

Usage

make.readchunk(input, FUN = identity, chunksize = 5000L)

Arguments

input

A character string of length 1. See Details.

FUN

A function applied to each chunk.

chunksize

The number of lines in each chunk.

Details

It creates a function that reads successive chunks of the data referenced by input using the fread function. The accepted forms of input are described in the help page of fread. The data referenced by input must not have a header row.
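The closure pattern described above can be sketched as follows. This is only an illustration, not the freqweights source: the name make_chunk_reader is hypothetical, and the sketch swaps fread for a base R file connection with readLines and read.csv so that it runs without extra packages. It also assumes that no quoted field spans several lines, and that a call with reset = TRUE returns NULL (the bigglm convention).

```r
# Hypothetical helper illustrating the closure pattern; NOT the package code.
# Base R readLines/read.csv stand in for fread here.
make_chunk_reader <- function(path, FUN = identity, chunksize = 5000L) {
  con <- NULL  # open connection, kept between calls by the closure

  function(reset = FALSE) {
    if (reset) {
      # Rewind: drop the connection so the next call starts from line 1
      if (!is.null(con)) close(con)
      con <<- NULL
      return(invisible(NULL))
    }
    if (is.null(con)) con <<- file(path, open = "r")

    lines <- readLines(con, n = chunksize)
    if (length(lines) == 0L) {
      # End of data: reset automatically and signal the end with NULL
      close(con)
      con <<- NULL
      return(NULL)
    }
    # The file has no header, so columns get default names V1, V2, ...
    chunk <- read.csv(text = lines, header = FALSE, stringsAsFactors = FALSE)
    FUN(chunk)
  }
}

# Small header-less file to exercise the reader
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(x = 1:10, y = letters[1:10]), file = tmp,
            sep = ",", row.names = FALSE, col.names = FALSE)

reader <- make_chunk_reader(tmp, FUN = nrow, chunksize = 4L)
sizes <- c(reader(), reader(), reader())  # row counts of the three chunks
```

Keeping the connection inside the closure means each call continues where the previous one stopped, without rereading the lines already consumed.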

This function is inspired by the bigglm example.

Value

A function with a logical argument, reset. If reset is TRUE, subsequent calls reread the data from the beginning. Once all the data has been read, the function returns NULL and the file is reset automatically. Otherwise, the function returns the value of FUN applied to the chunk. By default (FUN = identity), the chunk is returned as a tbl_df object.
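The calling contract above, including an explicit reset before the end of the data, can be illustrated with an in-memory stand-in. The name make_df_chunker is hypothetical and the code is not make.readchunk itself; returning NULL from a call with reset = TRUE is an assumption borrowed from the bigglm convention.

```r
# Hypothetical in-memory stand-in that follows the contract described above;
# it slices a data frame instead of reading a file.
make_df_chunker <- function(df, FUN = identity, chunksize = 2L) {
  pos <- 0L  # number of rows already handed out

  function(reset = FALSE) {
    if (reset) {
      pos <<- 0L               # explicit rewind requested by the caller
      return(invisible(NULL))
    }
    if (pos >= nrow(df)) {
      pos <<- 0L               # automatic reset once the data is exhausted
      return(NULL)
    }
    idx <- seq(pos + 1L, min(pos + chunksize, nrow(df)))
    pos <<- max(idx)
    FUN(df[idx, , drop = FALSE])
  }
}

chunker <- make_df_chunker(data.frame(x = 1:5), FUN = function(d) d$x)
first <- chunker()             # values from rows 1 and 2
chunker(reset = TRUE)          # rewind before reaching the end
again <- chunker()             # rows 1 and 2 once more
```

After the explicit reset, the next call starts again at the first row, which is the behaviour a multi-pass consumer such as bigglm depends on.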

See Also

bigglm, fread, tbl_df

Examples

## Not run: 
library(hflights)
nrow(hflights) # Number of rows

## We create a file with no header
input <- "hflights.csv"
write.table(hflights,file=input,sep=",",
            row.names=FALSE,col.names=FALSE)

## Get the number of rows of each chunk
readchunk <- make.readchunk(input,FUN=function(x){NROW(x)})

a <- NULL
while(!is.null(b <- readchunk())) {
  if(is.null(a)) {
    a <- b
  } else {
    a <- a+b
  }
}
all.equal(a, nrow(hflights))

## The file is reset automatically, so the same loop can be run again
a <- NULL 
while(!is.null(b <- readchunk())) {
  if(is.null(a)) {
    a <- b
  } else {
    a <- a+b
  }
}
all.equal(a, nrow(hflights))

## End(Not run)

freqweights documentation built on May 29, 2017, 12:01 p.m.