ffply: Read, process each block and write the result


ffply    R Documentation

Read, process each block and write the result

Description

Suppose you want to process each block of a file, and the result for each block is again a data.table that you want to write to an output file. One possible approach is to call l <- flply(...), combine the results with do.call(rbind, l), and write them with fwrite, but this would be slow, since all the results have to be collected and combined before anything is written. ffply offers a faster solution to this problem.
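The two approaches described above can be sketched side by side. This is a minimal illustration, assuming the fplyr and data.table packages are installed and using the example file shipped with fplyr; the names keep_block, out_list and out_ffply are made up for the sketch.

```r
library(fplyr)
library(data.table)

input <- system.file("extdata", "dt_iris.csv", package = "fplyr")
out_list  <- tempfile()   # hypothetical output path for the list-based approach
out_ffply <- tempfile()   # hypothetical output path for the ffply approach

# Illustrative FUN: return the block unchanged.
keep_block <- function(d, by) d

# Approach 1: collect every processed block in a list, bind them, then write.
# All results sit in memory until fwrite is called.
blocks <- flply(input, keep_block)
fwrite(do.call(rbind, blocks), out_list, sep = "\t")

# Approach 2: hand the output path to ffply and let it write the results.
ffply(input, out_ffply, keep_block)
```

With a small file the difference is negligible; the second form is meant for inputs too large to hold comfortably as a list of results.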

Usage

ffply(
  input,
  output = "",
  FUN,
  ...,
  key.sep = "\t",
  sep = "\t",
  skip = 0,
  header = TRUE,
  nblocks = Inf,
  stringsAsFactors = FALSE,
  colClasses = NULL,
  select = NULL,
  drop = NULL,
  col.names = NULL,
  parallel = 1
)

Arguments

input

Path of the input file.

output

String containing the path to the output file.

FUN

Function to be applied to each block. It must take at least two arguments, the first of which is a data.table containing the current block, without the first field; the second argument is a character vector containing the value of the first field for the current block.

...

Additional arguments to be passed to FUN.

key.sep

The character that delimits the first field from the rest.

sep

The field delimiter (often equal to key.sep).

skip

Number of lines to skip at the beginning of the file.

header

Whether the file has a header.

nblocks

The number of blocks to read.

stringsAsFactors

Whether to convert strings into factors.

colClasses

Vector or list specifying the class of each field.

select

The columns (names or numbers) to be read.

drop

The columns (names or numbers) not to be read.

col.names

Names of the columns.

parallel

Number of cores to use.
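To make the calling convention of FUN concrete, here is a base R sketch of how a function could be applied block by block, with extra arguments forwarded through `...`. It is not fplyr's implementation: apply_by_block_sketch is a hypothetical helper, it reads the whole file at once, it passes a data.frame rather than a data.table, and it assumes that a block is a run of consecutive lines sharing the same first field.

```r
# Hypothetical helper, NOT part of fplyr. It only illustrates how FUN
# receives the block (without the first field) and the key value.
apply_by_block_sketch <- function(input, output, FUN, ..., sep = "\t") {
  all_rows <- read.delim(input, sep = sep, header = TRUE,
                         stringsAsFactors = FALSE)
  if (nrow(all_rows) == 0) return(invisible(NULL))
  key <- as.character(all_rows[[1]])
  # Assumed definition: a new block starts wherever the first field changes.
  block_id <- cumsum(c(TRUE, key[-1] != key[-length(key)]))
  first <- TRUE
  for (rows in split(seq_along(key), block_id)) {
    d   <- all_rows[rows, -1, drop = FALSE]  # current block, first field dropped
    by  <- key[rows[1]]                      # value of the first field
    res <- FUN(d, by, ...)                   # extra arguments go to FUN
    # Write the header only once, then append the following results.
    write.table(res, output, sep = sep, quote = FALSE, row.names = FALSE,
                col.names = first, append = !first)
    first <- FALSE
  }
  invisible(NULL)
}

# Usage: a tiny tab-separated file with two blocks, "a" and "b".
input  <- tempfile()
output <- tempfile()
writeLines(c("key\tx\ty", "a\t1\t2", "a\t3\t4", "b\t5\t6"), input)

apply_by_block_sketch(input, output, function(d, by, scale) {
  data.frame(key = by, total = sum(d$x + d$y) * scale)
}, scale = 10)

out_lines <- readLines(output)
```

Here scale plays the role of an additional argument supplied through `...`; FUN sees it as an ordinary third parameter after the block and the key.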

Value

Returns NULL invisibly. As a side effect, writes the processed data.table to the output file.
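Because the return value carries no data, the processed result has to be read back from the output file. A minimal sketch, assuming fplyr and data.table are installed and that the output can be parsed by fread with its default settings:

```r
library(fplyr)
library(data.table)

input  <- system.file("extdata", "dt_iris.csv", package = "fplyr")
output <- tempfile()

# Copy only the first block; the assignment captures the (NULL) return value.
res <- ffply(input, output, function(d, by) d, nblocks = 1)
is.null(res)

# The processed data lives in the output file, not in the return value.
first_block <- fread(output)
```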

Slogan

ffply: from file to file

Examples

library(fplyr)
library(data.table)  # needed for data.table() and the d[, ...] syntax below

f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")
f2 <- tempfile()

# Copy the first two blocks from f1 into f2 to obtain a shorter but
# consistent version of the original input file.
ffply(f1, f2, function(d, by) {return(d)}, nblocks = 2)

# Compute the mean of the columns for each species
ffply(f1, f2, function(d, by) d[, lapply(.SD, mean)])

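# Reshape the file, block by block, into long format:
# one row per (variable, value) pair
ffply(f1, f2, function(d, by) {
    val <- do.call(c, d)                    # all column values, column by column
    var <- rep(names(d), each = nrow(d))    # matching column name for each value
    data.table(Var = var, Val = val)
})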


fplyr documentation built on Aug. 24, 2023, 1:08 a.m.