ftply: Read, process each block and return a data.table
In fplyr: Apply Functions to Blocks of Files

Description

ftply takes as input the path to a file and a function, and returns a data.table. It is a faster equivalent of calling `l <- flply(...)` followed by `do.call(rbind, l)`.
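To make that equivalence concrete, here is a base-R sketch of the `flply(...)` plus `do.call(rbind, l)` pattern. It is not fplyr's implementation. `toy_flply` and the sample file are hypothetical, the file is read with `read.delim` rather than a fast reader, and the sketch assumes that a block is a run of consecutive rows sharing the same value in the first field.

```r
# Hypothetical base-R sketch; NOT fplyr's implementation.
# Assumption: a block is a run of consecutive rows with the same first-field value.
toy_flply <- function(input, FUN = function(d, by) d, ..., sep = "\t") {
  df <- read.delim(input, sep = sep, header = TRUE)
  key <- as.character(df[[1]])
  runs <- rle(key)                  # each run of identical keys is one block
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  l <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    block <- df[starts[i]:ends[i], -1, drop = FALSE]  # the block without its first field
    l[[i]] <- FUN(block, runs$values[i], ...)         # FUN(d, by, ...)
  }
  names(l) <- runs$values
  l
}

# Toy tab-separated file: the first field is the block key
path <- tempfile(fileext = ".tsv")
writeLines(c("Species\tx", "a\t1", "a\t3", "b\t2", "b\t4", "b\t6"), path)

l <- toy_flply(path, function(d, by) data.frame(key = by, mean_x = mean(d$x)))
res <- do.call(rbind, l)  # one row per block: key "a" -> 2, key "b" -> 4
```

ftply is documented as producing the same combined table in a single call, only faster.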

Usage

ftply(
  input,
  FUN = function(d, by) d,
  ...,
  key.sep = "\t",
  sep = "\t",
  skip = 0,
  header = TRUE,
  nblocks = Inf,
  stringsAsFactors = FALSE,
  colClasses = NULL,
  select = NULL,
  drop = NULL,
  col.names = NULL,
  parallel = 1
)

Arguments

`input`	Path of the input file.
`FUN`	Function to be applied to each block. It must take at least two arguments: the first is a `data.table` containing the current block without its first field, and the second is a character vector containing the value of the first field for the current block.
`...`	Additional arguments to be passed to FUN.
`key.sep`	The character that delimits the first field from the rest.
`sep`	The field delimiter (often equal to `key.sep`).
`skip`	Number of lines to skip at the beginning of the file.
`header`	Whether the file has a header.
`nblocks`	The number of blocks to read.
`stringsAsFactors`	Whether to convert strings into factors.
`colClasses`	Vector or list specifying the class of each field.
`select`	The columns (names or numbers) to be read.
`drop`	The columns (names or numbers) not to be read.
`col.names`	Names of the columns.
`parallel`	Number of cores to use.
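The contract for `FUN` can be illustrated with a plain function called by hand on one toy block. This is a sketch: `block_means`, its `digits` argument and the sample block are hypothetical, and a `data.frame` stands in for the `data.table` that ftply would actually pass as `d`.

```r
# Hypothetical FUN following the documented contract FUN(d, by, ...).
# Assumption: every column of the block (after the first field) is numeric.
block_means <- function(d, by, digits = 2) {
  # d:      the current block, without its first field
  # by:     the first-field value of this block (not needed for this summary)
  # digits: an extra argument that ftply would forward through `...`
  means <- vapply(d, mean, numeric(1))
  as.data.frame(as.list(round(means, digits)))
}

# Call it by hand on a toy block, as ftply would do for one block
block <- data.frame(Sepal.Length = c(5.1, 4.9, 4.7), Sepal.Width = c(3.5, 3.0, 3.2))
block_means(block, "setosa", digits = 1)
```

Passed to ftply, the extra argument would travel through `...`, for example `ftply(f1, block_means, digits = 1)` with `f1` as defined in the Examples section.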

Details

ftply is similar to ffply, but while the latter writes the result of each block to disk as soon as that block has been processed, the former keeps all results in memory until the whole file has been processed and then returns the complete data.table.
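The difference between the two output strategies can be sketched in base R. Both functions below are hypothetical stand-ins, not fplyr code: `blocks` is assumed to be an already-split named list of data frames (one per key), and the output file format is an arbitrary choice for the illustration.

```r
# Hypothetical sketch contrasting the two output strategies.
# `blocks` is a named list of data frames, one per block (key = list name).

# ftply-style: keep every block's result in memory, bind once at the end
process_in_memory <- function(blocks, FUN) {
  results <- lapply(names(blocks), function(k) FUN(blocks[[k]], k))
  do.call(rbind, results)
}

# ffply-style: append each block's result to `path` as soon as it is computed
process_to_disk <- function(blocks, FUN, path) {
  first <- TRUE
  for (k in names(blocks)) {
    out <- FUN(blocks[[k]], k)
    write.table(out, path, sep = "\t", quote = FALSE, row.names = FALSE,
                col.names = first, append = !first)
    first <- FALSE
  }
  invisible(path)
}

blocks <- list(a = data.frame(x = c(1, 3)), b = data.frame(x = c(2, 4, 6)))
mean_by_block <- function(d, by) data.frame(key = by, mean_x = mean(d$x))

mem <- process_in_memory(blocks, mean_by_block)
path <- tempfile(fileext = ".tsv")
process_to_disk(blocks, mean_by_block, path)
disk <- read.delim(path)
```

The in-memory version must hold every block's result until the end, which is what lets it return one table; the on-disk version only ever holds a single block's result, at the cost of having to read the file back afterwards.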

Value

Returns a data.table with the results of the processing.

Slogan

ftply: from file to data.table

Examples

f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")

# Compute the mean of the columns for each species
ftply(f1, function(d, by) d[, lapply(.SD, mean)])

# Read only the first two blocks
ftply(f1, nblocks = 2)

fplyr documentation built on Aug. 24, 2023, 1:08 a.m.