ftply: Read, process each block and return a data.table

View source: R/ftply.R

ftplyR Documentation

Read, process each block and return a data.table

Description

ftply takes as input the path to a file and a function, and returns a data.table. It is a faster equivalent to using l <- flply(...) followed by do.call(rbind, l).

Usage

ftply(
  input,
  FUN = function(d, by) d,
  ...,
  key.sep = "\t",
  sep = "\t",
  skip = 0,
  header = TRUE,
  nblocks = Inf,
  stringsAsFactors = FALSE,
  colClasses = NULL,
  select = NULL,
  drop = NULL,
  col.names = NULL,
  parallel = 1
)

Arguments

input

Path of the input file.

FUN

Function to be applied to each block. It must take at least two arguments, the first of which is a data.table containing the current block, without the first field; the second argument is a character vector containing the value of the first field for the current block.

...

Additional arguments to be passed to FUN.

key.sep

The character that delimits the first field from the rest.

sep

The field delimiter (often equal to key.sep).

skip

Number of lines to skip at the beginning of the file

header

Whether the file has a header.

nblocks

The number of blocks to read.

stringsAsFactors

Whether to convert strings into factors.

colClasses

Vector or list specifying the class of each field.

select

The columns (names or numbers) to be read.

drop

The columns (names or numbers) not to be read.

col.names

Names of the columns.

parallel

Number of cores to use.

Details

ftply is similar to ffply, but while the latter writes to disk the result of the processing after each block, the former keeps the result in memory until all the file has been processed, and then returns the complete data.table.

Value

Returns a data.table with the results of the processing.

Slogan

ftply: from file to data.table

Examples

f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")

# Compute the mean of the columns for each species
ftply(f1, function(d, by) d[, lapply(.SD, mean)])

# Read only the first two blocks
ftply(f1, nblocks = 2)


fplyr documentation built on Aug. 24, 2023, 1:08 a.m.