flply: Read, process each block and return a list

Description

With flply() you can apply a function to each block of the file separately, where a block is a set of contiguous rows that share the same value in the first field. The value returned by each call is stored in a list, which flply() returns. flply() is similar to lapply(), except that it applies the function to each block of the file rather than to each element of a list. It is also similar to by(), except that it never reads the whole file into memory: each block is processed as soon as it has been read from disk.
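The block semantics can be illustrated on an in-memory data frame with base R alone. The sketch below assumes that a block is a run of contiguous rows sharing the same first-column value. flply_in_memory() is a hypothetical helper, not part of fplyr, and unlike flply() it does not stream from disk.

```r
# Hypothetical in-memory analogue of flply(): it only illustrates how rows
# are grouped into blocks; it does NOT read from disk block by block.
# Assumption: a block is a run of contiguous rows with the same first-column value.
flply_in_memory <- function(df, FUN, ...) {
  runs   <- rle(as.character(df[[1]]))   # lengths of runs of equal keys
  ends   <- cumsum(runs$lengths)          # last row of each block
  starts <- ends - runs$lengths + 1       # first row of each block
  lapply(seq_along(starts), function(i) {
    block <- df[starts[i]:ends[i], , drop = FALSE]
    FUN(block, ...)                       # one result per block
  })
}

df <- data.frame(
  id = c("a", "a", "b", "b", "b", "a"),
  x  = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)
res <- flply_in_memory(df, function(d) sum(d$x))
print(res)
```

Here the final "a" row forms a third block because it is not contiguous with the first two, so the result is list(3, 12, 6). By contrast, by(df, df$id, function(d) sum(d$x)) would pool all "a" rows into one group.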

Usage

flply(
  input,
  FUN,
  ...,
  key.sep = "\t",
  sep = "\t",
  skip = 0,
  header = TRUE,
  nblocks = Inf,
  stringsAsFactors = FALSE,
  colClasses = NULL,
  select = NULL,
  drop = NULL,
  col.names = NULL,
  parallel = 1
)

Arguments

input

Path of the input file.

FUN

A function to be applied to each block. The first argument to the function must be a data.table containing the current block. Additional arguments can be passed with ....

...

Additional arguments to be passed to FUN.
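The forwarding of ... can be shown without fplyr. apply_to_blocks() below is a hypothetical stand-in that applies FUN to a list of already-split blocks and passes any extra arguments along. With fplyr, the analogous call would be flply(f, col_mean, na.rm = TRUE), where f is the input path.

```r
# Hypothetical stand-in: apply FUN to each block, forwarding extra arguments.
apply_to_blocks <- function(blocks, FUN, ...) {
  lapply(blocks, function(b) FUN(b, ...))
}

# Two made-up blocks, each containing a missing value
blocks <- list(
  data.frame(x = c(1, NA, 3)),
  data.frame(x = c(4, 5, NA))
)

# User function with an extra argument; na.rm reaches it through `...`
col_mean <- function(d, na.rm = FALSE) mean(d$x, na.rm = na.rm)

res <- apply_to_blocks(blocks, col_mean, na.rm = TRUE)
print(res)
```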

key.sep

The character that delimits the first field from the rest.

sep

The field delimiter (often equal to key.sep).
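When key.sep and sep differ, a line consists of a key, then key.sep, then the remaining fields separated by sep. The sketch below shows this decomposition for a hypothetical line such as "gene1|0.5,1.2,3.4" with key.sep = "|" and sep = ",". split_line() is an illustrative helper reflecting this reading of the two arguments, not fplyr's internal parser.

```r
# Hypothetical helper: split one line into its key (text before the first
# key.sep) and the remaining fields (split on sep).
split_line <- function(line, key.sep = "\t", sep = "\t") {
  pos <- as.integer(regexpr(key.sep, line, fixed = TRUE))  # first key.sep
  if (pos == -1) return(list(key = line, fields = character(0)))
  key  <- substr(line, 1, pos - 1)
  rest <- substr(line, pos + nchar(key.sep), nchar(line))
  list(key = key, fields = strsplit(rest, sep, fixed = TRUE)[[1]])
}

parts <- split_line("gene1|0.5,1.2,3.4", key.sep = "|", sep = ",")
str(parts)
```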

skip

Number of lines to skip at the beginning of the file.

header

Whether the file has a header.

nblocks

The number of blocks to read. The default, Inf, reads all blocks.
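The idea of reading block by block, and of stopping early via nblocks, can be sketched with a plain R connection. stream_blocks() is hypothetical and deliberately simplified: it ignores header, skip, and column parsing, and passes each block to FUN as raw lines. It only shows that a block is complete once a line with a different key arrives, so at most one block needs to be held in memory.

```r
# Hypothetical streaming sketch: read one line at a time, group contiguous
# lines with the same key (text before the first key.sep), apply FUN to each
# completed block, and stop after `nblocks` blocks.
stream_blocks <- function(con, FUN, key.sep = "\t", nblocks = Inf) {
  results <- list()
  current <- character(0)   # lines of the block being accumulated
  current_key <- NULL
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break          # end of input
    if (!nzchar(line)) next               # skip empty lines
    key <- strsplit(line, key.sep, fixed = TRUE)[[1]][1]
    if (length(current) > 0 && key != current_key) {
      results[[length(results) + 1]] <- FUN(current)  # block finished
      if (length(results) >= nblocks) return(results)
      current <- character(0)
    }
    current_key <- key
    current <- c(current, line)
  }
  # Process the last block if the limit has not been reached
  if (length(current) > 0 && length(results) < nblocks) {
    results[[length(results) + 1]] <- FUN(current)
  }
  results
}

lines <- c("a\t1", "a\t2", "b\t3", "c\t4", "c\t5")
con <- textConnection(lines)
res <- stream_blocks(con, FUN = length, nblocks = 2)
close(con)
print(res)
```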

stringsAsFactors

Whether to convert strings into factors.

colClasses

Vector or list specifying the class of each field.

select

The columns (names or numbers) to be read.

drop

The columns (names or numbers) not to be read.

col.names

Names of the columns.

parallel

Number of cores to use.

Value

A list with one element per block, each holding the value returned by FUN for that block.
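Because the result is an ordinary list, per-block results can be combined afterwards with base R. In the sketch below, block_results is a made-up stand-in for a list returned by flply() in which FUN returns a one-row data frame per block.

```r
# Made-up stand-in for a list returned by flply(): one data frame per block.
block_results <- list(
  data.frame(key = "a", n = 2L, stringsAsFactors = FALSE),
  data.frame(key = "b", n = 1L, stringsAsFactors = FALSE)
)

# Stack the per-block data frames into a single table
combined <- do.call(rbind, block_results)
print(combined)
```

When FUN returns a single number per block, as in the correlation example, unlist() turns the list into a plain vector instead.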

Slogan

flply: from file to list

Examples

f <- system.file("extdata", "dt_iris.csv", package = "fplyr")

# Compute, within each block, the correlation between Sepal.Length and Petal.Length
flply(f, function(d) cor(d$Sepal.Length, d$Petal.Length))

# Summarise each block
flply(f, summary)

# Fit a separate linear model to each block
block.lm <- function(d) {
  lm(Sepal.Length ~ ., data = d[, !"Species"])
}
lm.list <- flply(f, block.lm)


fplyr documentation built on Aug. 24, 2023, 1:08 a.m.