fmply: Read, process and write to multiple output files


fmply R Documentation

Read, process and write to multiple output files

Description

Sometimes a file should be processed in many different ways. fmply() applies a function to each block of the file; the function should return a list of m data.tables, each of which is written to a different output file. Optionally, the function can return a list of m + 1 elements: the first m are data.tables and are written to the output files, while the last element is returned by fmply(), as in flply().

Usage

fmply(
  input,
  outputs,
  FUN,
  ...,
  key.sep = "\t",
  sep = "\t",
  skip = 0,
  header = TRUE,
  nblocks = Inf,
  stringsAsFactors = FALSE,
  colClasses = NULL,
  select = NULL,
  drop = NULL,
  col.names = NULL,
  parallel = 1
)

Arguments

input

Path of the input file.

outputs

Vector of m paths for the output files.

FUN

A function to apply to each block. It takes a data.table as input, plus optional additional arguments, and should return a list of length m, the same length as the outputs vector. The first element of the list is written to the first output file, the second element to the second output file, and so on. Besides these m data.tables, FUN can return one additional element, which is collected and returned by fmply().

...

Additional arguments to be passed to FUN.

key.sep

The character that delimits the first field from the rest.

sep

The field delimiter (often equal to key.sep).

skip

Number of lines to skip at the beginning of the file.

header

Whether the file has a header.

nblocks

The number of blocks to read.

stringsAsFactors

Whether to convert strings into factors.

colClasses

Vector or list specifying the class of each field.

select

The columns (names or numbers) to be read.

drop

The columns (names or numbers) not to be read.

col.names

Names of the columns.

parallel

Number of cores to use.
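
As a minimal sketch of how `...` and the named arguments fit together, the snippet below defines a hypothetical FUN for m = 2 outputs. The function name `split_by_length` and its `threshold` argument are illustrative assumptions, not part of fplyr. The contract is first checked on an in-memory block built from the built-in iris data, and the fmply() call is attempted only if fplyr is installed.

```r
library(data.table)

# Hypothetical FUN for m = 2 outputs; `threshold` is an illustrative extra
# argument that would be supplied through `...`.
split_by_length <- function(d, threshold) {
    list(
        d[Sepal.Length >  threshold],   # would be written to outputs[1]
        d[Sepal.Length <= threshold]    # would be written to outputs[2]
    )
}

# Check the return value on one in-memory block before touching any file.
block <- as.data.table(iris)[Species == "setosa"]
parts <- split_by_length(block, threshold = 5)
sapply(parts, nrow)

if (requireNamespace("fplyr", quietly = TRUE)) {
    fin <- system.file("extdata", "dt_iris.csv", package = "fplyr")
    long <- tempfile()
    short <- tempfile()
    # Arguments that follow `...` in the signature (here nblocks) must be
    # named in full; `threshold` is forwarded to FUN.
    fplyr::fmply(fin, c(long, short), split_by_length,
                 threshold = 5, nblocks = 2)
}
```

Testing FUN on a single block first is a cheap way to confirm that the list it returns has the same length as outputs.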

Value

If FUN returns m elements, fmply() returns NULL invisibly. If FUN returns m + 1 elements, fmply() returns the list of all the last elements. As a side effect, it writes the first m outputs of FUN to the output files.
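
The m + 1 case can be sketched as follows. The function `split_and_count` is a hypothetical FUN, not part of fplyr: it returns two data.tables, one for each of m = 2 output files, plus a row count as the extra element. The contract is checked on an in-memory block taken from the built-in iris data, and the fmply() call runs only if fplyr is installed.

```r
library(data.table)

# Hypothetical FUN for m = 2 output files: returns m + 1 = 3 elements.
split_and_count <- function(d) {
    avg <- mean(d$Sepal.Length)
    list(
        d[Sepal.Length >= avg],   # would be written to outputs[1]
        d[Sepal.Length <  avg],   # would be written to outputs[2]
        nrow(d)                   # extra element, collected by fmply()
    )
}

# Check the contract on a single in-memory block.
block <- as.data.table(iris)[Species == "setosa"]
res <- split_and_count(block)
length(res)

if (requireNamespace("fplyr", quietly = TRUE)) {
    fin <- system.file("extdata", "dt_iris.csv", package = "fplyr")
    # Because FUN returns m + 1 elements, `counts` should hold the list of
    # the last elements rather than NULL.
    counts <- fplyr::fmply(fin, c(tempfile(), tempfile()), split_and_count)
}
```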

Slogan

fmply: from file to multiple files

Examples


fin <- system.file("extdata", "dt_iris.csv", package = "fplyr")
fout1 <- tempfile()
fout2 <- ""  # an empty path sends the second output to the console

# Copy the input file to tempfile as it is, and, at the same time, print
# a summary to the console
fmply(fin, c(fout1, fout2), function(d) {
    list(d, data.table(unclass(summary(d))))
})

fout3 <- tempfile()
fout4 <- tempfile()

# Fit a linear and a polynomial regression to each block and write the
# coefficients to two files
fmply(fin, c(fout3, fout4), function(d) {
    lr.fit <- lm(Sepal.Length ~ ., data = d[, !"Species"])
    lr.summ <- data.table(Species = d$Species[1], t(coefficients(lr.fit)))
    pr.fit <- lm(Sepal.Length ~ poly(as.matrix(d[, 3:5]), degree = 3),
                 data = d[, !"Species"])
    pr.summ <- data.table(Species = d$Species[1], t(coefficients(pr.fit)))
    list(lr.summ, pr.summ)
})


fplyr documentation built on Aug. 24, 2023, 1:08 a.m.