fileplyr-package: fileplyr



A package for chunk processing or split-apply-combine on data in a delimited file (for example, CSV) across multiple cores of a single machine, with a low memory footprint. These functions are a convenient wrapper over the versatile package datadr.
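
To make the idea concrete, here is a minimal base R sketch of chunked split-apply-combine on a CSV file. It does not use the fileplyr or datadr API. The function name `chunked_group_sum`, the column names `g` and `x`, and the chunk size are all hypothetical, chosen only for illustration. The sketch runs on a single core, so it shows the low-memory part of the idea, not the multi-core part.

```r
# Write a small demo CSV to a temporary file (a stand-in for a large file).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(g = rep(c("a", "b", "c"), times = 4), x = 1:12),
          csv_path, row.names = FALSE)

# Hypothetical helper: sum column `x` per group `g`, reading only
# `chunk_rows` rows into memory at a time.
chunked_group_sum <- function(path, chunk_rows = 5L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once and strip the quotes added by write.csv.
  header <- gsub('"', "", strsplit(readLines(con, n = 1L), ",")[[1]])

  totals <- numeric(0)
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break  # end of file

    # Split-apply within the chunk.
    chunk <- read.csv(text = lines, header = FALSE, col.names = header,
                      stringsAsFactors = FALSE)
    sums <- vapply(split(chunk$x, chunk$g), sum, numeric(1))

    # Combine: fold the chunk's partial sums into the running totals.
    for (key in names(sums)) {
      prev <- if (key %in% names(totals)) totals[[key]] else 0
      totals[key] <- prev + sums[[key]]
    }
  }
  totals[order(names(totals))]
}

result <- chunked_group_sum(csv_path)
print(result)
```

This works because a sum can be combined across chunks. Operations that need all rows of a group at once would require the data to be partitioned by group first, which is the kind of work the package delegates to datadr.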

Memory vs. time tradeoff: Handling data that cannot be loaded into RAM involves a tradeoff between memory usage and the time required to complete the computation. fileplyr lies at one end of the spectrum: it assumes that the user has time at their disposal rather than hardware. The file to be processed has to be large enough for the advantage of the package to be observable.

Audience: Analysts with a laptop or desktop that has multiple cores and limited RAM (and lots of time) might find the package helpful for prototyping tasks. A proper estimate of memory usage will let other applications function comfortably alongside an R process running a fileplyr job.

fileplyr documentation built on May 30, 2017, 8:18 a.m.